Crawler: Images

  1. Running the spider through run.py fails, but it works from the command line:   scrapy crawl jiandan
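  2. Pillow is used to generate thumbnails and to normalize images to JPEG/RGB format, so you need to install it in order to use the Images Pipeline. The Python Imaging Library (PIL) works in most cases, but it is known to cause trouble in some setups, so Pillow is recommended over PIL.

     This time we use the Images Pipeline to download the images and Pillow to generate the thumbnails. With Scrapy already installed, run pip install pillow to install the module.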
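
As a rough illustration of the setup described above, the Images Pipeline is usually switched on in the project's settings.py. This is a minimal sketch, not the configuration actually used for this crawl: the storage directory and the thumbnail names and sizes are assumptions chosen only for the example.

```python
# settings.py (excerpt) -- illustrative values, not the ones from this project

# Enable Scrapy's built-in Images Pipeline.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Directory where downloaded images are written (hypothetical path).
IMAGES_STORE = "images"

# Thumbnails that Pillow generates for each image: name -> (width, height).
# The names and sizes here are example values.
IMAGES_THUMBS = {
    "small": (50, 50),
    "big": (270, 270),
}
```

By default the pipeline reads the URLs to download from the item's image_urls field and stores the download results in its images field, so the item class needs both fields.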

  3. Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scrapy/pipelines/files.py", line 356, in media_downloaded
        checksum = self.file_downloaded(response, request, info)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scrapy/pipelines/images.py", line 98, in file_downloaded
        return self.image_downloaded(response, request, info)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scrapy/pipelines/images.py", line 102, in image_downloaded
        for path, image, buf in self.get_images(response, request, info):
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scrapy/pipelines/images.py", line 115, in get_images
        orig_image = Image.open(BytesIO(response.body))
      File "/Users/saiwei/Library/Python/3.5/lib/python/site-packages/PIL/Image.py", line 2319, in open
        % (filename if filename else fp))
    OSError: cannot identify image file <_io.BytesIO object at 0x1189982b0>
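
Pillow raises this OSError when the downloaded bytes are not in any image format it recognizes. One possible cause, which the traceback alone does not confirm, is that the server answered with something other than an image, for example an HTML page. The sketch below is a small stdlib-only helper for checking what a response body starts with. The function names sniff_image_type and describe_body are hypothetical, and only a few common signatures are covered.

```python
def sniff_image_type(data):
    """Return a format name if data starts with a known image signature, else None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def describe_body(data, limit=60):
    """Summarize a response body for logging while debugging the pipeline."""
    kind = sniff_image_type(data)
    if kind is not None:
        return "looks like %s (%d bytes)" % (kind, len(data))
    # Show the first bytes so an HTML page or error message is easy to spot.
    return "not a recognized image, starts with %r" % data[:limit]
```

Calling describe_body(response.body) in the spider or in a pipeline subclass before the image is opened shows whether the URL really returned image data.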

References:

[1] https://scrapy-chs.readthedocs.io/zh_CN/1.0/topics/media-pipeline.html  [official example]

[2] http://www.cnblogs.com/qiyeboy/p/5449266.html   [detailed example]
