Scraping Qiushibaike jokes with Scrapy, part 2 (downloading user avatars)

This post continues from the previous post in this series.

1. Update the code

vim ITtest.py

import os
import urllib.request   # urlretrieve lives in urllib.request, so import the submodule explicitly

import scrapy
from qiushi.items import QiushiItem   # QiushiItem class from items.py in the qiushi project
from scrapy.http.response.html import HtmlResponse   # HtmlResponse class (type of `response`)
from scrapy.selector.unified import SelectorList   # SelectorList class (type returned by xpath())


class IttestSpider(scrapy.Spider):
    name = 'ITtest'
    allowed_domains = ['www.qiushibaike.com']
    start_urls = ['https://www.qiushibaike.com/text/page/1/']
    bash_domain = "https://www.qiushibaike.com"

    def parse(self, response):
        body = response.xpath('//div[@class="col1 old-style-col1"]/div')
        for duanzhi in body:
            touxiang = duanzhi.xpath('.//div//@src').get()
            neirong = duanzhi.xpath('.//div[@class="content"]//text()').getall()
            neirong = "".join(neirong).strip()
            # get() returns None when nothing matches, so fall back to '' before strip()
            zuozhe = (duanzhi.xpath('.//div//h2/text()').get() or '').strip()
            item = QiushiItem(头像=touxiang, 作者=zuozhe, 内容=neirong)
            # Create the image directory if it does not exist yet
            path_dir = os.path.dirname(os.getcwd()) + '/img/'
            if not os.path.exists(path_dir):
                os.mkdir(path_dir)
            if zuozhe and touxiang:
                print(zuozhe, touxiang)
                file_path = os.path.join(path_dir, zuozhe + '.jpg')
                if not os.path.exists(file_path):
                    # os.mknod creates an empty file (Unix only)
                    os.mknod(file_path)
                    print(file_path)
                    # urllib.request.urlretrieve downloads the remote data straight to a local file
                    urllib.request.urlretrieve('http:' + touxiang, file_path)
            yield item
        # Follow the "next page" link until there is none
        next_url = response.xpath("//ul[@class='pagination']/li[last()]/a/@href").get()
        if not next_url:
            return
        else:
            yield scrapy.Request(self.bash_domain + next_url, callback=self.parse)
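The spider builds the download URL by prepending 'http:' to the scraped src and uses the author name directly as the file name. Author names may contain characters that are not valid in file names, which matters because the images are later copied to Windows. The sketch below shows one way to isolate both steps. The helper names `avatar_url` and `avatar_path` are hypothetical and not part of the original spider, and the set of replaced characters is an assumption based on what Windows rejects in file names.

```python
import os
import re


def avatar_url(src, scheme="http"):
    """Turn a protocol-relative src such as //host/a.jpg into a full URL."""
    if src.startswith("//"):
        return scheme + ":" + src
    # Already absolute (or some other form): leave it untouched
    return src


def avatar_path(img_dir, author):
    """Build the local .jpg path, replacing characters unsafe in file names."""
    # Collapse slashes, colons, quotes, whitespace and similar into underscores
    safe_name = re.sub(r'[\\/:*?"<>|\s]+', "_", author).strip("_") or "unknown"
    return os.path.join(img_dir, safe_name + ".jpg")
```

Inside `parse`, these could stand in for `'http:' + touxiang` and `os.path.join(path_dir, zuozhe + '.jpg')` respectively.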

2. Run the crawl again

scrapy crawl ITtest

3. Check the scraped data
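Because the spider creates an empty file with os.mknod before downloading, an interrupted or failed download can leave a zero-byte .jpg behind. A minimal sketch for checking the result is shown here. The function name `summarize_images` is hypothetical, and the directory passed in is assumed to be the img folder that the spider creates one level above the working directory.

```python
import os


def summarize_images(img_dir):
    """Return (total_files, names_of_empty_files) for the avatar directory."""
    names = sorted(os.listdir(img_dir))
    # A zero-byte file means the placeholder was created but never filled
    empty = [n for n in names if os.path.getsize(os.path.join(img_dir, n)) == 0]
    return len(names), empty


if __name__ == "__main__":
    img_dir = os.path.join(os.path.dirname(os.getcwd()), "img")
    if os.path.isdir(img_dir):
        total, empty = summarize_images(img_dir)
        print(total, "files,", len(empty), "empty")
```

Deleting any empty files that this reports lets the next crawl download them again, since the spider skips paths that already exist.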

4. Zip the images and transfer them to a Windows machine

zip -r img.zip img/
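If the zip command is not installed, roughly the same archive can be produced from Python's standard library. This is a sketch of an alternative and not what the post itself uses. The function names `zip_images` and `list_archive` are hypothetical, and `parent_dir` is assumed to be the directory that contains img/.

```python
import os
import shutil
import zipfile


def zip_images(parent_dir, folder="img"):
    """Create <parent_dir>/<folder>.zip containing <folder>/ and return its path."""
    base_name = os.path.join(parent_dir, folder)
    # root_dir/base_dir keep the stored paths relative, e.g. img/name.jpg
    return shutil.make_archive(base_name, "zip", root_dir=parent_dir, base_dir=folder)


def list_archive(zip_path):
    """Return the file names stored in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()
```

Calling `list_archive` on the result before transferring it is a quick way to confirm that the avatars were actually included.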

View the files in the img folder.
