Python: Crawling JD.com with Scrapy

1. Create the project
scrapy startproject jd


2. Generate a spider
scrapy genspider jd_category jd.com


3. Define the fields to extract in items.py

import scrapy


class JdItem(scrapy.Item):
    """Product information."""
    title = scrapy.Field()   # title
    price = scrapy.Field()   # price
    sku_id = scrapy.Field()  # product ID
    url = scrapy.Field()     # product link
    info = scrapy.Field()    # comments


class CommentItem(scrapy.Item):
    """A single comment."""
    # Comment text, time posted, reply count, score, upvote count, image count
    content = scrapy.Field()
    comment_time = scrapy.Field()
    reply_count = scrapy.Field()
    score = scrapy.Field()
    vote_count = scrapy.Field()
    image_count = scrapy.Field()

4. Contents of jd_category.py

The spider crawls the product listing pages, then follows each product to fetch its comments.
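The spider in this step pages through the listing by advancing `page` in steps of 2 and `s` in steps of 60 for as long as `page <= 10`. As a rough sketch of which listing URLs that rule produces (the helper name `listing_urls` and the `last_page` parameter are made up for illustration; the URL template and step sizes are taken from the spider):

```python
# URL template copied from the spider; {} placeholders are page and s.
NEXT_URL = 'https://list.jd.com/list.html?cat=9987%2C653%2C655&page={}&s={}&click=0'


def listing_urls(last_page=10):
    """Return the listing URLs requested by the spider's pagination rule.

    Hypothetical helper: it mirrors the `if self.page <= 10` branch in
    parse(), where each step adds 2 to page and 60 to s.
    """
    page, s = 1, 1
    urls = [NEXT_URL.format(page, s)]  # the start URL
    while page <= last_page:
        page += 2
        s += 60
        urls.append(NEXT_URL.format(page, s))
    return urls


if __name__ == '__main__':
    for url in listing_urls():
        print(url)
```

Under these rules the spider requests six listing pages in total, with `page` taking the odd values 1 through 11.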

import re

import scrapy

from ..items import JdItem, CommentItem


class JdSpider(scrapy.Spider):
    # Must match the name passed to `scrapy crawl` in step 6.
    name = 'jd_category'
    # Use the bare domain: writing 'www.jd.com' here can sometimes make
    # search.jd.com impossible to crawl.
    allowed_domains = ['jd.com']
    # Category listing: https://list.jd.com/list.html?cat=9987,653,655
    page = 1
    s = 1
    url = 'https://list.jd.com/list.html?cat=9987%2C653%2C655&page=1&s=1&click=0'
    next_url = 'https://list.jd.com/list.html?cat=9987%2C653%2C655&page={}&s={}&click=0'

    def start_requests(self):
        yield scrapy.Request(self.url)

    def parse(self, response):
        """Crawl the first 30 products of each listing page.

        Their data is present directly in the page HTML.
        """
        for li in response.xpath('//*[@id="J_goodsList"]/ul/li'):
            item = JdItem()
            item['title'] = li.xpath('div/div/a/em/text()').extract_first("")
            item['price'] = li.xpath('div/div/strong/i/text()').extract_first("")
            item['sku_id'] = li.xpath('./@data-sku').extract_first("")
            item['info'] = None
            # Link to the product detail page, which is followed next
            url = li.xpath('./div/div[@class="p-img"]/a/@href').extract_first("")
            if not url:
                continue  # no detail link to follow
            if not url.startswith("https:"):
                url = "https:" + url
            item['url'] = url
            # Detail page
            yield scrapy.Request(url, callback=self.info_parse, meta={"item": item})

        # Move on to the next listing page
        if self.page <= 10:
            self.page += 2
            self.s += 60
            yield scrapy.Request(url=self.next_url.format(self.page, self.s),
                                 callback=self.parse)

    def info_parse(self, response):
        """Detail page: request the comment endpoint for this product."""
        item = response.meta['item']
        # `page` in the query string is the comment page; change it to
        # crawl more than one page of comments.
        comment_url = (
            'https://club.jd.com/comment/productPageComments.action'
            '?callback=fetchJSON_comment98&productId={}'
            '&score=0&sortType=5&page=0&pageSize=10&isShadowSku=0&fold=1'
        )
        # Comment page
        yield scrapy.Request(comment_url.format(item.get('sku_id')),
                             callback=self.comment_parse, meta={"item": item})

    def comment_parse(self, response):
        """Extract the comments with a regular expression."""
        text = response.text
        comment_list = re.findall(
            r'guid":".*?"content":"(.*?)".*?"creationTime":"(.*?)",".*?"replyCount":(\d+),'
            r'"score":(\d+).*?usefulVoteCount":(\d+).*?imageCount":(\d+).*?images":',
            text)
        info = []
        for result in comment_list:
            # Map the regex groups, in order, onto the comment fields
            comment_item = CommentItem()
            comment_item['content'] = result[0]
            comment_item['comment_time'] = result[1]
            comment_item['reply_count'] = result[2]
            comment_item['score'] = result[3]
            comment_item['vote_count'] = result[4]
            comment_item['image_count'] = result[5]
            info.append(comment_item)
        item = response.meta['item']
        item['info'] = info
        yield item
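The comment request passes `callback=fetchJSON_comment98`, so the response is JSON wrapped in a function call (JSONP). A possible alternative to the regular expression is to strip the wrapper and decode the payload with `json`. The sketch below is only that: the function names are made up, the sample response is hand-written rather than captured from JD, and the top-level `comments` key is an assumption about the payload layout. The per-comment field names are the ones the spider's regular expression matches.

```python
import json
import re


def parse_jsonp(text):
    """Strip a `callback(...)` wrapper and decode the JSON inside it."""
    match = re.search(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', text, re.S)
    if match is None:
        raise ValueError('not a JSONP response')
    return json.loads(match.group(1))


def extract_comments(text):
    """Return one dict per comment, keyed like the CommentItem fields.

    Assumption: the comments sit in a list under the top-level key
    'comments'. Adjust this if the real payload is laid out differently.
    """
    payload = parse_jsonp(text)
    return [
        {
            'content': c.get('content', ''),
            'comment_time': c.get('creationTime', ''),
            'reply_count': c.get('replyCount', 0),
            'score': c.get('score', 0),
            'vote_count': c.get('usefulVoteCount', 0),
            'image_count': c.get('imageCount', 0),
        }
        for c in payload.get('comments', [])
    ]


# Hand-written sample for illustration only, not real JD output.
SAMPLE = (
    'fetchJSON_comment98({"comments":[{"guid":"x","content":"good",'
    '"creationTime":"2020-01-01 10:00:00","replyCount":1,"score":5,'
    '"usefulVoteCount":2,"imageCount":0,"images":[]}]});'
)

if __name__ == '__main__':
    print(extract_comments(SAMPLE))
```

Decoding the payload gives numeric fields as integers and handles escaped quotes inside the comment text, which a non-greedy `(.*?)` group does not.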

5. pipelines.py does nothing more than print each item
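The pipeline code itself is not shown in the post. A minimal sketch of a print-only pipeline could look like the following; the class name `JdPipeline` is an assumption (it is what the project template usually generates for a project named jd), and the pipeline has to be enabled through the `ITEM_PIPELINES` setting before Scrapy will call it.

```python
class JdPipeline:
    """Print every scraped item and pass it on unchanged.

    Assumed to be enabled in settings.py with something like:
        ITEM_PIPELINES = {'jd.pipelines.JdPipeline': 300}
    """

    def process_item(self, item, spider):
        # dict() works for scrapy.Item objects as well as plain dicts.
        print(dict(item))
        # Return the item so that any later pipeline still receives it.
        return item
```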

6. Run: python -m scrapy crawl jd_category
