Hands-on: Scraping 易物天下 Product Categories with requests.md

## Hands-on: Scraping 易物天下 Product Categories with requests

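Deciding what to scrape
1. The target is the industry category list on the home page.
2. Find out where the data comes from.
  - First, fetching the page with `requests.get` shows that the returned HTML does not contain the industry categories: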
```
  response = requests.get(url, params = qs, headers = headers)
```
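  - So the data is probably loaded by an Ajax request.

    Open the site in a browser, right-click, choose Inspect and switch to the Network tab. This confirms it: the categories are requested via Ajax.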
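The reasoning in step 2 can be turned into a quick check. If text you can see in the rendered page is missing from the raw HTML that `requests` downloads, then JavaScript probably fills it in later, for example through an Ajax call. This is a minimal sketch. The names `fetch_html` and `looks_ajax_loaded` and the 10-second timeout are hypothetical choices, not part of the original code.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the HTML exactly as the server sends it (no JavaScript runs)."""
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    # Guess the charset from the body in case the Content-Type header lacks one
    resp.encoding = resp.apparent_encoding
    return resp.text


def looks_ajax_loaded(static_html: str, visible_text: str) -> bool:
    """True if text shown in the browser is absent from the static HTML,
    which suggests the page loads it afterwards (e.g. via Ajax)."""
    return visible_text not in static_html
```

Usage: call `looks_ajax_loaded(fetch_html('http://www.i1515.com/'), name)` with `name` set to a category name you can see on the rendered page. A `True` result points you to the Network tab.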

Scraping the data

Because the data comes from an Ajax request, I copied every field under Request Headers in the browser and turned them into a dictionary:

```python
# Copied from the browser's Request Headers. The Cookie belongs to one browser
# session and will expire; requests recomputes Content-Length from the body.
headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    'Accept-Encoding': "gzip, deflate",
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Length': '4',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': 'JSESSIONID=C7BD7DFF7031A1A7EE3B71336BE03419; gr_user_id=47edb7df-b13e-4c2c-8c0c-0db4c28f09ff; Hm_lvt_10bdb52fd1832ac4eeceeabdc4df132f=1537604218; Hm_lpvt_10bdb52fd1832ac4eeceeabdc4df132f=1537608109; gr_session_id_a08ca0a390ddd043=9646c1ab-1d0c-41be-819e-51a04b592b26; gr_session_id_a08ca0a390ddd043_9646c1ab-1d0c-41be-819e-51a04b592b26=true',
    'Host': 'www.i1515.com',
    'Origin': 'http://www.i1515.com',
    'Referer': 'http://www.i1515.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
```
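Converting the copied headers into a dictionary by hand is tedious and error-prone. A small helper can do it for you. This sketch assumes the raw text copied from the browser has one `Name: value` header per line. The helper name `headers_from_devtools` and its `skip` parameter are hypothetical. By default it drops `Content-Length`, because requests computes that header from the actual body.

```python
def headers_from_devtools(raw: str, skip=("content-length",)) -> dict:
    """Turn 'Name: value' lines copied from DevTools into a headers dict.

    Lines without a colon and HTTP/2 pseudo-headers such as ':authority'
    are ignored; names listed in `skip` are dropped (case-insensitive).
    """
    skip_lower = {name.lower() for name in skip}
    headers = {}
    for line in raw.splitlines():
        # Split on the first colon only, so values like URLs stay intact
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or not name:  # no colon, or a ':pseudo' header
            continue
        if name.lower() in skip_lower:
            continue
        headers[name] = value.strip()
    return headers
```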

Next, check whether the request has any fields under Form Data; if it does, convert them into a dictionary as well:
```
  data={"id":"1"}
```
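The "view source" link next to Form Data shows the raw request body, for example `id=1`. The standard library can turn that body into a dict. This sketch also shows why the copied header says `Content-Length: 4`. requests form-encodes a dict passed as `data=` with `urllib.parse.urlencode`, so the body for `id=1` is 4 bytes, while the body for `id=10` is 5 bytes. requests recomputes `Content-Length` from the real body, so the copied value does no harm. The helper name `form_data_to_dict` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def form_data_to_dict(raw: str) -> dict:
    """Convert a raw form body such as 'id=1' into a dict for requests.post(data=...)."""
    return dict(parse_qsl(raw, keep_blank_values=True))


data = form_data_to_dict("id=1")
print(data)  # {'id': '1'}

# The same encoding requests applies to a dict body
for cat_id in ("1", "10"):
    body = urlencode({"id": cat_id})
    print(body, len(body.encode("utf-8")))
# id=1 4
# id=10 5
```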

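In the end I found that the site sends 12 Ajax requests, each with a different `id`. A loop is therefore enough to send them all, with each response saved temporarily to a JSON file: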

```python
for i in range(1, 12):  # range(1, 12) yields ids 1 through 11
    data["id"] = str(i)
    try:
        response = requests.post(url=url, headers=headers, data=data, timeout=10)
        result = response.json()
        print(i, type(result))
        # Only save responses that decode to a JSON object (a dict)
        if isinstance(result, dict):
            print(result)
            # The with block closes the file, so no f.close() is needed
            with open('type{}.json'.format(i), 'w', encoding='utf-8') as f:
                json.dump(result, f, ensure_ascii=False)
    except Exception as ex:
        print(ex)
```
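The same loop can be split into a fetch step and a save step. This sketch swaps in a `requests.Session`, which reuses the connection and the headers across all ids. It also adds a timeout and `raise_for_status()` so that HTTP errors are reported instead of surfacing as JSON errors. The function names `fetch_category`, `save_category` and `crawl` and the `out_dir` parameter are hypothetical. The "keep only dict responses" rule mirrors the original type check.

```python
import json
from pathlib import Path

import requests

URL = 'http://www.i1515.com/v2/category/getOtherCategory.html'


def fetch_category(session, cat_id, timeout=10.0):
    """POST one id; return the decoded JSON object, or None if it is not a dict."""
    resp = session.post(URL, data={'id': str(cat_id)}, timeout=timeout)
    resp.raise_for_status()
    result = resp.json()
    return result if isinstance(result, dict) else None


def save_category(cat_id, result, out_dir='.'):
    """Write one category to typeN.json, keeping Chinese text readable."""
    path = Path(out_dir) / 'type{}.json'.format(cat_id)
    path.write_text(json.dumps(result, ensure_ascii=False), encoding='utf-8')
    return path


def crawl(ids=range(1, 12), headers=None, out_dir='.'):
    """Fetch every id over one keep-alive session and save the dict responses."""
    with requests.Session() as session:
        if headers:
            session.headers.update(headers)
        for cat_id in ids:
            try:
                result = fetch_category(session, cat_id)
            except (requests.RequestException, ValueError) as ex:
                # Network/HTTP errors, or a body that is not valid JSON
                print(cat_id, ex)
                continue
            if result is not None:
                save_category(cat_id, result, out_dir)
```

Usage: `crawl(headers=headers)` with the headers dict from above. Keeping `fetch_category` separate from the session also means it can be tested with a stand-in object that has a `post` method.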

Storing the JSON data in the database
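Before writing to the database, it helps to know the shape of each saved file. write_data.py reads a top-level `name` and a list `sCate` of second-level categories. Each second-level entry has its own `name` and a `tCategorys` list of third-level entries with a `name`. This sketch flattens one file into (level 1, level 2, level 3) rows so you can inspect it. The shape is inferred only from the keys write_data.py uses. Any other fields in the response are unknown. The function name and the sample data are hypothetical.

```python
def flatten_categories(data: dict) -> list:
    """Return (level1, level2, level3) name triples from one typeN.json object.

    Assumes the keys used by write_data.py: data["name"], data["sCate"], and
    each second-level entry's "name" and "tCategorys". A second-level
    category without children yields (level1, level2, None).
    """
    rows = []
    level1 = data["name"]
    for two in data.get("sCate", []):
        children = two.get("tCategorys") or []
        if not children:
            rows.append((level1, two["name"], None))
        for three in children:
            rows.append((level1, two["name"], three["name"]))
    return rows


# Made-up sample with the assumed shape
sample = {"name": "A", "sCate": [
    {"name": "A1", "tCategorys": [{"name": "A1a"}, {"name": "A1b"}]},
    {"name": "A2", "tCategorys": []},
]}
print(flatten_categories(sample))
# [('A', 'A1', 'A1a'), ('A', 'A1', 'A1b'), ('A', 'A2', None)]
```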

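Loop over the files one by one. Note that test.py writes `typeN.json` to its own working directory, while this step reads the files from `myspiders/`: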

```python
for index in range(1, 12):
    with open('myspiders/type{}.json'.format(index), 'r', encoding='utf-8') as f:
        data = json.load(f)
```
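Hard-coding `range(1, 12)` ties the loader to exactly eleven files. This sketch instead discovers whatever `typeN.json` files exist and yields them in numeric order. Plain string sorting would put `type10.json` before `type2.json`, so the number is parsed and sorted as an integer. The function name is hypothetical; the default directory `myspiders` comes from the article.

```python
import json
import re
from pathlib import Path

_NAME = re.compile(r"type(\d+)\.json")


def load_category_files(directory="myspiders"):
    """Yield (index, data) for every typeN.json in `directory`, ordered by N."""
    found = []
    for path in Path(directory).glob("type*.json"):
        match = _NAME.fullmatch(path.name)
        if match:  # skip names like typeX.json
            found.append((int(match.group(1)), path))
    for index, path in sorted(found):
        with path.open(encoding="utf-8") as f:
            yield index, json.load(f)
```

Usage: `for index, data in load_category_files(): ...` replaces the `range` loop and the `open` call.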

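Open a database connection:

```python
# PyMySQL's keyword names are password= and database=; passwd= and db= are older aliases
conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='',
                       database='orsp', charset='utf8')
```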
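The snippet above hard-codes `root` and an empty password, and it does nothing to guarantee that the connection gets closed. This is a minimal sketch, not the article's method. It reads the settings from environment variables and falls back to the original values. The variable names `ORSP_DB_*` and the functions `db_settings` and `open_db` are hypothetical. It then wraps the connection in a context manager that commits on success, rolls back on error and always closes.

```python
import os
from contextlib import contextmanager


def db_settings(env=None):
    """Build PyMySQL connect() keyword arguments, defaulting to the article's values."""
    env = os.environ if env is None else env
    return {
        "host": env.get("ORSP_DB_HOST", "127.0.0.1"),
        "port": int(env.get("ORSP_DB_PORT", "3306")),
        "user": env.get("ORSP_DB_USER", "root"),
        "password": env.get("ORSP_DB_PASSWORD", ""),
        "database": env.get("ORSP_DB_NAME", "orsp"),
        "charset": "utf8",
    }


@contextmanager
def open_db(settings=None):
    """Yield a PyMySQL connection; commit on success, roll back on error, always close."""
    import pymysql  # imported here so db_settings() works without PyMySQL installed

    conn = pymysql.connect(**(settings or db_settings()))
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```

Usage: `with open_db() as conn: ...` runs the inserts inside one transaction.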

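Finally, insert the data. First look up the id of each top-level type by its name. Then insert its second-level types into `product_type_two`, and insert their third-level types into `product_type_three`.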
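The insert steps can be sketched in a self-contained form. It uses the standard library's `sqlite3` as a stand-in for MySQL, so it runs without a database server. The table and column names come from the article. The schema itself (the `CREATE TABLE` statements and the `INTEGER PRIMARY KEY` ids) is an assumption, because the article never shows its DDL. `sqlite3` uses `?` placeholders where PyMySQL uses `%s`, and `cursor.lastrowid` plays the role of `conn.insert_id()`. The function name `insert_category_tree` is hypothetical.

```python
import sqlite3

# Assumed schema: the article names these tables/columns but does not show its DDL
SCHEMA = """
CREATE TABLE product_type (id INTEGER PRIMARY KEY, product_type TEXT);
CREATE TABLE product_type_two (id INTEGER PRIMARY KEY,
                               product_type_one_id INTEGER, type_two_name TEXT);
CREATE TABLE product_type_three (id INTEGER PRIMARY KEY,
                                 product_type_two_id INTEGER, type_three_name TEXT);
"""


def insert_category_tree(conn, data):
    """Insert the level-2 and level-3 categories of one typeN.json object.

    Returns the number of level-3 rows inserted; raises LookupError if the
    top-level type data["name"] is not already in product_type.
    """
    cur = conn.cursor()
    # Parameterized queries: the driver quotes values, so names containing
    # quotes cannot break (or inject into) the SQL
    cur.execute("SELECT id FROM product_type WHERE product_type = ?", (data["name"],))
    row = cur.fetchone()
    if row is None:
        raise LookupError("unknown top-level type: {}".format(data["name"]))
    one_id = row[0]
    count = 0
    for two in data["sCate"]:
        cur.execute(
            "INSERT INTO product_type_two (product_type_one_id, type_two_name) VALUES (?, ?)",
            (one_id, two["name"]),
        )
        two_id = cur.lastrowid  # id generated for the row just inserted
        for three in two["tCategorys"]:
            cur.execute(
                "INSERT INTO product_type_three (product_type_two_id, type_three_name) VALUES (?, ?)",
                (two_id, three["name"]),
            )
            count += 1
    conn.commit()
    return count
```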

## Source code

The crawler, test.py:

```python
import json

import requests

url = 'http://www.i1515.com/v2/category/getOtherCategory.html'
# Copied from the browser's Request Headers. The Cookie belongs to one browser
# session and will expire; requests recomputes Content-Length from the body.
headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    'Accept-Encoding': "gzip, deflate",
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Length': '4',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': 'JSESSIONID=C7BD7DFF7031A1A7EE3B71336BE03419; gr_user_id=47edb7df-b13e-4c2c-8c0c-0db4c28f09ff; Hm_lvt_10bdb52fd1832ac4eeceeabdc4df132f=1537604218; Hm_lpvt_10bdb52fd1832ac4eeceeabdc4df132f=1537608109; gr_session_id_a08ca0a390ddd043=9646c1ab-1d0c-41be-819e-51a04b592b26; gr_session_id_a08ca0a390ddd043_9646c1ab-1d0c-41be-819e-51a04b592b26=true',
    'Host': 'www.i1515.com',
    'Origin': 'http://www.i1515.com',
    'Referer': 'http://www.i1515.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
data = {"id": "1"}

for i in range(1, 12):  # range(1, 12) yields ids 1 through 11
    data["id"] = str(i)
    try:
        response = requests.post(url=url, headers=headers, data=data, timeout=10)
        result = response.json()
        print(i, type(result))
        # Only save responses that decode to a JSON object (a dict)
        if isinstance(result, dict):
            print(result)
            # The with block closes the file, so no f.close() is needed
            with open('type{}.json'.format(i), 'w', encoding='utf-8') as f:
                json.dump(result, f, ensure_ascii=False)
    except Exception as ex:
        print(ex)
```

write_data.py, which writes the data into the database:

```python
import json

import pymysql

# One connection for all files (instead of a new, never-closed connection per file)
conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='',
                       database='orsp', charset='utf8')
try:
    for index in range(1, 12):  # type1.json ... type11.json
        try:
            with open('myspiders/type{}.json'.format(index), 'r', encoding='utf-8') as f:
                data = json.load(f)
            print(data["name"])
            # Create a cursor object
            with conn.cursor() as cursor:
                # First look up the id that matches the top-level type name.
                # %s placeholders let PyMySQL escape the values (no SQL injection).
                cursor.execute('SELECT id FROM product_type WHERE product_type = %s',
                               (data["name"],))
                row = cursor.fetchone()
                if row is None:
                    print('no product_type row named', data["name"])
                    continue
                res_id = row[0]
                print(res_id)
                # Then insert the second-level types
                for two in data["sCate"]:
                    cursor.execute(
                        "INSERT INTO `product_type_two` (`product_type_one_id`, `type_two_name`) "
                        "VALUES (%s, %s)",
                        (res_id, two["name"]))
                    insert_id = conn.insert_id()  # id of the row just inserted
                    print("insert_id", insert_id)
                    # ...and the third-level types under each one
                    for three in two["tCategorys"]:
                        cursor.execute(
                            "INSERT INTO `product_type_three` (`product_type_two_id`, `type_three_name`) "
                            "VALUES (%s, %s)",
                            (insert_id, three["name"]))
            conn.commit()
        except Exception as ex:
            conn.rollback()
            print(index, ex)
finally:
    conn.close()
```
