Python Crawler Series: Scraping Product Data from a Luxury-Goods Mini Program Store

The code is for learning and exchange only; do not use it for illegal purposes.

1. Prepare the database

create database zr;
use zr;

# Product table
create table zr_goodslist(
    id int primary key auto_increment comment 'id',
    pid varchar(30) unique comment 'pid',
    sku varchar(30) default null comment 'sku',
    name varchar(50) default null comment 'name',
    sellingPoint varchar(200) default null comment 'sellingPoint',
    descption text default null comment 'desc',
    mainimg text default null comment 'mainimg',
    imageList text default null comment 'imageList',
    video text default null comment 'video',
    brand varchar(30) default null comment 'brand',
    status varchar(8) default null comment 'status',
    stock varchar(10) default null comment 'stock',
    source varchar(10) default null comment 'source',
    refDetail text default null comment 'refDetail',
    convert_size varchar(100) default null comment 'convert_size',
    marketPrice varchar(15) default null comment 'marketPrice',
    salePrice varchar(15) default null comment 'salePrice',
    price varchar(15) default null comment 'price',
    discount varchar(15) default null comment 'discount',
    marketingDesc varchar(300) default null comment 'marketingDesc',
    grade varchar(10) default null comment 'grade',
    brandType varchar(15) default null comment 'brandType',
    categoryOne varchar(20) default null comment 'categoryOne',
    categoryTwo varchar(20) default null comment 'categoryTwo',
    categoryThree varchar(20) default null comment 'categoryThree',
    viewNumStatus varchar(10) default null comment 'viewNumStatus',
    openBargain varchar(30) default null comment 'openBargain',
    directDesc text default null comment 'directDesc',
    degree text default null comment 'degree',
    degreeDesc text default null comment 'degreeDesc',
    degreeExt text default null comment 'degreeExt',
    coefficient text default null comment 'coefficient',
    firstPutOn varchar(50) default null comment 'firstPutOn',
    proc_view_num varchar(15) default null comment 'proc_view_num',
    correctNum varchar(15) default null comment 'correctNum',
    bargainBasePrice varchar(15) default null comment 'bargainBasePrice',
    onSale varchar(10) default null comment 'onSale',
    onSaleCountDown varchar(15) default null comment 'onSaleCountDown',
    bargainLock varchar(50) default null comment 'bargainLock',
    bargainDownTime varchar(35) default null comment 'bargainDownTime',
    isBargain varchar(10) default null comment 'isBargain',
    bargainPrice varchar(15) default null comment 'bargainPrice',
    bargainNum varchar(15) default null comment 'bargainNum',
    color_forming varchar(30) default null comment 'color_forming',
    tile_size varchar(30) default null comment 'tile_size',
    overall_weight varchar(30) default null comment 'overall_weight',
    size_prompt varchar(30) default null comment 'size_prompt',
    defect text default null comment 'defect',
    style text default null comment 'style',
    accessories text default null comment 'accessories',
    material text default null comment 'material',
    lengths text default null comment 'lengths',
    main_material text default null comment 'main_material',
    sizes text default null comment 'sizes',
    fabric text default null comment 'fabric'
) engine=INNODB charset=utf8;
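The script in the next section reads its settings from a `config.ini` file in the program directory, using the sections `app-sys` (key `threadNums`) and `Mysql-Database` (keys `user`, `password`, `database`, `table`). The post does not show that file, so the following is only a minimal sketch of one way to generate it. The helper name `write_sample_config` and every value except the database and table names from the SQL above are placeholders, not the author's settings.

```python
import configparser


def write_sample_config(path="config.ini"):
    """Write a config.ini with the sections and keys the crawler looks up."""
    cf = configparser.ConfigParser()
    # Keep the key spelled "threadNums" in the file; ConfigParser lowercases
    # option names by default (lookups are case-insensitive either way).
    cf.optionxform = str
    cf["app-sys"] = {"threadNums": "4"}  # placeholder thread count
    cf["Mysql-Database"] = {
        "user": "root",           # placeholder account
        "password": "change-me",  # placeholder password
        "database": "zr",         # database created above
        "table": "zr_goodslist",  # table created above
    }
    with open(path, "w", encoding="utf-8") as f:
        cf.write(f)
    return path
```

Calling `write_sample_config()` once before the first run creates the file; edit the placeholder values afterwards.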

2. Implementation

# -*- coding:utf-8 -*-
import configparser
import json
import sys
import threading
from queue import Empty, Queue

import MySQLdb
import requests

totals = 0
cf = configparser.ConfigParser()
# ConfigParser.read() does not raise on a missing file; it returns the list of files it parsed.
if not cf.read("config.ini"):
    print("config.ini was not found in the program directory")
    sys.exit(0)


def getConf(sec, key):
    try:
        return cf.get(sec, key)
    except Exception as e:
        print("Missing configuration: " + sec + " - " + key)
        sys.exit(0)


# -------------------------------------------------
threadNums = int(getConf("app-sys", "threadNums"))
retry = 3
timeout = 20
# Database user
mysql_user = getConf("Mysql-Database", "user")
# Database password
mysql_password = getConf("Mysql-Database", "password")
# Database name
mysql_database = getConf("Mysql-Database", "database")
# Table name
mysql_table = getConf("Mysql-Database", "table")
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 5.1.1; DUK-AL20 Build/LMY48Z; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/52.0.2743.100 Safari/537.36 MicroMessenger/7.0.10.1580(0x27000A59) Process/appbrand3 NetType/WIFI Language/zh_CN ABI/arm32",
    "content-type": "application/json;charset=utf-8",
}
host = "https://img.*******.com/"
attrsList = []

# Keys copied from the `detail` object under the same column name.
HEAD_KEYS = ["id", "sku", "name", "sellingPoint", "desc"]
TAIL_KEYS = [
    "video", "brand", "status", "stock", "source", "refDetail", "convert_size",
    "marketPrice", "salePrice", "price", "discount", "marketingDesc", "grade",
    "brandType", "categoryOne", "categoryTwo", "categoryThree", "viewNumStatus",
    "openBargain", "directDesc", "degree", "degreeDesc", "degreeExt", "coefficient",
    "firstPutOn", "proc_view_num", "correctNum", "bargainBasePrice", "onSale",
    "onSaleCountDown", "bargainLock", "bargainDownTime", "isBargain",
    "bargainPrice", "bargainNum",
]
# Attribute columns of zr_goodslist. This listing reads productAttr but never maps it
# to these columns, so they are stored as empty strings.
ATTR_COLUMNS = [
    "color_forming", "tile_size", "overall_weight", "size_prompt", "defect", "style",
    "accessories", "material", "lengths", "main_material", "sizes", "fabric",
]
# All table columns except the auto-increment id, in schema order.
COLUMNS = (["pid", "sku", "name", "sellingPoint", "descption", "mainimg", "imageList"]
           + TAIL_KEYS + ATTR_COLUMNS)
# Request fields shared by every API call.
BASE_PARAMS = {"version": "5.3.0", "debug": "false", "mt": "WX-micro",
               "inWechat": 1, "from": "micro", "deviceId": "deviceId"}


class zrSpider(threading.Thread):
    def __init__(self, brandQueue, index, *args, **kwargs):
        super(zrSpider, self).__init__(*args, **kwargs)
        self.brandQueue = brandQueue
        self.index = index

    def listParams(self, brandId, page):
        data = {"page": page, "pageSize": 20, "sort": "", "ppath": "4:" + str(brandId),
                "newShare": 0, "selfbiz": 1}
        data.update(BASE_PARAMS)
        return data

    def getGoodsList(self, brandId, page):
        url = "https://search.*******.com/V4.7.0/product/list"
        resp = postHtml(url, self.listParams(brandId, page))
        if resp:
            try:
                return resp['data']['list']
            except Exception as e:
                pass
        return

    def getGoodsDetail(self, id):
        url = "https://api.*******.com/V5.3.0/product/newDetail"
        data = {"id": str(id)}
        data.update(BASE_PARAMS)
        resp = postHtml(url, data)
        try:
            if str(resp['code']) != "100000":
                return
            detail = resp['data']['detail']
            # Read but unused below; the author used it only to list attribute names.
            productAttr = resp['data']['productAttr']
        except Exception as e:
            return

        def pick(key):
            # A missing key is stored as an empty string.
            try:
                return detail[key]
            except Exception as e:
                return ""

        goods = [pick(key) for key in HEAD_KEYS]
        try:
            # Main image: the first entry of imageList, prefixed with the image host.
            goods.append(host + detail['imageList'][0])
        except Exception as e:
            goods.append("")
        try:
            # All image URLs, stored as a JSON array string.
            goods.append(json.dumps([host + image for image in detail['imageList']],
                                    ensure_ascii=False))
        except Exception as e:
            goods.append("")
        goods.extend(pick(key) for key in TAIL_KEYS)
        goods.extend([""] * len(ATTR_COLUMNS))
        return goods

    def execute(self, sql, params):
        # One connection per call keeps the worker threads independent.
        conn = None
        try:
            conn = MySQLdb.connect(user=mysql_user, password=mysql_password,
                                   database=mysql_database, charset='utf8')
            cursor = conn.cursor()
            # Parameterized query: the driver escapes quotes in the scraped text.
            cursor.execute(sql, params)
            rows = cursor.fetchall()
            conn.commit()
            return rows
        except Exception as e:
            print(e)
        finally:
            if conn is not None:
                conn.close()

    def pipLine(self, data):
        print("------------------------- insert ------------------------- ")
        print(data)
        print("---------------------------------------------------------- ")
        sql = "insert into %s (%s) values (%s)" % (
            mysql_table, ", ".join(COLUMNS), ", ".join(["%s"] * len(COLUMNS)))
        self.execute(sql, [str(value) for value in data])

    def getTotalPage(self, brandId):
        url = "https://search.*******.com/V4.7.0/product/list"
        resp = postHtml(url, self.listParams(brandId, 1))
        if resp:
            try:
                count = int(resp['data']['count'])
                # Round up: 20 items per page.
                return count // 20 if count % 20 == 0 else (count // 20) + 1
            except Exception as e:
                pass
        return 1

    def checkGoodsExists(self, pid):
        rows = self.execute("select pid from " + mysql_table + " where pid = %s", [str(pid)])
        return bool(rows)

    def update(self, data):
        print("------------------------- update ------------------------- ")
        print(data)
        print("---------------------------------------------------------- ")
        assignments = ", ".join(column + " = %s" for column in COLUMNS[1:])
        sql = "update %s set %s where pid = %%s" % (mysql_table, assignments)
        self.execute(sql, [str(value) for value in data[1:]] + [str(data[0])])

    def run(self):
        print("Thread %d started" % self.index)
        while True:
            try:
                # get_nowait() avoids the race between empty() and get() across threads.
                brand = self.brandQueue.get_nowait()
            except Empty:
                break
            brand_id = str(brand['id'])
            totalPage = self.getTotalPage(brand_id)
            for page in range(1, totalPage + 1):
                goodsList = self.getGoodsList(brand_id, page)
                if not goodsList:
                    continue
                for goods in goodsList:
                    goodsId = goods['id']
                    datas = self.getGoodsDetail(goodsId)
                    if not datas:
                        # Detail request failed; nothing to store.
                        continue
                    if self.checkGoodsExists(goodsId):
                        # Update the existing row
                        self.update(datas)
                    else:
                        self.pipLine(datas)


def postHtml(url, data):
    # POST the payload as JSON, retrying up to `retry` times; returns None on failure.
    for i in range(retry):
        try:
            resp = requests.post(url, data=json.dumps(data), headers=headers, timeout=timeout)
            return json.loads(resp.content.decode("utf-8"))
        except Exception as e:
            pass
    return


def getHtml(url):
    for i in range(retry):
        try:
            resp = requests.get(url, headers=headers, timeout=timeout)
            return json.loads(resp.content.decode("utf-8"))
        except Exception as e:
            pass
    return


def getBrandQueue():
    brandQueue = Queue(0)
    url = "https://api.*******.com/V5.3.0/site/currentBrand"
    resp = postHtml(url, dict(BASE_PARAMS))
    try:
        for brand in resp['data']['list']:
            brandQueue.put(brand)
    except Exception as e:
        # Return an empty queue so the worker threads exit cleanly.
        pass
    return brandQueue


def main():
    print("Initializing the crawler")
    brandQueue = getBrandQueue()
    print("Brand list fetched")
    for i in range(threadNums):
        z = zrSpider(brandQueue, i)
        z.start()


if __name__ == '__main__':
    main()
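The script checks whether a `pid` already exists and then chooses between an insert and an update, which costs two round trips per product. Because `pid` is declared `unique` in the table from section 1, a single MySQL `INSERT ... ON DUPLICATE KEY UPDATE` statement could cover both cases. The sketch below only builds the statement and its parameters, so it can be tried without a database. The function name `build_upsert` and the sample row are illustrative assumptions, not part of the original code, and newer MySQL releases prefer a row alias over `VALUES()` in the update clause.

```python
def build_upsert(table, columns, row, key="pid"):
    """Return (sql, params) for a MySQL INSERT ... ON DUPLICATE KEY UPDATE.

    `table` and `columns` must come from trusted configuration, because
    identifiers cannot be passed as query parameters.
    """
    if len(columns) != len(row):
        raise ValueError("columns and row must have the same length")
    placeholders = ", ".join(["%s"] * len(columns))
    # Every column except the unique key is overwritten when the key already exists.
    updates = ", ".join("{0} = VALUES({0})".format(c) for c in columns if c != key)
    sql = "INSERT INTO {t} ({c}) VALUES ({p}) ON DUPLICATE KEY UPDATE {u}".format(
        t=table, c=", ".join(columns), p=placeholders, u=updates)
    return sql, tuple(row)


# Hypothetical three-column example; the real table has many more columns.
sql, params = build_upsert("zr_goodslist", ["pid", "name", "price"],
                           ["1001", "Sample bag", "1999"])
```

The returned pair can be passed directly to `cursor.execute(sql, params)`, so the values are escaped by the driver rather than interpolated into the SQL string.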
