Scraping Taobao product information with Selenium in Python

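Taobao detects automation tools, so if you go straight to its login page, dragging the verification slider keeps failing with an error. I took a roundabout route instead and log in to Taobao with a Weibo account. I have just started teaching myself from the book 《Python3网络爬虫开发实战》 (Python 3 Web Crawler Development in Practice), and my code differs slightly from the book's. Enough talk; here is the code.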

# -*- coding: utf-8 -*-
"""
__author__ = zenghaisheng
"""
from urllib.parse import quote

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

browser = webdriver.Chrome()
wait = WebDriverWait(browser, 10)
KEYWORD = "write what you want to search for"
WEIBO_NAME = "write your Weibo username"
WEIBO_PASSWORD = "write your Weibo password"


def index_page(page):
    """Log in through Weibo, open the given result page and return its products."""
    print('Scraping page {}'.format(page))
    try:
        url = "https://s.taobao.com/search?q=" + quote(KEYWORD)
        browser.get(url)
        # Taobao redirects to its login page: switch to password login
        a_element = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'login-switch')))
        a_element.click()
        # Jump to the Weibo login page
        weibo_login = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'weibo-login')))
        weibo_login.click()
        name_input = wait.until(EC.presence_of_element_located((By.NAME, 'username')))
        name_input.send_keys(WEIBO_NAME)
        password_input = browser.find_element(By.NAME, 'password')
        password_input.send_keys(WEIBO_PASSWORD)
        submit = browser.find_element(By.CLASS_NAME, 'W_btn_g')
        submit.send_keys(Keys.ENTER)
        # After a successful login we are redirected back to Taobao
        if page > 1:
            # Type the page number into the pager box and press "go"
            input_page = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.m-page .form > input')))
            submit_go_page = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.m-page .form .J_Submit')))
            input_page.clear()
            input_page.send_keys(str(page))
            submit_go_page.send_keys(Keys.ENTER)
        # Wait until the product list has been rendered
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.m-itemlist .items .item')))
        # Materialize the generator so parsing errors are also caught here
        return list(get_goods_msg())
    except Exception as e:
        print(e)
        return []


def get_goods_msg():
    """Parse the current result page and yield one dict per product."""
    soup = BeautifulSoup(browser.page_source, 'lxml')
    for item in soup.find_all(class_='J_MouserOnverReq'):
        yield dict(
            # Product thumbnail URL
            data_imgurl='https:' + item.find(class_='J_ItemPic img')['data-src'],
            # Product link
            data_href='https:' + item.find(class_='pic-link')['data-href'],
            # Product title
            data_title=item.find(class_='title').get_text().strip(),
            # Product price
            data_price=item.select('.ctx-box .price strong')[0].get_text(),
            # Number of buyers; '人付款' ("people paid") is the literal text on the page
            data_pay_peoples=item.find(class_='deal-cnt').get_text().replace('人付款', ''),
        )


if __name__ == "__main__":
    # Fill in the result page number you want to scrape
    page_num = 1
    try:
        for good_msg in index_page(page_num):
            print(good_msg)
    finally:
        browser.quit()
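The scraper keeps the price and the buyer count as raw strings (data_price and data_pay_peoples). If you want numbers for sorting or analysis, a minimal sketch like the following could convert them. The helper names parse_price and parse_pay_count are hypothetical, and the formats handled (plain counts such as "1234", "1000+", and "1.5万+" where 万 means 10,000) are assumptions about how Taobao displays these values, not something checked against the live page.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Hypothetical helper: convert a price string such as '59.90' to a Decimal."""
    return Decimal(text.strip())


def parse_pay_count(text):
    """Hypothetical helper: turn the raw 'deal-cnt' text into an int.

    Assumed formats: '1234人付款', '1000+人付款', '1.5万+人付款' (万 = 10,000),
    with or without the trailing '人付款'. Returns None if no number is found.
    """
    match = re.search(r'(\d+(?:\.\d+)?)\s*(万?)', text)
    if match is None:
        return None
    number = Decimal(match.group(1))
    if match.group(2) == '万':
        # Scale counts written in units of 10,000
        number *= 10000
    return int(number)
```

For example, applying parse_pay_count to each product's data_pay_peoples value yields an integer that can be sorted or summed; Decimal is used instead of float so prices like 59.90 keep their exact value.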
