Summer Break Study: Python Web Scraping Basics (4)

I've covered most of the basics by now, so next I'll try to scrape articles from Baidu Wenku, doing as much of it on my own as I can.

I'll also try writing a scraper for a mobile app.

Scraping Baidu Wenku articles

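The mobile site has fewer anti-scraping measures. We can change the request's User-Agent header so that the browser is served the mobile version of the page.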

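from selenium import webdriver

# Pretend to be an Android phone so the site serves its mobile page.
# No extra quotes around the UA: no shell is involved, so they would become part of the string.
options = webdriver.ChromeOptions()
options.add_argument('--user-agent=Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19')
# Selenium 4 uses options=; the older chrome_options= keyword is deprecated
driver = webdriver.Chrome(options=options)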
driver.get('https://wenku.baidu.com/view/aa31a84bcf84b9d528ea7a2c.html')
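The same idea also works without a browser, because the server picks the page version at least partly from the User-Agent header. Below is a minimal sketch using the standard library's `urllib.request.Request` and the Galaxy Nexus UA string from above. It only builds the request and does not send it. `build_mobile_request` is a hypothetical helper name. Whether Baidu Wenku actually serves its mobile layout to this UA is an assumption you would need to check.

```python
from urllib.request import Request

# Mobile UA string copied from the Selenium example above
MOBILE_UA = ("Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) "
             "AppleWebKit/535.19 (KHTML, like Gecko) "
             "Chrome/18.0.1025.133 Mobile Safari/535.19")


def build_mobile_request(url: str) -> Request:
    """Build (but do not send) a GET request that identifies as a mobile browser."""
    return Request(url, headers={"User-Agent": MOBILE_UA})


req = build_mobile_request("https://wenku.baidu.com/view/aa31a84bcf84b9d528ea7a2c.html")
# urllib stores header names via str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))
# To actually fetch the page: urllib.request.urlopen(req).read()
```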

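While scraping Baidu Wenku, I got an error when I reached the "Continue reading" button. Another block element is layered on top of it, so the button cannot be clicked. I tried many fixes I found online and none of them worked, but going through that process showed me many of my own weak spots. I really need more hands-on practice! The problem is still unsolved, so I'm stuck at the code below. I've also lost some interest in this project, so I'm setting it aside for now.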

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException


def search():
    options = webdriver.ChromeOptions()
    options.add_argument('--user-agent=Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19')
    driver = webdriver.Chrome(options=options)
    wait = WebDriverWait(driver, 10)
    driver.get('https://wenku.baidu.com/view/aa31a84bcf84b9d528ea7a2c.html')
    try:
        # Wait up to 10 s for the first button to become clickable, then click it
        con = wait.until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[2]/div[2]/div[6]/div[2]/div[2]/div[1]/div')))
        con.click()
        # Selenium 4 replaced find_element_by_css_selector with find_element(By.CSS_SELECTOR, ...)
        con2 = driver.find_element(By.CSS_SELECTOR, 'body > div.sf-edu-wenku-vw-container > div.sfa-body > div.sf-edu-wenku-id-container.wrap.rtcsPlayer.rtcs-container.reader-rtcs > div.pagerwg-root')
        con3 = con2.find_element(By.CSS_SELECTOR, 'div.pagerwg-loadSucc')
        # If an overlay intercepts a normal .click(), a JavaScript click goes straight to the element
        driver.execute_script("arguments[0].click();", con3)
    except TimeoutException:
        print('The element did not become clickable within 10 seconds')


if __name__ == '__main__':
    search()

So I moved on to scraping a mobile app instead.

urlretrieve saves remote data directly to a local file.

Its most commonly used parameters are: urlretrieve(url=target_url, filename=target_filename)
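Here is a minimal sketch of that call. `save_remote_file` is a hypothetical helper name. To keep the demo runnable offline, it "downloads" from a local `file://` URL, which `urlretrieve` accepts just like an `http://` one. `urlretrieve` returns a `(filename, headers)` tuple.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def save_remote_file(url: str, target_filename: str) -> str:
    """Download url straight to target_filename and return the local path."""
    # Create the destination folder if it does not exist yet
    folder = os.path.dirname(target_filename)
    if folder:
        os.makedirs(folder, exist_ok=True)
    local_path, _headers = urlretrieve(url=url, filename=target_filename)
    return local_path


# Offline demo: a file:// URL stands in for a real image URL
workdir = tempfile.mkdtemp()
source = Path(workdir, "source.jpg")
source.write_bytes(b"fake image bytes")
saved = save_remote_file(source.as_uri(), os.path.join(workdir, "heros_images", "demo.jpg"))
print(saved, Path(saved).read_bytes())
```

With a real image you would pass its `http://` URL instead. The rest of the call stays the same.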

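By following a blog post, I managed to download the hero images from the game Honor of Kings (王者荣耀). However, the post didn't walk through the complete code, so in the rest of this post I try to build on it myself.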

#-*- coding: UTF-8 -*-
import requests
import os
from urllib.request import urlretrieve


def download_images(heros_url, headers):
    req = requests.get(url=heros_url, headers=headers).json()
    heros_num = len(req['list'])
    print(heros_num)
    heros_images_path = "heros_images"
    # Create the output folder once, before the loop
    os.makedirs(heros_images_path, exist_ok=True)
    for each_hero in req['list']:
        heros_images_url = each_hero['cover']
        heros_name = each_hero['name']
        filename = os.path.join(heros_images_path, heros_name + '.jpg')
        # Save the image straight to disk
        urlretrieve(url=heros_images_url, filename=filename)


if __name__ == '__main__':
    # Headers mimicking the Android app (Dalvik user agent)
    headers = {
        'Accept-Charset': 'UTF-8',
        'Accept-Encoding': 'gzip,deflate',
        'User-Agent': 'Dalvik/2.1.0 (Linux; U; Android 6.0.1; MI 5 MIUI/V8.1.6.0.MAACNDI)',
        'X-Requested-With': 'XMLHttpRequest',
        'Content-type': 'application/x-www-form-urlencoded',
        'Connection': 'Keep-Alive',
        'Host': 'gamehelper.gm825.com',
    }
    heros_url = "http://gamehelper.gm825.com/wzry/hero/list?channel_id=90009a&app_id=h9044j&game_id=7622&game_name=%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80&vcode=12.0.3&version_code=1203&cuid=2654CC14D2D3894DBF5808264AE2DAD7&ovr=6.0.1&device=Xiaomi_MI+5&net_type=1&client_id=1Yfyt44QSqu7PcVdDduBYQ%3D%3D&info_ms=fBzJ%2BCu4ZDAtl4CyHuZ%2FJQ%3D%3D&info_ma=XshbgIgi0V1HxXTqixI%2BKbgXtNtOP0%2Fn1WZtMWRWj5o%3D&mno=0&info_la=9AChHTMC3uW%2BfY8%2BCFhcFw%3D%3D&info_ci=9AChHTMC3uW%2BfY8%2BCFhcFw%3D%3D&mcc=0&clientversion=&bssid=VY%2BeiuZRJ%2FwaXmoLLVUrMODX1ZTf%2F2dzsWn2AOEM0I4%3D&os_level=23&os_id=dc451556fc0eeadb&resolution=1080_1920&dpi=480&client_ip=192.168.0.198&pdunid=a83d20d8"
    download_images(heros_url, headers)
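One way to make this script easier to test is to separate parsing the JSON from downloading the files. The sketch below assumes the response has the shape the script already relies on: a `list` of objects with `name` and `cover` fields. `plan_downloads` is a hypothetical helper. The sample payload is invented and is not real API output.

```python
import os
from typing import Dict, List, Tuple


def plan_downloads(payload: Dict, folder: str = "heros_images") -> List[Tuple[str, str]]:
    """Turn the hero-list JSON into (image_url, local_filename) pairs."""
    jobs = []
    for each_hero in payload["list"]:
        filename = os.path.join(folder, each_hero["name"] + ".jpg")
        jobs.append((each_hero["cover"], filename))
    return jobs


# Made-up payload that only mimics the fields the script reads ("list", "name", "cover")
sample = {
    "list": [
        {"name": "hero_a", "cover": "http://example.com/a.jpg"},
        {"name": "hero_b", "cover": "http://example.com/b.jpg"},
    ]
}
for url, filename in plan_downloads(sample):
    print(url, "->", filename)
```

Each pair can then be passed to `urlretrieve(url=url, filename=filename)`. The parsing logic can be checked without touching the network.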

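Next is my own attempt at more features: listing the heroes and looking up a hero's recommended equipment. It's not very polished yet.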

#-*- coding: UTF-8 -*-
import requests
import os
def find_equip(heros_id, headers):
    # No spaces around {0}: they would be sent (percent-encoded) as part of the hero_id value
    target_url = "http://gamehelper.gm825.com/wzry/hero/detail?hero_id={0}&channel_id=90014a&app_id=h9044j&game_id=7622&game_name=%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80&vcode=13.0.4.0&version_code=13040&cuid=B57DB402FC4611D6B605F06B82603AFF&ovr=8.0.0&device=Xiaomi_Mi+Note+2&net_type=1&client_id=BClcz%2BYVz73G6DcA7pzkfA%3D%3D&info_ms=&info_ma=Olgy8v3e7w9zofX48IdfkBrSUWcVbZ0pNco9eXe4ROk%3D&mno=0&info_la=aD63pMVCy5E9Cm1pRt%2F4XQ%3D%3D&info_ci=aD63pMVCy5E9Cm1pRt%2F4XQ%3D%3D&mcc=0&clientversion=13.0.4.0&bssid=IXVsHTRQrp2vDPpI%2BZp6NI1oGKJ7uzg0Eq6QSrIGf08%3D&os_level=26&os_id=8f698313c7903c25&resolution=1080_1920&dpi=440&client_ip=192.168.1.3&pdunid=8c3195cd".format(heros_id)
    req = requests.get(url=target_url, headers=headers).json()
    # Print each recommended build title, then the names of its items
    for each_one in req['info']['equip_choice']:
        print(each_one['title'])
        for each in each_one['list']:
            equip_list(each['equip_id'], headers)


def equip_list(equip_id, headers):
    # Fetch the full equipment list and print the name that matches equip_id
    target_url = 'http://gamehelper.gm825.com/wzry/equip/list?channel_id=90014a&app_id=h9044j&game_id=7622&game_name=%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80&vcode=13.0.4.0&version_code=13040&cuid=B57DB402FC4611D6B605F06B82603AFF&ovr=8.0.0&device=Xiaomi_Mi+Note+2&net_type=1&client_id=BClcz%2BYVz73G6DcA7pzkfA%3D%3D&info_ms=&info_ma=Olgy8v3e7w9zofX48IdfkBrSUWcVbZ0pNco9eXe4ROk%3D&mno=0&info_la=aD63pMVCy5E9Cm1pRt%2F4XQ%3D%3D&info_ci=aD63pMVCy5E9Cm1pRt%2F4XQ%3D%3D&mcc=0&clientversion=13.0.4.0&bssid=IXVsHTRQrp2vDPpI%2BZp6NI1oGKJ7uzg0Eq6QSrIGf08%3D&os_level=26&os_id=8f698313c7903c25&resolution=1080_1920&dpi=440&client_ip=192.168.1.3&pdunid=8c3195cd'
    req = requests.get(url=target_url, headers=headers).json()
    for each_one in req['list']:
        if each_one['equip_id'] == equip_id:
            print(each_one['name'])


def list_heros(heros_url, headers):
    req = requests.get(url=heros_url, headers=headers).json()
    heros_num = len(req['list'])
    print(heros_num)
    for each_hero in req['list']:
        heros_name = each_hero['name']
        heros_number = each_hero['hero_id']
        # hero_id may be a number, so print both values instead of concatenating with +
        print(heros_name, heros_number)


if __name__ == '__main__':
    headers = {
        'Accept-Charset': 'UTF-8',
        'Accept-Encoding': 'gzip,deflate',
        'User-Agent': 'Dalvik/2.1.0 (Linux; U; Android 6.0.1; MI 5 MIUI/V8.1.6.0.MAACNDI)',
        'X-Requested-With': 'XMLHttpRequest',
        'Content-type': 'application/x-www-form-urlencoded',
        'Connection': 'Keep-Alive',
        'Host': 'gamehelper.gm825.com',
    }
    heros_url = "http://gamehelper.gm825.com/wzry/hero/list?channel_id=90009a&app_id=h9044j&game_id=7622&game_name=%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80&vcode=12.0.3&version_code=1203&cuid=2654CC14D2D3894DBF5808264AE2DAD7&ovr=6.0.1&device=Xiaomi_MI+5&net_type=1&client_id=1Yfyt44QSqu7PcVdDduBYQ%3D%3D&info_ms=fBzJ%2BCu4ZDAtl4CyHuZ%2FJQ%3D%3D&info_ma=XshbgIgi0V1HxXTqixI%2BKbgXtNtOP0%2Fn1WZtMWRWj5o%3D&mno=0&info_la=9AChHTMC3uW%2BfY8%2BCFhcFw%3D%3D&info_ci=9AChHTMC3uW%2BfY8%2BCFhcFw%3D%3D&mcc=0&clientversion=&bssid=VY%2BeiuZRJ%2FwaXmoLLVUrMODX1ZTf%2F2dzsWn2AOEM0I4%3D&os_level=23&os_id=dc451556fc0eeadb&resolution=1080_1920&dpi=480&client_ip=192.168.0.198&pdunid=a83d20d8"
    # list_heros(heros_url, headers)  # prints every hero with its hero_id
    find_equip(4, headers)
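The script above requests the whole equipment list once for every single item. An alternative is to fetch the equipment list once, index it by `equip_id`, and look names up in the dict. This sketch assumes the two responses have the shapes the script reads: `info.equip_choice[].title` and `info.equip_choice[].list[].equip_id` for the hero detail, and `list[].equip_id` / `list[].name` for the equipment list. It also assumes both endpoints return `equip_id` with the same type. The function names and sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def build_equip_index(equip_payload: Dict) -> Dict:
    """Map equip_id -> name from the equipment-list response (fetched once)."""
    return {item["equip_id"]: item["name"] for item in equip_payload["list"]}


def describe_builds(hero_payload: Dict, equip_index: Dict) -> List[Tuple[str, List[str]]]:
    """Return (build title, [equipment names]) for each recommended build."""
    builds = []
    for choice in hero_payload["info"]["equip_choice"]:
        # Unknown ids fall back to "unknown" instead of raising KeyError
        names = [equip_index.get(each["equip_id"], "unknown") for each in choice["list"]]
        builds.append((choice["title"], names))
    return builds


# Invented sample data mirroring only the fields the script reads
equip_payload = {"list": [{"equip_id": 1, "name": "item_one"}, {"equip_id": 2, "name": "item_two"}]}
hero_payload = {"info": {"equip_choice": [
    {"title": "build_a", "list": [{"equip_id": 2}, {"equip_id": 1}]},
    {"title": "build_b", "list": [{"equip_id": 3}]},
]}}
for title, names in describe_builds(hero_payload, build_equip_index(equip_payload)):
    print(title, names)
```

With real data, `equip_payload` would be the `.json()` result of the equipment-list request, and `hero_payload` the result of the hero-detail request. That cuts the number of equipment-list requests to one per run.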
