[Python for Beginners] Scraping Guokr Q&A

Crawl workflow

1. Determine the URL

2. Request the URL

3. Parse the response with XPath

4. Save the data
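Step 3 is the part that usually needs the most trial and error, so here is a minimal offline sketch of XPath extraction with lxml. The HTML snippet, the example.com links, and the helper name extract_questions are made up for illustration; only the ask-list-detials class and the h2/a paths mirror what the crawler in this post queries.

```python
from lxml import etree

# Hypothetical markup, shaped like the list page the crawler expects.
SAMPLE_HTML = """
<html><body>
  <div class="ask-list-detials">
    <h2><a href="https://example.com/question/1/">First question</a></h2>
  </div>
  <div class="ask-list-detials">
    <h2><a href="https://example.com/question/2/">Second question</a></h2>
  </div>
</body></html>
"""


def extract_questions(html):
    """Return a list of {'title', 'url'} dicts, one per question block."""
    dom = etree.HTML(html)
    questions = []
    for node in dom.xpath('//div[@class="ask-list-detials"]'):
        # Paths starting with ./ are evaluated relative to the current node.
        titles = node.xpath('./h2/a/text()')
        urls = node.xpath('./h2/a/@href')
        # xpath() always returns a list, so check it before indexing.
        if titles and urls:
            questions.append({'title': titles[0], 'url': urls[0]})
    return questions


if __name__ == '__main__':
    for question in extract_questions(SAMPLE_HTML):
        print(question)
```

Testing the expressions against a saved copy of the page like this avoids hitting the site repeatedly while the selectors are still being adjusted.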

import time
import json
import requests
from lxml import etree


class GuoKe(object):
    def __init__(self):
        self.base_url = 'https://www.guokr.com/ask/hottest/'
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36",
            "Referer": "https://www.guokr.com/"
        }

    def get_data(self, url, page=None, is_detail=False):
        # List pages take a ?page=N query parameter; detail pages are fetched as-is.
        if not is_detail:
            param = {"page": page}
            return requests.get(url, headers=self.headers, params=param).text
        return requests.get(url, headers=self.headers).text

    def parse_text(self, text):
        dom = etree.HTML(text)
        return dom

    def detail_dom(self, dom):
        # First paragraph of the question description, or '' if the page has none.
        texts = dom.xpath('//div[@id="questionDesc"]/p/text()')
        return texts[0] if texts else ''

    def parse(self, dom):
        nodes = dom.xpath('//div[@class="ask-list-detials"]')
        for node in nodes:
            item = {}
            item['title'] = node.xpath('./h2/a/text()')[0]
            detail_url = node.xpath('./h2/a/@href')[0]
            print(detail_url)
            # Pass is_detail by keyword; passed positionally, True would land in `page`.
            detail_text = self.get_data(detail_url, is_detail=True)
            detail_dom = etree.HTML(detail_text)
            answer = self.detail_dom(detail_dom)
            print(answer)
            item['answer'] = answer
            print(item)
            yield item

    def save(self, f, item):
        f.write(json.dumps(item, ensure_ascii=False) + ',\n')

    def run(self):
        # ensure_ascii=False writes Chinese text verbatim, so open the file as UTF-8.
        with open('guoke.json', 'w', encoding='utf-8') as f:
            page = int(input('Enter the number of pages to crawl: '))
            for i in range(page):
                text = self.get_data(self.base_url, i + 1)
                dom = self.parse_text(text)
                # Iterate the generator directly instead of next() inside a bare except.
                for item in self.parse(dom):
                    self.save(f, item)
                    time.sleep(0.4)
                print(f'Page {i + 1} saved')


if __name__ == '__main__':
    guoke = GuoKe()
    guoke.run()
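One thing to watch in step 4: the save method writes a comma after every item, so guoke.json cannot be loaded as a whole with json.load. A possible alternative, shown here only as a sketch and not as what the original post does, is to write one JSON object per line (the JSON Lines convention) and parse the file line by line. The helper names save_items and load_items and the sample records are hypothetical.

```python
import json
import os
import tempfile


def save_items(path, items):
    """Write each item as one JSON object per line."""
    with open(path, 'w', encoding='utf-8') as f:
        for item in items:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(item, ensure_ascii=False) + '\n')


def load_items(path):
    """Read the file back, parsing every non-empty line independently."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == '__main__':
    # Made-up records standing in for scraped question/answer pairs.
    sample = [
        {'title': '问题一', 'answer': '回答一'},
        {'title': 'Question two', 'answer': 'Answer two'},
    ]
    path = os.path.join(tempfile.mkdtemp(), 'guoke.jsonl')
    save_items(path, sample)
    print(load_items(path))
```

Because each line is parsed on its own, a crawl that stops halfway still leaves a file whose completed lines can be read.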

Reposted from: https://www.cnblogs.com/liduo0413/p/11471088.html
