Scraping Street-Snap Beauty Photos from Toutiao (今日头条)

Scraping these photos from Toutiao requires analyzing the Ajax requests the page makes.

First, open this URL: https://www.toutiao.com/search/?keyword=%E8%A1%97%E6%8B%8D
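As a minimal sketch of how the search page's Ajax request can be built, the snippet below assembles the query string with `urlencode`. The endpoint and parameter names are the ones used in this post's code and may no longer match the live site; `build_index_url` is a hypothetical helper name, and paging by stepping `offset` in units of `count` is an assumption.

```python
from urllib.parse import urlencode

# Endpoint taken from this post's code; the site may have changed it since.
SEARCH_ENDPOINT = 'https://www.toutiao.com/search_content/?'


def build_index_url(offset, keyword):
    """Build the Ajax search URL for one page of results (hypothetical helper)."""
    params = {
        'offset': offset,      # index of the first result on this page
        'format': 'json',
        'keyword': keyword,    # urlencode percent-encodes this as UTF-8
        'autoload': 'true',
        'count': '20',         # results per request
        'cur_tab': 1,
    }
    return SEARCH_ENDPOINT + urlencode(params)


if __name__ == '__main__':
    # Assumption: successive pages are requested by stepping offset by count.
    for offset in (0, 20, 40):
        print(build_index_url(offset, '街拍'))
```

Note that the keyword is encoded to the same `%E8%A1%97%E6%8B%8D` sequence that appears in the search URL above.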

The returned data, with each data entry expanded, is shown in the figure below:

The field highlighted in the figure is the URL of the detail page. Next, open the detail page:
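Since the figure is not reproduced here, the following sketch illustrates pulling the detail-page URLs out of the JSON response. `SAMPLE_RESPONSE` is a made-up payload that only mimics the `data` / `article_url` shape this post's code relies on; the example.com URLs and the `extract_article_urls` name are placeholders, not real Toutiao data.

```python
import json

# Hypothetical payload: only the 'data' list and 'article_url' key follow the post.
SAMPLE_RESPONSE = json.dumps({
    'data': [
        {'title': 'Gallery one', 'article_url': 'https://example.com/group/1/'},
        {'title': 'Entry without a detail page'},
        {'title': 'Gallery two', 'article_url': 'https://example.com/group/2/'},
    ]
})


def extract_article_urls(text):
    """Return the detail-page URLs found in a search response body."""
    payload = json.loads(text)
    urls = []
    # 'data' may be missing or null, so fall back to an empty list.
    for item in payload.get('data') or []:
        url = item.get('article_url')
        if url:  # skip entries that carry no detail-page URL
            urls.append(url)
    return urls


if __name__ == '__main__':
    for url in extract_article_urls(SAMPLE_RESPONSE):
        print(url)
```

Skipping entries without `article_url` avoids passing `None` on to the detail-page request.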

Full code:

import json
import os
import re
from urllib.parse import urlencode

import requests
from bs4 import BeautifulSoup
from requests.exceptions import RequestException


def get_page_index(offset, keyword):
    # Query parameters of the Ajax search request
    data = {
        'offset': offset,
        'format': 'json',
        'keyword': keyword,
        'autoload': 'true',
        'count': '20',
        'cur_tab': 1
    }
    url = 'https://www.toutiao.com/search_content/?' + urlencode(data)
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        print("Error requesting the index page")
        return None


def parse_page_index(html):
    # The index response is JSON; each entry in 'data' may carry a detail-page URL
    data = json.loads(html)
    if data and 'data' in data.keys():
        for item in data.get('data'):
            yield item.get('article_url')


def get_page_detail(url):
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        print("Error requesting the detail page", url)
        return None


def parse_page_detail(html, url):
    soup = BeautifulSoup(html, 'lxml')
    title = soup.select('title')[0].get_text()
    # The image list is embedded in the page as a JSON object after 'gallery: '
    images_pattern = re.compile('gallery: (.*?),\n', re.S)
    result = re.search(images_pattern, html)
    if result:
        data = json.loads(result.group(1))
        if data and 'sub_images' in data.keys():
            sub_images = data.get('sub_images')
            images = [item.get('url') for item in sub_images]
            if images:
                return {'title': title, 'url': url, 'images': images}
    return None


def main():
    save_dir = r'F:\images'  # output directory; created if it does not exist
    os.makedirs(save_dir, exist_ok=True)
    html = get_page_index(0, '街拍')
    if not html:
        return
    for url in parse_page_index(html):
        if not url:
            continue  # entry without a detail-page URL
        detail_html = get_page_detail(url)
        if not detail_html:
            continue
        result = parse_page_detail(detail_html, url)
        if result is None:
            continue
        for image_url in result.get('images'):
            print(image_url)
            pic = requests.get(image_url)
            # Name the file after the trailing characters of the image URL
            pic_path = os.path.join(save_dir, image_url[-8:-1] + '.jpg')
            with open(pic_path, 'wb') as fp:  # write the image as binary
                fp.write(pic.content)


if __name__ == '__main__':
    main()
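The detail-page step in the code above can be tried offline. The sketch below applies the same `gallery: (.*?),\n` pattern to a hypothetical HTML snippet written to match what that pattern expects; the real page markup is not shown in this post and may differ, and `SAMPLE_HTML`, `extract_image_urls`, and the example.com URLs are all illustrative assumptions.

```python
import json
import re

# Hypothetical page fragment: a JSON object follows 'gallery: ' and the line
# ends with a comma, which is what the pattern below relies on.
SAMPLE_HTML = '''<script>
var BASE_DATA = {
    gallery: {"count": 2, "sub_images": [{"url": "https://example.com/a.jpg"}, {"url": "https://example.com/b.jpg"}]},
    siblingList: []
};
</script>'''

# Non-greedy capture up to the first comma that is followed by a newline.
GALLERY_PATTERN = re.compile(r'gallery: (.*?),\n', re.S)


def extract_image_urls(html):
    """Return the image URLs listed under 'sub_images', or [] if none are found."""
    match = GALLERY_PATTERN.search(html)
    if not match:
        return []
    try:
        gallery = json.loads(match.group(1))
    except json.JSONDecodeError:
        # The captured text was not valid JSON, e.g. because the markup changed.
        return []
    return [img['url'] for img in gallery.get('sub_images', []) if img.get('url')]


if __name__ == '__main__':
    print(extract_image_urls(SAMPLE_HTML))
```

Returning an empty list when the pattern or the JSON parse fails keeps a layout change on the site from crashing the whole run.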
