Scraping Baidu image search results for 美女 ("beautiful women") with Python

First, open the Baidu Images site in Chrome and capture the network traffic. The search results are loaded from this URL:

https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&is=&fp=result&queryWord=美女&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=&copyright=&word=美女&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&force=&pn=90&rn=30
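queryWord and word both carry the search keyword.
pn is the result offset and is a multiple of 30, matching the rn=30 results returned per request (force is left empty).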
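The paging scheme described above can be sketched as a small helper. This is only an illustration: `build_page_url` and `BASE_URL` are hypothetical names, and the sketch assumes that the parameters which are empty in the captured URL can be left out, which has not been verified against the server.

```python
from urllib.parse import urlencode

BASE_URL = 'https://image.baidu.com/search/acjson'


def build_page_url(key_word, page, page_size=30):
    """Return the acjson URL for a 0-based page number (hypothetical helper)."""
    params = {
        'tn': 'resultjson_com',
        'ipn': 'rj',
        'ct': '201326592',
        'fp': 'result',
        'queryWord': key_word,    # search keyword
        'cl': '2',
        'lm': '-1',
        'ie': 'utf-8',
        'oe': 'utf-8',
        'st': '-1',
        'ic': '0',
        'word': key_word,         # same keyword again
        'face': '0',
        'istype': '2',
        'nc': '1',
        'pn': page * page_size,   # result offset: 0, 30, 60, ...
        'rn': page_size,          # results per request
    }
    # urlencode percent-encodes the keyword as UTF-8.
    return BASE_URL + '?' + urlencode(params)


if __name__ == '__main__':
    for page in range(3):
        print(build_page_url('美女', page))
```

Building the query from a dict keeps the keyword and the offset in one place, so queryWord and word cannot drift apart.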

[Screenshot of the captured request]

1. Single-threaded crawl

# -*- encoding:utf-8 -*-
"""
@python: 3.7
@Author: xiaobai_IT_learn
@Time: 2019-10-31 10:00
"""
import os
import re
import time

import requests

IMAGE_PATH = './baidu_image'


class BaiduImageSpider(object):
    def __init__(self, key_word):
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/77.0.3865.120 Safari/537.36'
        }
        self.key_word = key_word
        self.num = 1  # running number used as the file name
        self._file()

    def _file(self):
        """
        Create the output folder.
        :return:
        """
        if not os.path.exists(IMAGE_PATH):
            os.mkdir(IMAGE_PATH)

    def get_url_list(self):
        """
        Build the list of result-page URLs.
        :return:
        """
        url_list = []
        for i in range(30):
            # queryWord and word both take the keyword; pn advances in steps of 30.
            url = (
                'https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&is=&fp=result'
                '&queryWord={}&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=&copyright=&word={}'
                '&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&force=&cg=girl&pn={}&rn=30'
            ).format(self.key_word, self.key_word, i * 30)
            url_list.append(url)
        return url_list

    def spider_baidu_image(self, url):
        """
        Download every thumbnail listed on one result page.
        :param url:
        :return:
        """
        html = requests.get(url, headers=self.headers)
        html_str = html.content.decode()
        image_list = re.findall(r"\"thumbURL\":\"(.*?)\",\"middleURL\"", html_str)
        print(len(image_list))
        for image_url in image_list:
            try:
                content = requests.get(image_url, headers=self.headers).content
            except Exception as e:
                print(e)
                continue
            file_path = IMAGE_PATH + '/' + str(self.num) + '.jpg'
            with open(file_path, 'wb') as f:
                f.write(content)
            self.num += 1

    def run(self):
        url_list = self.get_url_list()
        for url in url_list:
            self.spider_baidu_image(url)


if __name__ == '__main__':
    key_word = input('Enter the search keyword: ')
    start_time = time.time()
    spider_baidu_image = BaiduImageSpider(key_word)
    spider_baidu_image.run()
    print(time.time() - start_time)

Single-threaded run time: 54.337618589401245 seconds, 629 images in total.
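The extraction step in spider_baidu_image relies on one regular expression that captures the value of each thumbURL field when it is immediately followed by a middleURL field. The sketch below runs the same pattern on a made-up fragment. `SAMPLE` and its example.com URLs are hypothetical and much shorter than a real response, and `extract_thumb_urls` is an illustrative name.

```python
import re

# Hypothetical, heavily shortened response fragment (not real Baidu output).
SAMPLE = (
    '{"data":['
    '{"thumbURL":"https://example.com/a.jpg","middleURL":"https://example.com/a_m.jpg"},'
    '{"thumbURL":"https://example.com/b.jpg","middleURL":"https://example.com/b_m.jpg"}'
    ']}'
)

# Same pattern as in the spider, compiled once for reuse.
THUMB_PATTERN = re.compile(r'"thumbURL":"(.*?)","middleURL"')


def extract_thumb_urls(text):
    """Return each thumbURL value that is directly followed by a middleURL field."""
    return THUMB_PATTERN.findall(text)


if __name__ == '__main__':
    for thumb_url in extract_thumb_urls(SAMPLE):
        print(thumb_url)
```

Because the pattern requires middleURL to follow thumbURL directly, an entry whose fields appear in a different order would be skipped. Parsing the response as JSON would avoid that dependency, provided the response is valid JSON.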

2. Multi-threaded crawl

# -*- encoding:utf-8 -*-
"""
@python: 3.7
@Author: xiaobai_IT_learn
@Time: 2019-10-31 10:00
"""
import os
import re
import threading
import time
from queue import Queue

import requests

IMAGE_PATH = './baidu_image_threading'


class BaiduImageSpider(object):
    def __init__(self, key_word):
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/77.0.3865.120 Safari/537.36'
        }
        self.key_word = key_word
        self.url_queue = Queue()
        self.num = 1  # running number used as the file name
        self.num_lock = threading.Lock()  # guards self.num across worker threads
        self._file()

    def _file(self):
        """
        Create the output folder.
        :return:
        """
        if not os.path.exists(IMAGE_PATH):
            os.mkdir(IMAGE_PATH)

    def get_url_list(self):
        """
        Put the result-page URLs on the queue.
        :return:
        """
        for i in range(30):
            url = (
                'https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&is=&fp=result'
                '&queryWord={}&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=&copyright=&word={}'
                '&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&force=&cg=girl&pn={}&rn=30'
            ).format(self.key_word, self.key_word, i * 30)
            self.url_queue.put(url)

    def spider_baidu_image(self):
        """
        Worker loop: take a page URL from the queue and download its thumbnails.
        :return:
        """
        while True:
            url = self.url_queue.get()
            html = requests.get(url, headers=self.headers)
            html_str = html.content.decode()
            image_list = re.findall(r"\"thumbURL\":\"(.*?)\",\"middleURL\"", html_str)
            for image_url in image_list:
                try:
                    content = requests.get(image_url, headers=self.headers).content
                except Exception as e:
                    print(e)
                    continue
                # Reserve a unique file number so two threads never write the same file.
                with self.num_lock:
                    num = self.num
                    self.num += 1
                file_path = IMAGE_PATH + '/' + str(num) + '.jpg'
                print(num)
                with open(file_path, 'wb') as f:
                    f.write(content)
            self.url_queue.task_done()

    def run(self):
        thread_list = []
        t_url = threading.Thread(target=self.get_url_list, daemon=True)
        thread_list.append(t_url)
        for i in range(3):
            t_spider = threading.Thread(target=self.spider_baidu_image, daemon=True)
            thread_list.append(t_spider)
        for t in thread_list:
            t.start()
        # Wait until every URL is queued; otherwise join() could return on a still-empty queue.
        t_url.join()
        self.url_queue.join()


if __name__ == '__main__':
    key_word = input('Enter the search keyword: ')
    start_time = time.time()
    spider_baidu_image = BaiduImageSpider(key_word)
    spider_baidu_image.run()
    print(time.time() - start_time)

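Multi-threaded run time: 16.234709978103638 seconds, 629 images in total. With three worker threads, that is about 3.3 times faster than the single-threaded run.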
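The multi-threaded spider is built on the queue.Queue worker pattern: daemon threads call get() in a loop, call task_done() once per item, and the main thread blocks in join() until every item is accounted for. The sketch below isolates that pattern without any network access. `run_workers` and `handle` are hypothetical names, and the squaring function is just a stand-in for downloading one page.

```python
import threading
from queue import Queue


def run_workers(items, handle, worker_count=3):
    """Process items on daemon worker threads; results come back in completion order."""
    tasks = Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            try:
                value = handle(item)
                with lock:  # protect the shared results list
                    results.append(value)
            except Exception as e:
                print(e)  # report the failure and keep the worker alive
            finally:
                tasks.task_done()  # one task_done() per get(), even on failure

    # Fill the queue before calling join(), so join() cannot return early.
    for item in items:
        tasks.put(item)

    for _ in range(worker_count):
        threading.Thread(target=worker, daemon=True).start()

    tasks.join()  # blocks until task_done() has been called for every queued item
    return results


if __name__ == '__main__':
    # Stand-in for "download one page": square each page offset.
    print(sorted(run_workers(range(0, 90, 30), lambda pn: pn * pn)))
```

Calling task_done() in a finally clause matters here: if a worker skipped it after an error, join() would wait forever. The workers are daemon threads, so they do not keep the process alive once join() returns.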
