Python Scraper for Model Photo Galleries

Disclaimer: this code is intended only for learning and studying Python web scraping. Do not use it for improper purposes.

To run: globally search for 'F:/python_study/python/Pictures/', replace it with your own directory, then run the script directly.
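An alternative to the global search-and-replace is to keep the directory in one constant and build every path from it. The following is only a minimal sketch of that idea: `SAVE_ROOT` and `album_path` are hypothetical names that do not appear in the original script.

```python
from pathlib import Path

# Hypothetical constant: the single place to change the download directory.
SAVE_ROOT = Path('F:/python_study/python/Pictures')


def album_path(root: Path, person: str, album: str) -> Path:
    """Build root/person/album without touching the file system."""
    return root / person / album


if __name__ == '__main__':
    # Example: where one photo set would be stored.
    print(album_path(SAVE_ROOT, '杨晨晨', 'example-album'))
```

Because the function only builds the path, the caller can still check `.exists()` on the result to decide whether a photo set has already been downloaded.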

# Target site: https://www.xiurenb.cc
# https://blog.csdn.net/Primordial_Shen/article/details/126292214
# 唐安琪
# 周于希
# 朱可儿
# 杨晨晨
# 芝芝 # 徐莉芝
# 林星阑
# 利世
# 鱼子酱
# 就是阿朱啊
# 王馨瑶
# 陆萱萱
# 熊小诺
# 王雨纯
# 梦心玥
# 豆瓣酱
# 江真真
# 小肥莹
# 安然
# 是小逗逗
# 小果冻儿
# 露露
# 韩好甜
# 吴雪瑶
# 萌奈子
# 小波多
# 沈佳熹
# 糯美子
# 梦乃
# 白甜
# 夏沫沫
# 果儿
# 冯木木
# 尤妮丝
# 小海臀
# 阿姣
# 波巧酱
# 周妍希

# Import libraries
import os
import time

import requests
from lxml import etree
from urllib import parse

# Define the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62'
}

# Initialize the lists
img_list = []
url_list = []
page_list = []

# URL-encode the input
human_unencode = input('Enter the model name (press Enter to confirm; all of her photo sets will be downloaded), e.g. 杨晨晨: ')
human_encode = parse.quote(human_unencode)

# Build the search URL from the encoded name
url_human = 'https://www.xiurenb.cc/plus/search/index.asp?keyword=' + human_encode + '&searchtype=title'

# Get the number of result pages for this model
res_first = requests.get(url=url_human, headers=headers)
tree_first = etree.HTML(res_first.text)
Num_first = len(tree_first.xpath('/html/body/div[3]/div[1]/div/div/ul/div[3]/div/div[2]/a'))
print(f'{human_unencode}: {Num_first} page(s) in total')

# Collect the URL of every photo set on each result page
# i = input('Enter the PageNumber:')
# print(f'Getting the page-{i}...')
# print(url_human + '&p=' + str(i))
# Result pages are numbered from 1
for index in range(1, Num_first + 1):
    # Pages before page 4 were already downloaded in an earlier run;
    # adjust or remove this check as needed
    if index < 4:
        continue
    print(f'{human_unencode}: downloading page {index}...')
    res_human = requests.get(url_human + '&p=' + str(index), headers=headers)
    tree_human = etree.HTML(res_human.text)
    jihe_human = tree_human.xpath('/html/body/div[3]/div[1]/div/div/ul/div[3]/div/div[1]/div/div[1]/h2/a/@href')
    for page in jihe_human:
        page_list.append(page)
        # time.sleep(1)

    # Fetch every image of each photo set
    for Page_Num in page_list:
        url = 'https://www.xiurenb.cc' + str(Page_Num)
        Num_res = requests.get(url=url, headers=headers)
        Num_tree = etree.HTML(Num_res.text)
        Num = len(Num_tree.xpath('/html/body/div[3]/div/div/div[4]/div/div/a'))
        url_list.append(url)
        # The remaining pages of a set are named <first page>_<i>.html
        for i in range(1, Num - 2):
            url_other = url[:-5] + '_' + str(i) + '.html'
            url_list.append(url_other)

        # Collect all image URLs
        for url_img in url_list:
            res = requests.get(url=url_img, headers=headers)
            tree = etree.HTML(res.text)
            img_src = tree.xpath('/html/body/div[3]/div/div/div[5]/p/img/@src')
            for img in img_src:
                img_list.append(img)
            time.sleep(0.1)

        # Create the save directory
        res = requests.get(url=url_list[0], headers=headers)
        res.encoding = 'utf-8'
        tree = etree.HTML(res.text)
        path_name = tree.xpath('/html/body/div[3]/div/div/div[1]/h1//text()')[0][11:]
        print(path_name)
        # makedirs also creates missing parent directories
        os.makedirs(f'F:/python_study/python/Pictures/{human_unencode}', exist_ok=True)
        the_path_name = f'F:/python_study/python/Pictures/{human_unencode}/' + path_name

        # Skip photo sets whose directory already exists
        if not os.path.exists(the_path_name):
            os.mkdir(the_path_name)
            # Save the image data
            for num, j in enumerate(img_list, start=1):
                img_url = 'https://www.xiurenb.cc' + j
                img_data = requests.get(url=img_url, headers=headers).content
                img_name = img_url.split('/')[-1]
                finish_num = str(num) + '/' + str(len(img_list))
                # The with block closes the file; no explicit close() is needed
                with open(the_path_name + '/' + img_name, 'wb') as f:
                    print(f'Downloading image: {img_name}/{finish_num}')
                    f.write(img_data)
                time.sleep(0.1)
        else:
            print('Photo set already exists, skipping >>>')

        # Reset the lists for the next photo set
        img_list = []
        url_list = []

    # Reset the list for the next result page
    page_list = []

# Print the completion message
print(f'{human_unencode}: all downloads finished!')
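The script builds two kinds of URL by string concatenation: the paginated search URL and the follow-up pages of a photo set. The sketch below isolates that logic so it can be checked without any network access. `BASE_URL`, `search_url`, and `album_page_urls` are hypothetical names introduced here. The reading that the script subtracts 2 to discount previous/next links in the pager is an assumption, not something the source states.

```python
from urllib import parse

# Hypothetical constant holding the site root used throughout the script.
BASE_URL = 'https://www.xiurenb.cc'


def search_url(name: str, page: int) -> str:
    """Return the search URL for a name, percent-encoding non-ASCII characters."""
    keyword = parse.quote(name)
    return f'{BASE_URL}/plus/search/index.asp?keyword={keyword}&searchtype=title&p={page}'


def album_page_urls(first_page_url: str, link_count: int) -> list[str]:
    """Mirror the script's rule: the first page, then <stem>_1.html, <stem>_2.html, ...

    link_count is the number of pager links found on the first page; the
    script iterates range(1, link_count - 2).
    """
    stem = first_page_url[:-len('.html')]
    return [first_page_url] + [f'{stem}_{i}.html' for i in range(1, link_count - 2)]


if __name__ == '__main__':
    print(search_url('杨晨晨', 1))
    print(album_page_urls(BASE_URL + '/example/1.html', 5))
```

Keeping URL construction in small pure functions makes it easy to verify the page-numbering rule before sending any requests.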
