Python crawler: complete code download page

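Some of the URLs in the previous link were censored, so the complete code is posted here.
Before running it, change path to the directory where you want the images saved.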

# -*- coding: utf-8 -*-
# Author: 废人一枚
# From: Beijing
# Created: 12:22
import requests
import os
import time
from lxml import etree

requests.adapters.DEFAULT_RETRIES = 5
s = requests.session()
s.keep_alive = False

all_url = 'https://www.mzitu.com'
# Save location on Windows
path = 'D:/python/爬虫/test1/'
# Base URL of the listing pages
same_url = 'https://www.mzitu.com/page/'  # another listing path on the site can be used instead
# HTTP request headers for the site's own pages
Hostreferer = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36',
    'Referer': 'http://www.mzitu.com'
}
# This Referer header gets past the image host's hotlink protection
Picreferer = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36',
    'Referer': 'https://i3.mmzztt.com'
}
# Request the mzitu home page and keep the returned HTML for parsing
index_html = requests.get(all_url, headers=Hostreferer)
# print(index_html.text)
# Get the highest page number
content = etree.HTML(index_html.text)
page_num = content.xpath("//div[@class='nav-links']/a/text()")
max_page = page_num[-2]
print("Total number of pages: " + max_page)
for n in range(1, int(max_page) + 1):
    # Build the URL of the current listing page, e.g. https://www.mzitu.com/page/1
    ul = same_url + str(n)
    print(ul)
    time.sleep(1)
    # Request listing page n
    two_html = requests.get(ul, headers=Hostreferer)
    # print(two_html.text)
    content = etree.HTML(two_html.text)
    urls = content.xpath("//div[@class='postlist']/ul/li/a/@href")
    names = content.xpath("//div[@class='postlist']/ul/li/a/img/@alt")
    k = 0
    for title in names:
        print(title)
        print(k)
        print(urls[k])
        url_jpg = urls[k]
        k = k + 1
        # Skip the first two entries on each page (behavior kept from the original)
        if k <= 2:
            continue
        print("About to fetch: " + str(title))
        # Windows cannot create a directory whose name contains '?', so strip it
        save_dir = path + title.strip().replace('?', '')
        os.makedirs(save_dir, exist_ok=True)
        # URL of the current gallery
        print(url_jpg)
        three_html = requests.get(url_jpg, headers=Hostreferer)
        # Get the number of images in the current gallery
        content = etree.HTML(three_html.text)
        page_num = content.xpath("//div[@class='pagenavi']/a/span/text()")
        pic_max = page_num[-2]
        print("Found " + pic_max + " images in total")
        # Visit the page of each image in turn
        for num in range(1, int(pic_max) + 1):
            # Build the URL of the page that shows image number num
            pic = url_jpg + '/' + str(num)
            print(pic)
            # Send the request
            try:
                html = requests.get(pic, headers=Hostreferer, timeout=15)
            except requests.exceptions.RequestException as e:
                print(e)
                continue
            content = etree.HTML(html.text)
            urls_jpg = content.xpath("//div[@class='main-image']/p/a/img/@src")
            time.sleep(2)
            # Skip this page if no image URL was found
            if not urls_jpg:
                continue
            print(urls_jpg[0])
            try:
                html = requests.get(urls_jpg[0], headers=Picreferer, timeout=15)
            except requests.exceptions.RequestException as e:
                print(e)
                continue
            file_name = urls_jpg[0][-9:]
            # Save the image into the gallery's directory
            with open(os.path.join(save_dir, file_name), 'wb') as f:
                f.write(html.content)
print('All downloads finished')
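
Two spots in the script above are fragile. The folder name only has '?' removed, while Windows rejects several other characters as well, and the file name is taken as the last nine characters of the image URL, which assumes every name has the same length. Below is a minimal sketch of sturdier helpers using only the standard library. The names safe_dirname and filename_from_url are hypothetical and not part of the original script, and the example URL is made up for illustration.

```python
import os
import re
from urllib.parse import urlparse

# Characters that Windows does not allow in file or directory names.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_dirname(title):
    """Return title with the characters Windows forbids in names removed."""
    cleaned = _INVALID_CHARS.sub('', title).strip()
    # Windows also rejects names that end in a dot or a space.
    cleaned = cleaned.rstrip('. ')
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or 'untitled'


def filename_from_url(url):
    """Return the last path component of url, ignoring any query string."""
    return os.path.basename(urlparse(url).path)


if __name__ == '__main__':
    # Hypothetical inputs, for illustration only.
    print(safe_dirname(' what? a "title": demo '))
    print(filename_from_url('https://example.com/2019/11/01a01.jpg?v=1'))
```

In the script, these could stand in for title.strip().replace('?', '') and urls_jpg[0][-9:] respectively, if the site's titles or image names turn out to vary more than the original assumes.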

If you like this, please follow. Thanks!
