Python Crawler (Part 1): Example Code for Scraping Images from a Simple Web Page with requests + BeautifulSoup

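I recently started learning Python and, with the help of articles by more experienced developers, wrote the code below to scrape images from a web page. I hope it is useful to others.
The tool I used is IDEA.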


# coding=utf-8
import os

import requests
from bs4 import BeautifulSoup

# On Android (Python 2) the following statements were needed:
# import sys
# reload(sys)
# sys.setdefaultencoding('utf-8')

if os.name == 'nt':
    print('You are running on Windows')
else:
    print('You are running on Linux')

# HTTP request headers
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 UBrowser/6.1.2107.204 Safari/537.36'}
all_url = 'http://www.win4000.com/zt/xinggan.html'
start_html = requests.get(all_url, headers=header)

# Save location ("练习" means "practice"); create this folder manually
path = 'D:/练习/'

# Find the maximum page number
soup = BeautifulSoup(start_html.text, "html.parser")
page = soup.find_all('a', class_='num', rel='nofollow')
max_page = int(page[2].text) + 1

same_url = 'http://www.win4000.com/zt/'
for n in range(1, max_page + 1):
    ul = same_url + 'xinggan_' + str(n) + '.html'
    print('ul:' + ul)
    start_html = requests.get(ul, headers=header)
    print(start_html)
    soup = BeautifulSoup(start_html.text, "html.parser")
    all_a = soup.find('div', class_='tab_tj').find_all('a', target='_blank')
    for a in all_a:
        title = a.get_text()  # extract the link text
        if title != '':
            print('Preparing to scrape: ' + title)
            # Windows cannot create a directory whose name contains '?'
            folder = path + title.strip().replace('?', '')
            if os.path.exists(folder):
                # print('Directory already exists')
                flag = 1
            else:
                os.makedirs(folder)
                flag = 0
            os.chdir(folder)
            href = a['href']
            print('This is href: ' + href)
            html = requests.get(href, headers=header)
            mess = BeautifulSoup(html.text, "html.parser")
            # Number of pictures in this album
            pic_max = mess.select('.ptitle em')[0].text
            # pic_max = pic_max[-2].text  # maximum page number
            print(pic_max)
            if flag == 1 and len(os.listdir(folder)) >= int(pic_max):
                print('Already fully saved, skipping')
                continue
            for num in range(1, int(pic_max) + 1):
                # Each picture has its own page: xxx.html -> xxx_<num>.html
                pic = href[:-5] + '_' + str(num) + '.' + href[-4:]
                print(pic)
                html = requests.get(pic, headers=header)
                html.encoding = 'utf8'
                mess = BeautifulSoup(html.text, "html.parser")
                pic_url2 = mess.select('.pic-large')[0]
                print(pic_url2)
                html = requests.get(pic_url2['src'], headers=header)
                file_name = pic_url2['src'].split('/')[-1]
                # Written into the current directory, which os.chdir set above
                with open(file_name, 'wb') as f:
                    f.write(html.content)
            print('Done')
    print('Page', n, 'done')
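
The script above strips only `?` from the album title before using it as a directory name, and it builds each picture page URL by slicing off a fixed `.html` suffix. As a rough sketch, these two steps could be pulled out into small helpers. The names `safe_dirname` and `numbered_page_url` are hypothetical (they are not in the original code), and the example URL is a placeholder, not a real win4000 address. The helper removes the full set of characters Windows rejects in file names, which is an extension of the original `?`-only rule.

```python
import re

# Characters Windows does not allow in file or directory names.
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*]')


def safe_dirname(title):
    """Strip surrounding whitespace and remove Windows-forbidden characters.

    Hypothetical helper: the original script only removes '?'.
    """
    return _WINDOWS_FORBIDDEN.sub('', title.strip())


def numbered_page_url(href, num):
    """Insert '_<num>' before the file extension of href.

    Hypothetical helper; assumes href ends in an extension such as '.html',
    which is what the original slicing (href[:-5], href[-4:]) relies on.
    """
    base, dot, ext = href.rpartition('.')
    return base + '_' + str(num) + dot + ext


if __name__ == '__main__':
    # Placeholder inputs, for illustration only.
    print(safe_dirname('  What? A/B: test  '))  # What AB test
    print(numbered_page_url('http://example.com/album/12345.html', 3))
    # http://example.com/album/12345_3.html
```

In the loop, `folder = path + safe_dirname(title)` and `pic = numbered_page_url(href, num)` would then replace the inline string handling. Whether the wider character filter is wanted depends on the titles the site actually returns.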
