Getting WeChat avatars from 我要个性网 (woyaogexing.com)

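Update: I used a Java thread pool to speed up the crawl (code link). Because the pages in this example are fetched independently of one another, the program was changed to run in parallel.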

--2019.12.10
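The Java thread-pool version mentioned in the update is not reproduced in this post. As a rough illustration of the same idea in Python, the sketch below runs one crawl task per page on a fixed-size pool. The names `crawl_pages_in_parallel` and `crawl_one` and the default of 8 workers are assumptions made for this example, not part of the original code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def crawl_pages_in_parallel(page_urls, crawl_one, max_workers=8):
    """Run crawl_one(url) for every URL on a fixed-size thread pool.

    Returns a dict that maps each URL to its result, or to the exception
    that crawl_one raised for that URL.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every page up front; the pool limits how many run at once.
        futures = {pool.submit(crawl_one, url): url for url in page_urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as err:
                # A failure on one page should not stop the other pages.
                results[url] = err
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a real per-page crawl function.
    def fake_crawl(url):
        return len(url)

    base = 'http://www.woyaogexing.com/touxiang/weixin/index_%d.html'
    urls = [base % n for n in range(2, 6)]
    print(crawl_pages_in_parallel(urls, fake_crawl))
```

A bounded pool keeps the number of simultaneous connections predictable, which is gentler on the site than starting one thread per page or per image.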

===============================================

The point here is not the crawler itself but the learning process.

# -*- coding:utf-8 -*-
import urllib2, urllib, time
from bs4 import BeautifulSoup
import sys, os
reload(sys)
sys.setdefaultencoding('utf-8')  # set the default encoding for output

def crawl(url, website = ""):
    img_dir = "我要个性网"
    if os.path.isdir(img_dir) == False:
        os.mkdir(img_dir)
    # add request headers to mimic a browser
    headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/64.0.3282.119 Chrome/64.0.3282.119 Safari/537.36'}
    req = urllib2.Request(url, headers=headers)  # build the request object
    page = urllib2.urlopen(req, timeout=20)  # set a timeout in case the URL is unreachable or slow to respond
    contents = page.read()  # read the page source; readline would read a single line
    #print contents
    soup = BeautifulSoup(contents, 'html.parser')
    alinks = soup.find_all('a', {'class':'imgTitle'})
    global record
    for alink in alinks:
        # if record < 655:  # set this value to resume after a disconnect or a hang
        #     record += 1
        #     continue
        dirpath = img_dir + '/' + str(record).zfill(3) + '_' + alink.text
        print dirpath
        if(alink.text.__contains__('/')):
            # a '/' in the title would be read as a path separator, so log it and replace it
            deal_error(dirpath + '\n')
            dirpath = img_dir + '/' + str(record).zfill(3) + '_' + alink.text.replace('/', 'or')
        if os.path.isdir(dirpath) == False:
            os.mkdir(dirpath)
        suburl = website + alink.get('href')
        #print suburl
        subreq = urllib2.Request(suburl, headers=headers)
        subpage = urllib2.urlopen(subreq, timeout=20)
        subcontents = subpage.read()
        # if record == 1:
        #     print subcontents
        subsoup = BeautifulSoup(subcontents, 'html.parser')
        imgs = subsoup.find_all('img', {'class':'lazy'})
        cur = 0
        for img in imgs:
            cur += 1
            link = img.get('src')
            #print link
            filename = dirpath + '/%02d.jpg'%cur
            print filename
            try:
                urllib.urlretrieve(link, filename)  # download and save into the album folder
            except:
                deal_error(filename + "\n" + link + "\n")
        record += 1

def deal_error(string):
    # append the failed path or link to the error log
    fout = open("log_error.txt", "at")
    fout.write(string)
    fout.close()

record = 1
url = 'http://www.woyaogexing.com/touxiang/weixin/index.html'
website = 'http://www.woyaogexing.com'
crawl(url, website)
pageNum = 1
while (True):
    pageNum += 1
    print "Requesting page ==================================================%d===================" % pageNum
    url = 'http://www.woyaogexing.com/touxiang/weixin/index_%d.html' % pageNum
    crawl(url, website)

# Problems encountered: Connection reset by peer
# Temporary failure in name resolution
# Eventually a 404 NOT FOUND exception terminates the program
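The script above notes three failure modes: connection resets, DNS resolution failures, and a final 404 that ends the run with an exception. One possible way to handle them, written for Python 3 with only the standard library, is to retry transient network errors a few times and treat a 404 as the normal "no more pages" signal. The function name `fetch_with_retry`, the `opener` parameter, and the retry and delay values are assumptions for this sketch, not something the original script does.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, opener=urllib.request.urlopen, retries=3, delay=1.0):
    """Return the page bytes, or None if the server answers 404.

    Other errors (connection reset, DNS failure, timeout, non-404 HTTP
    errors) are retried up to `retries` times, then re-raised.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            with opener(url, timeout=20) as page:
                return page.read()
        except urllib.error.HTTPError as err:
            # HTTPError must be caught first: it is a subclass of OSError.
            if err.code == 404:
                return None  # past the last page, nothing more to fetch
            last_error = err
        except OSError as err:
            # Covers URLError (e.g. DNS failure), ConnectionResetError, timeouts.
            last_error = err
        if attempt < retries:
            time.sleep(delay * attempt)  # wait a little longer after each failure
    raise last_error
```

With a helper like this, the paging loop can stop cleanly when the result is `None` instead of relying on an uncaught exception to end the program.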

Python 3 code

# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup
import requests, os, threading, re

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'}

def loge(msg):
    # append a message to the error log
    with open('error_log.txt', 'at+') as fout:
        try:
            fout.write(msg)
        except:
            fout.write('Warning: encoding error')

def save_img(url, path):
    # download one image and write it to path
    with open(path, 'wb') as fout:
        # headers must be passed by keyword; the second positional argument is params
        response = requests.get(url, headers=headers).content
        fout.write(response)

def spider(url, website=''):
    path = os.path.join(os.getcwd(), 'images')
    if not os.path.exists(path):
        os.mkdir(path)
    response = requests.get(url, headers=headers).content
    soup = BeautifulSoup(response, 'html.parser')
    divs = soup.select('.txList')
    next_page = soup.find('div', {'class':'page'})
    for div in divs:
        try:
            # replace characters that are not allowed in directory names
            title = re.sub(r'[\\/:*?"<>|\n.]', '_', div.a.get('title'))
            dir_name = os.path.join(path, title)
            if not os.path.exists(dir_name):
                os.mkdir(dir_name)
        except:
            loge('Error: ' + str(div))
            continue
        response = requests.get(website + str(div.a.get('href')), headers=headers).content
        soup = BeautifulSoup(response, 'html.parser')
        lis = soup.select('.tx-img')
        for li in lis:
            img_url = 'http:' + li.a.get('href')
            file_path = os.path.join(dir_name, img_url.split('/')[-1])
            # download each image in its own thread
            thread = threading.Thread(target=save_img, args=(img_url, file_path))
            thread.start()
    print(os.getpid(), url)
    if next_page:
        # the last link in the pager points to the next page; crawl it in a new thread
        next_url = website + str(next_page.find_all('a')[-1].get('href'))
        thread = threading.Thread(target=spider, args=(next_url, website))
        thread.start()

def main():
    website = 'https://www.woyaogexing.com'
    url = 'https://www.woyaogexing.com/touxiang/weixin/'
    # the page structure changes after index_40
    spider(url, website)

if __name__ == '__main__':
    main()
