Python: Scraping Lianjia New-Home Listings and Charting Property Types

1. Scrape the new-home listings from the site

from bs4 import BeautifulSoup
import requests
import time
import pandas as pd

# Request headers sent with every page fetch
headers = {
    'User-Agent': ('Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64;'
                   'Trident/6.0; SLCC2;.NET CLR 2.0.50727; .NET CLR 3.5.30729;'
                   '.NET CLR 3.0.30729; InfoPath.3; .NET4.0C; .NET4.0E)'),
    'Accept': 'image/webp,image/*,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Referer': ('http://www.baidu.com/link?'
                'url=_andhfsjjjKRgEWkj7i9cFmYYGsisrnm2A-TN3XZDQXxvGsM9k9ZZSnikW2Yds4s'
                '&wd=&eqid=c3435a7d00006bd600000003582bfd1f'),
    'Connection': 'keep-alive',
}
page = 'pg'


def generate_cityurl(user_in_city, region):
    """Build the listing URL for a city (e.g. 'cd') and a region (full pinyin)."""
    # Do not append '#' + region: anything after '#' is a URL fragment and is
    # not sent to the server, so a 'pgN/' suffix added after it would be ignored.
    cityurl = 'https://' + user_in_city + '.lianjia.com/loupan/' + region + '/'
    return cityurl


def areainfo(url):
    """Download listing pages 1-20 and return their HTML concatenated as bytes."""
    htmlinfo = b''
    for i in range(1, 21):  # fetch pages 1-20
        a = url + page + str(i) + '/'
        print(a)
        r = requests.get(url=a, headers=headers)
        htmlinfo += r.content
        time.sleep(0.5)  # pause between requests
    return htmlinfo


hlist = []


def listinfo(listhtml):
    """Parse the listing HTML and append one dict per development to hlist."""
    areasoup = BeautifulSoup(listhtml, 'html.parser')
    ljhouse = areasoup.find_all('div', attrs={'class': 'resblock-desc-wrapper'})
    for house in ljhouse:
        loupantitle = house.find("div", attrs={"class": "resblock-name"})
        loupanname = loupantitle.a.get_text()
        loupantag = loupantitle.find_all("span")
        wuye = loupantag[0].get_text()               # property type
        xiaoshouzhuangtai = loupantag[1].get_text()  # sales status
        location = house.find("div", attrs={"class": "resblock-location"}).get_text()
        jishi = house.find("a", attrs={"class": "resblock-room"}).get_text()  # layout
        area = house.find("div", attrs={"class": "resblock-area"}).get_text()
        tag = house.find("div", attrs={"class": "resblock-tag"}).get_text()
        jiage = house.find("div", attrs={"class": "resblock-price"})
        price = jiage.find("div", attrs={"class": "main-price"}).get_text()
        total = jiage.find("div", attrs={"class": "second"})
        totalprice = "暂无"  # "not available" placeholder when no total price is listed
        if total is not None:
            totalprice = total.get_text()
        h = {'title': loupanname, 'wuye': wuye, 'xiaoshouzhuangtai': xiaoshouzhuangtai,
             'location': location.replace("\n", ""), 'jishi': jishi.replace("\n", ""),
             'area': area, 'tag': tag, 'price': price, 'totalprice': totalprice}
        hlist.append(h)


if __name__ == '__main__':
    user_in_city = input('City to scrape (abbreviation, e.g. cd for Chengdu): ')
    region = input('Region (full pinyin): ')
    url = generate_cityurl(user_in_city, region)
    print(url)
    # First row: human-readable labels for the pinyin column keys
    hlist.append({'title': "Development name", 'wuye': "Property type",
                  'xiaoshouzhuangtai': "Sales status", 'location': "Location",
                  'jishi': "Layout", 'area': "Area", 'tag': "Tags",
                  'price': "Unit price", 'totalprice': "Total price"})
    areahtml = areainfo(url)
    listinfo(areahtml)
    houseinfo = pd.DataFrame(hlist, columns=['title', 'wuye', 'xiaoshouzhuangtai', 'location',
                                             'jishi', 'area', 'tag', 'price', 'totalprice'])
    houseinfo.to_csv('./链家自定义新房.csv', index=False, encoding="utf_8_sig")
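One detail worth checking when building the paginated URLs: the page suffix (`pg1/`, `pg2/`, ...) has to be part of the path. Anything after a `#` is a fragment, which HTTP clients do not send to the server. The sketch below uses only the standard library; `page_urls` is a hypothetical helper and `jinjiang` is just an example region value, neither comes from the script above.

```python
from urllib.parse import urldefrag, urlsplit


def page_urls(city, region, first=1, last=20):
    """Build one listing URL per page, e.g. .../loupan/<region>/pg3/."""
    base = f"https://{city}.lianjia.com/loupan/{region}/"
    return [f"{base}pg{n}/" for n in range(first, last + 1)]


urls = page_urls("cd", "jinjiang", last=3)
for u in urls:
    print(u)

# A URL that carries the page number after '#' collapses to the same
# request target for every page, because the fragment is dropped.
with_fragment = "https://cd.lianjia.com/loupan/jinjiang/#jinjiangpg2/"
sent_url = urldefrag(with_fragment).url
sent_path = urlsplit(with_fragment).path
print(sent_url)
print(sent_path)
```

Each URL from `page_urls` could then be passed to `requests.get(url, headers=headers)` in place of the string concatenation inside the loop.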

1.1 Result
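The script writes 链家自定义新房.csv with the column keys on the first line, then the hand-added label row, so the scraped listings start on the third line. The following is a minimal sketch of that layout, using the standard `csv` module on an in-memory buffer instead of pandas. The two listing rows are made up for illustration and are not real Lianjia data.

```python
import csv
import io

COLUMNS = ['title', 'wuye', 'xiaoshouzhuangtai', 'location',
           'jishi', 'area', 'tag', 'price', 'totalprice']

# Row 1 is the label row; the remaining rows are hypothetical listings.
rows = [
    dict(zip(COLUMNS, ['Development name', 'Property type', 'Sales status',
                       'Location', 'Layout', 'Area', 'Tags',
                       'Unit price', 'Total price'])),
    dict(zip(COLUMNS, ['Example Court A', '住宅', '在售', 'District A',
                       '3室', '90-120㎡', 'sample tag', '15000', '暂无'])),
    dict(zip(COLUMNS, ['Example Tower B', '写字楼', '在售', 'District B',
                       '', '50-200㎡', 'sample tag', '12000', '暂无'])),
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()      # line 1: column keys
writer.writerows(rows)    # line 2: labels, line 3 onward: listings

buffer.seek(0)
parsed = list(csv.reader(buffer))
header, labels, records = parsed[0], parsed[1], parsed[2:]
print(header[1], '|', labels[1], '|', [r[1] for r in records])
```

Any code that reads the file back therefore has to skip two rows before it reaches the property-type values in column 1.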

2. Generate the comparison chart

import csv
import matplotlib.pyplot as plt

# Read the CSV written in step 1 (utf-8-sig strips the BOM that to_csv added)
with open('链家自定义新房.csv', 'r', encoding='utf-8-sig', newline='') as f:
    data = list(csv.reader(f))

# Rows 0 and 1 hold the column keys and the label row; column 1 is the property type
blist = [row[1] for row in data[2:]]

# Count how many listings fall under each property type
counts = {}
for key in blist:
    counts[key] = counts.get(key, 0) + 1
print(counts)

labels = list(counts.keys())
values = list(counts.values())

# Draw the property-type comparison chart, one bar per type found in the data.
# The labels are Chinese strings from the site, so matplotlib needs a font with
# CJK glyphs, e.g. plt.rcParams['font.sans-serif'] = ['SimHei'] on Windows.
plt.bar(range(len(values)), values)
plt.xticks(range(len(labels)), labels)
plt.title('Comparison of Room Types in Nearby Areas')
plt.show()
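As an alternative to tallying with `dict.get`, `collections.Counter` from the standard library does the counting and can return the types ordered by frequency, which keeps the tick labels and bar heights paired. This is a minimal sketch rather than the method used above; the sample list is made up and uses pinyin strings in place of real scraped values.

```python
from collections import Counter

# Hypothetical property-type column (index 1 of each data row)
property_types = ['zhuzhai', 'shangye', 'zhuzhai',
                  'xiezilou', 'zhuzhai', 'shangye']

counts = Counter(property_types)

# most_common() yields (type, count) pairs, highest count first;
# zip(*...) splits them into parallel label and value tuples.
labels, values = zip(*counts.most_common())
print(labels)
print(values)
```

`labels` and `values` can then be passed to `plt.xticks(range(len(labels)), labels)` and `plt.bar(range(len(values)), values)`.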
