Python real-estate data analysis: scraping Shenzhen average housing prices with Python 3 to support property-purchase decisions with real data (Part 1)

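After the earlier small exercises, today I want to take on a somewhat more complex mini-project. I recently saw a news story claiming that Shenzhen housing prices were falling off a cliff, with the average price dropping by 46 yuan per month... So I am going to try scraping real home-sale listings from the web and analyze the data, to give anyone looking to buy in Shenzhen some support for their decision.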

First, let's search Baidu for the top 3 home-sale websites (I remain skeptical of Baidu's paid ranking $_$).

After some filtering, I decided to scrape listing-price data from three sites: 链家 (Lianjia), Q房网 (Qfang), and 房天下 (Fang.com).

First, the code for scraping 链家 is as follows:

from bs4 import BeautifulSoup
import requests
import csv
from requests.exceptions import RequestException


def get_one_page(page):
    url = "https://sz.lianjia.com/"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Host': 'sz.lianjia.com',
        'Referer': 'https://www.lianjia.com/',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9',
    }
    newUrl = url + 'ershoufang/' + 'pg' + str(page)
    try:
        response = requests.get(newUrl, headers=headers)
    except RequestException as e:
        # If the request itself failed, `response` does not exist, so report the exception and stop
        print("error: " + str(e))
        return
    soup = BeautifulSoup(response.text, 'html.parser')
    # Fields to scrape: community name, floor area, average price, and the link to the detail page
    for item in soup.select('li .clear'):
        detailed_info = item.select('div .houseInfo')[0].text
        community_name = detailed_info.split('|')[0].strip()
        area = detailed_info.split('|')[2].strip()
        average_price = item.select('div .unitPrice span')[0].text
        detailed_url = item.select('a')[0].get('href')
        print("%s\t%s\t%s\t%s" % (community_name, area, average_price, detailed_url))


def main():
    get_one_page(2)


if __name__ == '__main__':
    main()
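The parsing step above relies on the `houseInfo` text being a `|`-separated string with the community name in the first field and the floor area in the third. A minimal sketch of that split, isolated from the network code so it can be checked offline, might look like the following. The helper name `parse_house_info` and the sample string are hypothetical, and the field order is an assumption taken from the indices used in the code above; the live page layout may differ.

```python
def parse_house_info(detailed_info):
    """Split a '|'-separated houseInfo string into (community_name, area).

    Assumes the community name is field 0 and the area is field 2,
    matching the indices used in the scraper above. Returns None when
    the string has fewer than three fields instead of raising IndexError.
    """
    fields = [field.strip() for field in detailed_info.split('|')]
    if len(fields) < 3:
        return None
    return fields[0], fields[2]


# Hypothetical sample text, not real scraped data.
sample = "Example Garden | 3室2厅 | 89.5平米 | 南 | 精装"
print(parse_house_info(sample))
```

Returning `None` for short strings lets the calling loop skip a malformed listing rather than abort the whole page.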

The test results are as follows:

Next, the basic code for scraping Q房网 is as follows:

from bs4 import BeautifulSoup
import requests
import csv
import re
from requests.exceptions import RequestException


def get_one_page(page):
    url = "https://shenzhen.qfang.com/sale/"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36',
        'Host': 'shenzhen.qfang.com',
        'Referer': 'https://www.qfang.com/',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9',
    }
    newUrl = url + 'f' + str(page)
    try:
        response = requests.get(newUrl, headers=headers)
    except RequestException as e:
        # If the request itself failed, `response` does not exist, so report the exception and stop
        print("error: " + str(e))
        return
    soup = BeautifulSoup(response.text, 'html.parser')
    # Fields to scrape: community name, floor area, average price, and the link to the detail page
    price_list = []
    for item in soup.select('div .show-price'):
        average_price = item.select('p')[0].text
        price_list.append(average_price)

    # Compile the area pattern once, outside the loop
    regex = re.compile('(.*?)平米')
    index = 0
    for item in soup.select('div .show-detail'):
        detailed_url = 'https://shenzhen.qfang.com/sale' + item.select('a')[0].get('href')
        # While scraping the area, some values turned out to be missing: the area sits in the
        # 4th span tag for some listings and in the 5th for others, so take both and filter with a regex
        result = item.select('span')[3].text + item.select('span')[4].text
        area = re.findall(regex, result)[0]
        community_name = (item.find_all(target='_blank')[0].text).split(' ')[0]
        average_price = price_list[index]
        index += 1
        print("%s\t%s\t%s\t%s" % (community_name, area, average_price, detailed_url))


def main():
    get_one_page(1)


if __name__ == '__main__':
    main()
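The comment in the code above notes that the area sometimes sits in the fourth `span` and sometimes in the fifth. One way to make that lookup tolerant of either position, sketched here under the assumption that the area always appears as a number followed by `平米`, is to scan every span's text and take the first match. The function name `extract_area` and the sample lists are hypothetical.

```python
import re

# A number (optionally with a decimal part) followed by 平米.
AREA_PATTERN = re.compile(r'(\d+(?:\.\d+)?)\s*平米')


def extract_area(span_texts):
    """Return the first area value found in a list of span texts, or None."""
    for text in span_texts:
        match = AREA_PATTERN.search(text)
        if match:
            return match.group(1)
    return None


# Hypothetical span contents: the area is at index 3 in one listing and index 4 in the other.
print(extract_area(["2室1厅", "中层", "南", "75.2平米", "2010年"]))
print(extract_area(["2室1厅", "中层", "南", "精装", "88平米"]))
```

Compared with the pattern `(.*?)平米` applied to two concatenated spans, a numeric pattern avoids capturing unrelated text that happens to precede the number, and returning `None` avoids an `IndexError` when no span contains an area.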

The test results are as follows:

Finally, the basic code for scraping 房天下 is as follows:

from bs4 import BeautifulSoup
import requests
import csv
from requests.exceptions import RequestException
import re


def get_one_page(page):
    url = "http://esf.sz.fang.com/house/"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Host': 'esf.sz.fang.com',
        'Referer': 'https://www.fang.com/',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9',
    }
    newUrl = url + 'i3' + str(page)
    try:
        response = requests.get(newUrl, headers=headers)
    except RequestException as e:
        # If the request itself failed, `response` does not exist, so report the exception and stop
        print("error: " + str(e))
        return
    # Scrape with a regular expression:
    # regex = re.compile('
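All three scrapers above build the page URL the same way: a base URL plus a site-specific prefix plus the page number. A small sketch of that shared step is shown below. The base URLs and prefixes are copied from the code above; the dictionary keys and the function name `build_page_url` are hypothetical labels introduced only for illustration.

```python
# Base URL and page-number prefix for each site, copied from the scrapers above.
# The keys ('lianjia', 'qfang', 'fang') are hypothetical labels.
SITES = {
    'lianjia': ('https://sz.lianjia.com/ershoufang/', 'pg'),
    'qfang': ('https://shenzhen.qfang.com/sale/', 'f'),
    'fang': ('http://esf.sz.fang.com/house/', 'i3'),
}


def build_page_url(site, page):
    """Return the listing URL for the given site key and page number."""
    base, prefix = SITES[site]
    return base + prefix + str(page)


for name in SITES:
    print(build_page_url(name, 2))
```

Keeping the URL patterns in one table makes it easier to loop over several pages per site later without duplicating the string concatenation in each scraper.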
