Python scraping and getting started with matplotlib data visualization -- the full approach to a bar chart of skyscraper heights

Steps:

1. Scrape the skyscraper information with a crawler

2. Filter out the information we need

3. Build a table

4. Parse the table and draw the chart

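1 -- Scraping

First, inspect the target page.

All of the useful information turns out to be inside <p> tags, which makes this easy: BeautifulSoup can pull it out directly.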

import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

info_list = []


# Scraper module: fetch the page source
def spider(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        response.encoding = 'utf-8'
    except requests.RequestException as e:
        print(e)
        return  # no response, so there is nothing to parse
    bs = BeautifulSoup(response.text, 'html.parser')
    content = bs.find_all('p')  # collect every <p> tag from the source

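This, however, leaves tag markup behind, so we call get_text() on each tag to extract just the text and append it to a list. The first three entries are junk, so we delete them.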

import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

info_list = []


# Scraper module: fetch the page source
def spider(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        response.encoding = 'utf-8'
    except requests.RequestException as e:
        print(e)
        return  # no response, so there is nothing to parse
    bs = BeautifulSoup(response.text, 'html.parser')
    content = bs.find_all('p')  # collect every <p> tag from the source
    for word in content:
        # get_text() works on a single tag, not on a list, so loop over the tags
        result = word.get_text()
        info_list.append(result)
    del info_list[0:3]  # the first three paragraphs are junk
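To see the difference without hitting the network, here is a minimal sketch on a small inline HTML string (the HTML is made up for illustration). It shows that find_all returns Tag objects whose string form still contains the markup, while get_text() returns only the text; a list comprehension plus a slice does the same job as the append loop followed by del info_list[0:3].

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page (assumption: the data lives in <p> tags)
html = """
<html><body>
<p>junk 1</p><p>junk 2</p><p>junk 3</p>
<p>entry <b>one</b></p>
<p>entry two</p>
</body></html>
"""

bs = BeautifulSoup(html, 'html.parser')
tags = bs.find_all('p')

# str(tag) keeps the markup; get_text() flattens it to plain text
print(str(tags[3]))        # <p>entry <b>one</b></p>
print(tags[3].get_text())  # entry one

# Same result as the append loop followed by del info_list[0:3]
info_list = [tag.get_text() for tag in tags][3:]
print(info_list)  # ['entry one', 'entry two']
```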

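2 -- Filtering the information

After extraction, the list looks like this:

We only need the names and the heights, though, so everything else has to go.

Notice that item 0 is a name and item 5 is also a name: the entries we want are always 5 indices apart. So a for loop with a step of 5 makes i take the values 0, 5, 10, ..., and we can use i as an index to pull out exactly the information we want.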

def make_form():
    name_list = []
    height_list = []
    # for loop with a step: range(start, stop, step), all three arguments are needed here
    for i in range(0, len(info_list), 5):
        print(info_list[i])
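The index arithmetic can be checked on a small made-up list. This sketch assumes, as described above, that each building occupies five consecutive entries with the name first; the placeholder strings are hypothetical. An extended slice such as info_list[0::5] picks the same items as the range(0, len(info_list), 5) loop.

```python
# Hypothetical flattened list: 5 entries per building, name at offset 0
info_list = [
    'name A', 'x', 'height A', 'x', 'x',
    'name B', 'x', 'height B', 'x', 'x',
    'name C', 'x', 'height C', 'x', 'x',
]

# range(start, stop, step) yields 0, 5, 10
names_by_loop = [info_list[i] for i in range(0, len(info_list), 5)]

# An extended slice with the same start and step picks the same items
names_by_slice = info_list[0::5]

# The height loop later in the post uses the same step, starting at index 2
heights = info_list[2::5]

print(names_by_loop)                    # ['name A', 'name B', 'name C']
print(names_by_slice == names_by_loop)  # True
print(heights)                          # ['height A', 'height B', 'height C']
```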

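For the chart, though, we don't want the "world's No. such-and-such" ranking text at the front of each name. Each name is a string, so how do we edit them all in one go?

Here I turned each string into a list of characters and used del to remove the leading elements. From the eleventh building on, eight characters have to be removed instead of seven (otherwise a colon is left over), so I added a check: if the first remaining character is a colon, delete it as well. Note that this is the full-width Chinese colon (：), not the ASCII one.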

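def make_form():
    name_list = []
    height_list = []
    # for loop with a step: range(start, stop, step)
    for i in range(0, len(info_list), 5):
        # To drop the useless leading characters, split the string into a list,
        # delete them, then join the list back into a string
        temporary_name = list(info_list[i])
        del temporary_name[0:7]
        # From the 11th building on, a full-width colon is left over
        if temporary_name and temporary_name[0] == '：':
            del temporary_name[0]
        temporary_name = ''.join(temporary_name)
        name_list.append(temporary_name)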

The result is exactly what I wanted.
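Slicing a string gives the same result as list(), del and ''.join(). The sketch below assumes the names look like '世界第一高楼：<name>', which is an inference from the seven/eight-character counts rather than something shown from the page; the sample names and both function names are hypothetical. strip_rank_prefix drops everything up to the first full-width colon, so it does not depend on how long the prefix is.

```python
def strip_fixed(raw, n=7):
    # Equivalent to: chars = list(raw); del chars[0:n]; ''.join(chars)
    rest = raw[n:]
    # From the 11th entry on, the full-width colon is left over; drop it too
    return rest[1:] if rest.startswith('：') else rest


def strip_rank_prefix(raw):
    # str.partition splits at the first '：' (full-width colon);
    # if there is no colon, sep is '' and the string is returned unchanged
    head, sep, tail = raw.partition('：')
    return tail if sep else raw


print(strip_fixed('世界第一高楼：大厦A'))          # 大厦A
print(strip_fixed('世界第十一高楼：大厦B'))        # 大厦B
print(strip_rank_prefix('世界第二十一高楼：大厦C'))  # 大厦C
```

A rank such as 第二十一 would need nine characters removed, which the fixed-length version gets wrong and the partition version handles.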

Next, we handle the heights the same way.

    for j in range(2, len(info_list), 5):
        temporary_height = list(info_list[j])
        del temporary_height[0:7]  # drop the 7-character label in front
        if temporary_height:
            temporary_height.pop()  # drop the trailing unit character
        temporary_height = ''.join(temporary_height)
        height_list.append(temporary_height)
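The height loop removes a fixed seven-character label plus the last character, presumably the unit 米 (metres). Pulling out the number with a regular expression is less sensitive to the exact label length. This sketch assumes each height string contains exactly one number (a digit inside the label itself would be picked up first); the sample strings and the name parse_height are hypothetical.

```python
import re

# One or more digits, optionally followed by a decimal part
NUMBER = re.compile(r'\d+(?:\.\d+)?')


def parse_height(raw):
    """Return the first number in raw as a float, or None if there is none."""
    match = NUMBER.search(raw)
    return float(match.group()) if match else None


print(parse_height('建筑高度：828米'))  # 828.0
print(parse_height('高度：632.5米'))    # 632.5
print(parse_height('高度未知'))         # None
```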

3 -- Building the table

Use a with statement to write the data we collected into a CSV file.

# Build the table
def make_form():
    name_list = []
    height_list = []
    # for loop with a step: range(start, stop, step)
    for i in range(0, len(info_list), 5):
        # Split the string into a list, delete the useless leading characters, join it back
        temporary_name = list(info_list[i])
        del temporary_name[0:7]
        if temporary_name and temporary_name[0] == '：':  # full-width colon
            del temporary_name[0]
        temporary_name = ''.join(temporary_name)
        name_list.append(temporary_name)
    for j in range(2, len(info_list), 5):
        temporary_height = list(info_list[j])
        del temporary_height[0:7]
        if temporary_height:
            temporary_height.pop()  # drop the trailing unit character
        temporary_height = ''.join(temporary_height)
        height_list.append(temporary_height)
    # Write the table; an explicit encoding keeps the Chinese text readable by pd.read_csv
    with open('building.csv', 'w', encoding='utf-8') as f:
        f.write('名称,高度\n')  # header row: name, height
        for name, height in zip(name_list, height_list):
            f.write(f'{name},{height}\n')
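Writing each line with f'{name},{height}\n' breaks if a building name ever contains a comma, because that row then gains an extra column. The standard-library csv module quotes such fields automatically. A minimal sketch; the function name write_csv is hypothetical, and make_form could call it as write_csv('building.csv', name_list, height_list) in place of its with block.

```python
import csv


def write_csv(path, names, heights):
    """Write a two-column CSV; csv.writer quotes fields that contain commas."""
    # newline='' is what the csv module documentation asks for;
    # an explicit encoding avoids platform-dependent defaults
    with open(path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['名称', '高度'])
        # zip pairs each name with its height and stops at the shorter list
        writer.writerows(zip(names, heights))
```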

4 -- Drawing the chart with matplotlib.pyplot and parsing the CSV with pandas

First, import the libraries we need:

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

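Parsing the CSV with pandas makes the table's data easy to work with. The rcParams line sets a font so that Chinese characters are not rendered as boxes; without it, only English text displays correctly. I'm on macOS, so I use this setting: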

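# Draw the bar chart
def make_photo():
    # Select the Tk GUI backend that displays the chart window
    matplotlib.use('TkAgg')
    # Parse the CSV; printing the DataFrame shows a 0-based row index on the left
    data = pd.read_csv('building.csv')
    print(data)
    # Set a CJK font so Chinese characters don't render as boxes
    plt.rcParams['font.family'] = ['Hiragino Sans GB']
    # data['header name'] selects that column
    plt.bar(data['名称'], data['高度'])
    # Title and axis labels
    plt.title('世界高楼排名')
    plt.xlabel('名称')
    plt.ylabel('单位：米')
    # Show the chart
    plt.show()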
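To see what pandas actually passes to plt.bar, the sketch below parses an in-memory CSV with the same two headers (the rows are made up). read_csv infers an integer column for 高度, and indexing with a header name returns that column as a Series.

```python
import io

import pandas as pd

# Made-up CSV text with the same headers as building.csv
csv_text = '名称,高度\n大厦A,828\n大厦B,632\n'

data = pd.read_csv(io.StringIO(csv_text))
print(data)                   # printed with a 0-based row index on the left
print(data['高度'].dtype)     # an integer dtype (int64)
print(data['名称'].tolist())  # ['大厦A', '大厦B']
```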

On Windows, change the rcParams line to:

plt.rcParams['font.family'] = ['SimHei']
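Which font names work depends on what is installed. A sketch that checks matplotlib's font list and uses the first CJK font it finds; the candidate list and the name pick_font are assumptions, so adjust them for your machine.

```python
import matplotlib
from matplotlib import font_manager

# Candidate CJK fonts; which are installed depends on the machine (assumption)
CANDIDATES = ['Hiragino Sans GB', 'SimHei', 'Microsoft YaHei', 'Noto Sans CJK SC']


def pick_font(candidates=CANDIDATES):
    """Return the first candidate font family matplotlib can find, else None."""
    installed = {entry.name for entry in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None


font = pick_font()
if font is not None:
    # matplotlib.rcParams is the same object as plt.rcParams
    matplotlib.rcParams['font.family'] = [font]
else:
    print('No CJK font found; Chinese labels may render as boxes')
```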

And that's it. The complete code:

import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

info_list = []


# Scraper module: fetch the page source
def spider(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        response.encoding = 'utf-8'
    except requests.RequestException as e:
        print(e)
        return  # no response, so there is nothing to parse
    bs = BeautifulSoup(response.text, 'html.parser')
    content = bs.find_all('p')  # collect every <p> tag from the source
    for word in content:
        # get_text() works on a single tag, not on a list, so loop over the tags
        info_list.append(word.get_text())
    del info_list[0:3]  # the first three paragraphs are junk
    print(info_list)


# Build the table
def make_form():
    name_list = []
    height_list = []
    # for loop with a step: range(start, stop, step)
    for i in range(0, len(info_list), 5):
        # Split the string into a list, delete the useless leading characters, join it back
        temporary_name = list(info_list[i])
        del temporary_name[0:7]
        if temporary_name and temporary_name[0] == '：':  # full-width colon
            del temporary_name[0]
        name_list.append(''.join(temporary_name))
    for j in range(2, len(info_list), 5):
        temporary_height = list(info_list[j])
        del temporary_height[0:7]
        if temporary_height:
            temporary_height.pop()  # drop the trailing unit character
        height_list.append(''.join(temporary_height))
    # Write the table; an explicit encoding keeps the Chinese text readable by pd.read_csv
    with open('building.csv', 'w', encoding='utf-8') as f:
        f.write('名称,高度\n')  # header row: name, height
        for name, height in zip(name_list, height_list):
            f.write(f'{name},{height}\n')


# Draw the bar chart
def make_photo():
    # Select the Tk GUI backend that displays the chart window
    matplotlib.use('TkAgg')
    # Parse the CSV; printing the DataFrame shows a 0-based row index on the left
    data = pd.read_csv('building.csv')
    print(data)
    # Set a CJK font so Chinese characters don't render as boxes
    plt.rcParams['font.family'] = ['Hiragino Sans GB']
    # data['header name'] selects that column
    plt.bar(data['名称'], data['高度'])
    # Title and axis labels
    plt.title('世界高楼排名')
    plt.xlabel('名称')
    plt.ylabel('单位：米')
    # Show the chart
    plt.show()


if __name__ == '__main__':
    spider('https://jingyan.baidu.com/article/cbf0e500b24b112eab289379.html')
    make_form()
    make_photo()

Result:
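plt.show() opens a window through a GUI backend such as TkAgg. On a machine without a display, a variant that uses the non-interactive Agg backend and saves the chart as a PNG file may be more convenient. This is a sketch rather than the code above: save_chart and building.png are hypothetical names, and the rotated x-axis labels are an addition to keep long names from overlapping. For Chinese labels, a CJK font still has to be set through rcParams as described earlier.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: renders to files, opens no window
import matplotlib.pyplot as plt
import pandas as pd


def save_chart(data, out_path='building.png'):
    """Draw a bar chart from a DataFrame with 名称/高度 columns and save it as an image."""
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar(data['名称'], data['高度'])
    ax.set_title('世界高楼排名')
    ax.set_xlabel('名称')
    ax.set_ylabel('单位：米')
    # Rotate the x labels so long building names do not overlap
    ax.tick_params(axis='x', labelrotation=45)
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)  # release the figure's memory
    return out_path


# Usage with the CSV written by make_form():
# save_chart(pd.read_csv('building.csv'))
```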
