#Python study notes# Scraping website data with Python and saving it to a database

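The previous post covered how to scrape website data with Python by extracting page elements and exporting the result to Excel. This post covers how to scrape data by fetching JSON and save it to a MySQL database.

It touches on three main topics:

1. Finding a website's API endpoints with a packet-capture tool

2. Parsing JSON data with Python

3. Connecting to a database from Python and writing data to it

Packet capture is not the main subject here. You can head over here, or search Baidu for “fiddler手机抓包” (Fiddler mobile packet capture), to read up on it. By the way, this Jianshu post also publishes the API endpoints of a few sites, so you can take them straight from there.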

OK, straight to the point. First, let's see how Python fetches JSON and parses it:

Fetching the JSON data:

import urllib.request


def getHtmlData(url):
    # Build and send the request with a mobile browser User-Agent
    headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166  Safari/535.19'}
    request = urllib.request.Request(url, headers=headers)
    response = urllib.request.urlopen(request)
    data = response.read()
    # Decode the response bytes as UTF-8
    data = data.decode('utf-8')
    return data
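As a hedged variation on the function above, the sketch below separates building the request from sending it, and adds a timeout and basic error handling. The names `build_request` and `fetch_json_text`, the 10-second timeout, and the choice to return `None` on failure are assumptions made for illustration; they are not part of the original demo.

```python
import urllib.request

# Same mobile User-Agent string as in getHtmlData.
USER_AGENT = ('Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D) '
              'AppleWebKit/535.19 (KHTML, like Gecko) '
              'Chrome/18.0.1025.166  Safari/535.19')


def build_request(url):
    """Hypothetical helper: attach the User-Agent header to the request."""
    return urllib.request.Request(url, headers={'User-Agent': USER_AGENT})


def fetch_json_text(url, timeout=10):
    """Hypothetical helper: return the response body as text, or None on failure."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.read().decode('utf-8')
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print('request failed:', exc)
        return None
```

A caller can then check for `None` before handing the text to the JSON parser, instead of letting a network error end the script.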

Parsing the JSON:

Before parsing, let's look at the JSON we received (there is a lot of data, so some entries with the same structure have been omitted):

{"id": 1,"label": "头条","prev": "https://api.dongqiudi.com/app/tabs/android/1.json?before=1658116800","next": "https://api.dongqiudi.com/app/tabs/android/1.json?after=1500443152&page=2","max": 1658116800,"min": 1500443152,"page": 1,"articles": [{"id": 375248,"title": "还记得他们吗？那些年，我们也有自己的留洋军团","share_title": "还记得他们吗？那些年，我们也有自己的留洋军团","description": "","comments_total": 1026,"share": "https://www.dongqiudi.com/article/375248","thumb": "http://img1.dongqiudi.com/fastdfs1/M00/97/55/180x135/crop/-/pIYBAFlkjm-AMc7AAAL4n-oihZs769.jpg","top": true,"top_color": "#4782c4","url": "https://api.dongqiudi.com/article/375248.html?from=tab_1","url1": "https://api.dongqiudi.com/article/375248.html?from=tab_1","scheme": "dongqiudi:///news/375248","is_video": false,"new_video_detail": null,"collection_type": null,"add_to_tab": "0","show_comments": true,"published_at": "2022-07-18 12:00:00","sort_timestamp": 1658116800,"channel": "article","label": "深度","label_color": "#4782c4"},{"id": 382644,"title": "连续三年英超主场负于水晶宫，今晚克洛普的扑克牌怎么打呢？","share_title": "连续三年英超主场负于水晶宫，今晚克洛普的扑克牌怎么打呢？","comments_total": 0,"share": "https://www.dongqiudi.com/article/382644","thumb": "","top": false,"top_color": "","url": "https://api.dongqiudi.com/article/382644.html?from=tab_1","url1": "https://api.dongqiudi.com/article/382644.html?from=tab_1","scheme": null,"is_video": true,"new_video_detail": "1","collection_type": null,"add_to_tab": null,"show_comments": true,"published_at": "2017-07-19 14:55:25","sort_timestamp": 1500447325,"channel": "video"},{"id": 382599,"title": "梦想不会褪色！慈善机构圆孟买贫民区女孩儿的足球梦","share_title": "梦想不会褪色！慈善机构圆孟买贫民区女孩儿的足球梦","comments_total": 9,"share": "https://www.dongqiudi.com/article/382599","thumb": "http://img1.dongqiudi.com/fastdfs1/M00/9C/D3/180x135/crop/-/o4YBAFlu8F2AcFtwAACX_DJbrwo612.jpg","top": false,"top_color": "","url": "https://api.dongqiudi.com/article/382599.html?from=tab_1","url1": "https://api.dongqiudi.com/article/382599.html?from=tab_1","scheme": null,"is_video": 
true,"new_video_detail": "1","collection_type": null,"add_to_tab": null,"show_comments": true,"published_at": "2017-07-19 14:45:20","sort_timestamp": 1500446720,"channel": "video"}],"hotwords": "JJ同学","ad": [],"quora": [{"id": 182,"type": "ask","title": "足坛历史上有哪些有名的更衣室故事？","ico": "","thumb": "http://img1.dongqiudi.com/fastdfs1/M00/9B/BE/pIYBAFlt3uyACqEnAADhb9FVavU28.jpeg","answer_total": 222,"scheme": "dongqiudi:///ask/182","position": 7,"sort_timestamp": 1500533674,"published_at": "2017-07-20 14:54:34"}]
}

Good. Now let's parse the data out of the articles array. Going through this, you will see why Python is so popular:

First import the package for parsing JSON:

import json

Then parse:

dataList = json.loads(data)['articles']

You read that right: this single step extracts the articles JSON array.
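To make that step concrete, here is a small self-contained sketch. The `sample` string is a hand-trimmed stand-in with the same shape as the response shown earlier (most fields and articles are left out), and the variable names are only illustrative.

```python
import json

# A trimmed-down sample with the same shape as the real response.
sample = '''
{"id": 1, "label": "头条", "page": 1,
 "articles": [
   {"id": 375248, "comments_total": 1026, "top": true, "channel": "article"},
   {"id": 382644, "comments_total": 0, "top": false, "channel": "video"}
 ]}
'''

payload = json.loads(sample)      # JSON object -> dict
articles = payload['articles']    # JSON array  -> list of dicts

print(type(payload).__name__, type(articles).__name__)
print([a['id'] for a in articles])
print(articles[0]['top'])         # JSON true becomes Python True
```

json.loads maps JSON objects to dicts, arrays to lists, true/false to True/False and null to None, so the result can be used like any other Python data.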

Next, take each object out of articles and put its fields into a Python list, to be used later when writing to the database:

# The top-level label of the response is stored with every row
first_label = json.loads(data)['label']
for newsObj in dataList:
    # print(newsObj.get('title'))
    # One value per table column, in column order; the trailing comments_total
    # is used by the update clause of the SQL statement shown further below
    newsObjs = [newsObj.get('id'), newsObj.get('title'), newsObj.get('share_title'), newsObj.get('description'),
                newsObj.get('comments_total'), newsObj.get('share'), newsObj.get('thumb'), newsObj.get('top'),
                newsObj.get('top_color'), newsObj.get('url'), newsObj.get('url1'), newsObj.get('scheme'),
                newsObj.get('is_video'), newsObj.get('new_video_detail'), newsObj.get('collection_type'),
                newsObj.get('add_to_tab'), newsObj.get('show_comments'), newsObj.get('published_at'),
                newsObj.get('channel'), str(first_label), newsObj.get('comments_total')]
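The loop uses newsObj.get(...) rather than newsObj[...] for a reason worth spelling out: not every article carries every key (the second article in the sample response has no "description"), and dict.get returns None for a missing key instead of raising KeyError. The sketch below shows this on a cut-down field list; `FIELDS` and `to_row` are hypothetical names used only for this example.

```python
# Only a few of the columns, to keep the example short.
FIELDS = ['id', 'title', 'description', 'comments_total']


def to_row(article, fields=FIELDS):
    """Hypothetical helper: pick the given keys from one article dict."""
    return [article.get(name) for name in fields]


# Like the second article in the sample response, this one has no "description".
article = {'id': 382644, 'title': 'sample title', 'comments_total': 0}

row = to_row(article)
print(row)  # the slot for the missing 'description' holds None
```

When such a list is passed as query parameters, pymysql sends None as SQL NULL, so a missing field ends up as NULL in the table.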

That completes the JSON parsing. Next up is connecting to the database:

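import pymysql


# Execute a SQL statement
def executeSql(sql, values):
    # etAddress, etPort, etName, etPassWd and etDBName are not defined in this post;
    # they appear to be the input fields of the demo's GUI
    conn = pymysql.connect(host=str(etAddress.get()), port=int(etPort.get()),
                           user=str(etName.get()), passwd=str(etPassWd.get()),
                           db=str(etDBName.get()))
    cursor = conn.cursor()
    conn.set_charset('utf8')
    # execute() returns the number of affected rows
    effect_row = cursor.execute(sql, values)
    # Commit; otherwise inserted or modified data is not saved
    conn.commit()
    # Close the cursor
    cursor.close()
    # Close the connection
    conn.close()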

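Does this look familiar? Connecting to a database from Python is indeed much like doing it from Java and similar languages: open a connection with the MySQL host address, port, database user name and password, run statements through a cursor to get the result, and finally close both the cursor and the connection. (Connecting from Python requires the pymysql package: install it with pip, then import it.) The SQL is also written much as it would be in Java. The whole process looks like this: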

import json

import pymysql


# Insert news
def insertNews(data):
    if len(data) > 2:
        payload = json.loads(data)
        dataList = payload['articles']
        first_label = payload['label']
        for newsObj in dataList:
            # print(newsObj.get('title'))
            # 20 column values plus a trailing comments_total for the UPDATE clause
            newsObjs = [newsObj.get('id'), newsObj.get('title'), newsObj.get('share_title'), newsObj.get('description'),
                        newsObj.get('comments_total'), newsObj.get('share'), newsObj.get('thumb'), newsObj.get('top'),
                        newsObj.get('top_color'), newsObj.get('url'), newsObj.get('url1'), newsObj.get('scheme'),
                        newsObj.get('is_video'), newsObj.get('new_video_detail'), newsObj.get('collection_type'),
                        newsObj.get('add_to_tab'), newsObj.get('show_comments'), newsObj.get('published_at'),
                        newsObj.get('channel'), str(first_label), newsObj.get('comments_total')]
            sql = "insert into news(id,title,share_title,description,comments_total," \
                  "share,thumb,top,top_color,url,url1,scheme,is_video,new_video_detail," \
                  "collection_type,add_to_tab,show_comments,published_at,channel,label)" \
                  "values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s) " \
                  "ON DUPLICATE KEY UPDATE comments_total = %s"
            executeSql(sql=sql, values=newsObjs)


# Execute a SQL statement
def executeSql(sql, values):
    # etAddress, etPort, etName, etPassWd and etDBName are not defined in this post;
    # they appear to be the input fields of the demo's GUI
    conn = pymysql.connect(host=str(etAddress.get()), port=int(etPort.get()),
                           user=str(etName.get()), passwd=str(etPassWd.get()),
                           db=str(etDBName.get()))
    cursor = conn.cursor()
    conn.set_charset('utf8')
    # execute() returns the number of affected rows
    effect_row = cursor.execute(sql, values)
    # Commit; otherwise inserted or modified data is not saved
    conn.commit()
    # Close the cursor
    cursor.close()
    # Close the connection
    conn.close()
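To try the insert-or-update idea without a MySQL server, the sketch below swaps in Python's built-in sqlite3 module with an in-memory database. This is not the pymysql code from the post: SQLite uses ? placeholders and ON CONFLICT(id) DO UPDATE (available in SQLite 3.24 and later) where MySQL uses %s and ON DUPLICATE KEY UPDATE. The three-column news table and the name `upsert_news` are assumptions chosen to keep the example short.

```python
import sqlite3

UPSERT_SQL = (
    "INSERT INTO news (id, title, comments_total) VALUES (?, ?, ?) "
    "ON CONFLICT(id) DO UPDATE SET comments_total = excluded.comments_total"
)


def upsert_news(conn, articles):
    """Hypothetical helper: insert new rows, refresh comments_total on repeats."""
    rows = [(a.get('id'), a.get('title'), a.get('comments_total')) for a in articles]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(UPSERT_SQL, rows)


conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT, comments_total INTEGER)")

upsert_news(conn, [{'id': 1, 'title': 'first', 'comments_total': 5}])
# The same id again, as happens when the feed is fetched a second time.
upsert_news(conn, [{'id': 1, 'title': 'first', 'comments_total': 9},
                   {'id': 2, 'title': 'second', 'comments_total': 0}])

result = conn.execute("SELECT id, title, comments_total FROM news ORDER BY id").fetchall()
print(result)
conn.close()
```

The point of the pattern is the same in both databases: re-running the scraper does not fail on rows that already exist, it only refreshes their comment counts. In the MySQL version this requires id to be a primary or unique key of the news table.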

Finally, in main:

data = getHtmlData(url)
insertNews(data=data)

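Call these two functions and the data ends up stored in the database.

Of course, you can also build a GUI for it, just for fun.

If there is interest, I will upload the demo too, as a starting point for others!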
