[Learning Python] Sending HTTP Requests to Websites

Method 1: requests

Official documentation
https://pypi.org/project/requests/

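Requests is a module for making network requests, mainly used to simulate a browser sending requests. There are many similar modules, such as urllib, urllib2, httplib and httplib2, and they all offer broadly similar functionality. However, they are more cumbersome to use and largely dated (urllib2 and httplib are Python 2 module names). The requests module is simple, powerful and efficient, which is why it stands out among the many network request modules.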

Using requests

Installation: pip install requests
Workflow:

Specify the URL
Send the request with the requests module
Read the data from the response object
Persist the data (optional)

Example: fetching the Baidu home page

# 1. Import the package
import requests

# 2. Specify the URL
url = "https://www.baidu.com"
# 3. Send a GET request; the call returns a response object
response = requests.get(url=url)
response.encoding = 'utf-8'  # set the encoding explicitly, otherwise the text comes out garbled
# 4. Read the response data
print(response.status_code)  # status code
print(response.url)          # requested URL
print(response.headers)      # response headers
print(response.text)         # page source as text

# 5. Save the data
text = response.text
with open('./2.html', 'w', encoding='utf-8') as f:
    f.write(text)

requests request methods

The example above sent a GET request. requests supports the other HTTP methods as well; GET and POST are the most commonly used.

res = requests.get(url)
res = requests.post(url)
res = requests.put(url)
res = requests.delete(url)
res = requests.head(url)
res = requests.options(url)

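When sending a request, you sometimes also need to pass extra keyword arguments to the request method, in the form requests.get(url=url, xx=xx). The common ones are listed below; a quick overview is enough for now.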

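Purpose	Parameter
HTTP headers	headers
GET parameters	params
POST parameters	data
Files	files
Cookies	cookies
Redirect handling	allow_redirects=False/True
Timeout	timeout
Certificate verification	verify=False/True
Streaming (deferred download)	stream=False/True
Event hooks	hooks=dict(response=)
Authentication	auth=
Proxies	proxies=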
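To make the table more concrete, here is a minimal sketch showing how params and headers shape a request. It builds the request with requests.Request(...).prepare() instead of sending it, so the result can be inspected without network access. The User-Agent string and the httpbin URL are illustrative assumptions, not values from the original example.

```python
import requests

# Hypothetical values chosen only for illustration.
url = "http://httpbin.org/get"
query = {"key": "username", "info": "plusroax"}
headers = {"User-Agent": "my-demo-client/0.1"}

# Build the request without sending it, so each argument's effect can be inspected.
prepared = requests.Request("GET", url, params=query, headers=headers).prepare()

print(prepared.method)                 # HTTP method
print(prepared.url)                    # URL with the params appended as a query string
print(prepared.headers["User-Agent"])  # the custom header

# To actually send it, pass the same arguments to requests.get,
# optionally with a timeout in seconds:
# response = requests.get(url, params=query, headers=headers, timeout=5)
```

Passing timeout is usually worthwhile in real scripts, since without it a request can wait indefinitely on an unresponsive server.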

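requests response object attributes
When fetching the Baidu home page above, response = requests.get(url=url) returned a response object. To get specific data such as the status code or the page source, read the corresponding attribute of that object, for example response.status_code for the status code.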

Content	Attribute
Request URL	res.url
Status code	res.status_code
Response body as a string	res.text
Raw response body as bytes	res.content
Response headers	res.headers
Cookies	res.cookies

Sending a GET request

# Import the requests package
import requests

url = "http://www.tuling123.com/openapi/api"
myParams = {"key": "username", "info": "plusroax"}  # a dict is recommended: requests appends the key=value pairs to the URL for you
res = requests.get(url=url, params=myParams)
print('url:', res.request.url)  # the URL that was actually sent
print("response:", res.text)    # the response body

Output

url: http://www.tuling123.com/openapi/api?key=username&info=plusroax
response: {"code":40001,"text":"亲爱的，key不对哦。"}

Sending a POST request

# Import the requests package
import requests

url = "http://httpbin.org/post"
data = {"name": "plusroax", "age": 18}  # data sent with the POST request, as a dict
# data goes into the request body; params, by contrast, is appended to the URL
res = requests.post(url=url, data=data)
print("Request body:", res.request.body)
print("Response:", res.text)

Output

Request body: name=plusroax&age=18
Response: {"args": {}, "data": "", "files": {}, "form": {"age": "18", "name": "plusroax"}, "headers": {"Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Content-Length": "20", "Content-Type": "application/x-www-form-urlencoded", "Host": "httpbin.org", "User-Agent": "python-requests/2.27.1", "X-Amzn-Trace-Id": "Root=1-633e9c08-5916485673661c9c6b286b47"}, "json": null, "origin": "6.6.6.6", "url": "http://httpbin.org/post"}
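The POST example sends a form-encoded body. As a side-by-side sketch, the snippet below prepares the same payload once with data= and once with json=, without sending anything, to show how the body and Content-Type differ. The json= argument is not covered in the text above; it is included here only as an illustrative comparison.

```python
import json

import requests

url = "http://httpbin.org/post"
payload = {"name": "plusroax", "age": 18}

# data= encodes the dict as an HTML form body (key=value pairs joined by &).
form_request = requests.Request("POST", url, data=payload).prepare()
# json= serializes the dict as a JSON document instead.
json_request = requests.Request("POST", url, json=payload).prepare()

print(form_request.headers["Content-Type"], form_request.body)
print(json_request.headers["Content-Type"], json_request.body)
```

Which one to use depends on what the server expects: classic web forms read form-encoded bodies, while many APIs expect JSON.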

Method 2: requests-html

Official documentation
https://pypi.org/project/requests-html/

Installation command

pip install requests-html

Sending a GET request

from requests_html import HTMLSession

# Create a session object used to send requests
session = HTMLSession()
sina = session.get('https://news.sina.com.cn/')
sina.encoding = 'utf-8'
print(sina.text)

Method 3: urllib

Sending a POST request

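# Send an HTTP POST request and print the response body
from urllib.request import urlopen
from urllib.parse import urlencode

data = {'kw': 'python'}
data = bytes(urlencode(data), encoding='utf-8')  # the body must be bytes
response = urlopen("https://fanyi.baidu.com/sug", data)  # passing data makes urlopen send a POST
print(response.read().decode('unicode_escape'))  # turn \uXXXX escapes into readable characters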

Output

{"errno":0,"data":[{"k":"Python","v":"蛇属，蟒蛇属"},{"k":"python","v":"n. 巨蛇，大蟒"},{"k":"pythons","v":"n. 巨蛇，大蟒( python的名词复数 )"}]}
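Since the output above looks like JSON, one alternative to decode('unicode_escape') is to parse the body with json.loads, which also resolves \uXXXX escapes and returns ordinary Python dicts and lists. The sketch below uses hypothetical raw bytes modelled on that output rather than a live response, and it assumes the endpoint really returns JSON.

```python
import json

# Hypothetical raw bytes modelled on the output above;
# non-ASCII text arrives as \uXXXX escape sequences.
raw = b'{"errno": 0, "data": [{"k": "python", "v": "n. \\u5de8\\u86c7"}]}'

# Decode the bytes, then let the JSON parser resolve the escapes.
result = json.loads(raw.decode("utf-8"))

for entry in result["data"]:
    print(entry["k"], "->", entry["v"])
```

Parsing also gives direct access to individual fields such as result["errno"], which is harder to do on a decoded string.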

Sending a GET request with headers

# Send an HTTP GET request with a Chrome User-Agent header to mimic a browser, and print the response body
from urllib import request

url = 'http://www.httpbin.org/get'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}
req = request.Request(url, headers=headers, method='GET')
response = request.urlopen(req)
print(response.read())  # raw bytes; call .decode('utf-8') to get text
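Before sending, it can help to check what a Request object will actually carry. The following sketch builds the same kind of request and inspects it locally, with no network access. The short User-Agent value is a hypothetical stand-in for the browser string used above.

```python
from urllib import request

url = "http://www.httpbin.org/get"
headers = {"User-Agent": "my-demo-client/0.1"}  # hypothetical value for illustration

req = request.Request(url, headers=headers, method="GET")

print(req.get_method())    # the HTTP method that urlopen would use
print(req.full_url)        # the target URL
print(req.header_items())  # headers as (name, value) pairs

# Request stores header names in capitalized form ("User-agent");
# HTTP header names are case-insensitive, so the server treats it the same.
print(req.get_header("User-agent"))
```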