requests

Basic usage

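from lxml import etree
import requests
# Fetch the page and parse the HTML into an element tree
r = requests.get('https://www.qq.com/')
html = etree.HTML(r.text)
# Select the <li> items of the topic module by following the class path
li = html.xpath('//div[@class="layout qq-main cf"]/div[@class="col col-2 fl"]/div[@class="mod m-topic"]/div[2]/ul/li')
print(len(li))
for item in li:
    # Text of the last <a> element inside each <li>
    x = item.xpath('a[last()]/text()')
    print(x)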

1. GET requests

(1) Adding extra information to a GET request

import requests
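# Query parameters to append to the URL
data = {'name': 'lu', 'age': '22'}
# Send a browser-like User-Agent header
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
# params is encoded into the query string, which effectively rebuilds the URL
r = requests.get('http://httpbin.org/get', params=data, headers=headers)
print(r.text)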
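
The remark that params rebuilds the URL can be checked offline. This is a minimal sketch that assumes only the standard library: `urllib.parse.urlencode` turns the same dict into the query string that follows the `?`. The helper name `build_url` is made up for illustration and is not part of requests.

```python
from urllib.parse import urlencode

def build_url(base, params):
    """Append params to base as a query string (hypothetical helper)."""
    query = urlencode(params)
    return f"{base}?{query}" if query else base

data = {'name': 'lu', 'age': '22'}
url = build_url('http://httpbin.org/get', data)
print(url)
```

After a real request, `r.url` holds the URL that requests actually built, which should have the same shape.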

If the response body is JSON, calling r.json() parses it directly, typically into a dict.
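
As a rough offline illustration, `r.json()` behaves much like `json.loads(r.text)`. The body below is a hand-written stand-in for what httpbin.org might return, not captured output.

```python
import json

# Hand-written stand-in for r.text; a real httpbin.org reply has more fields
body = '{"args": {"age": "22", "name": "lu"}, "url": "http://httpbin.org/get?name=lu&age=22"}'

result = json.loads(body)        # comparable to calling r.json()
print(type(result))              # a plain dict
print(result['args']['name'])
```

If the body is not valid JSON, parsing raises an error, so it may be worth wrapping the call in try/except when the content type is uncertain.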

(2) Fetching binary data

Images, audio, and video files are all binary data.

import requests
r = requests.get('https://img.ivsky.com/img/bizhi/pre/201906/27/haian_shatan.jpg')
# Write the raw bytes of the response to a file
with open('immm.jpg', 'wb') as f:
    f.write(r.content)
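
A server sometimes answers an image URL with an HTML error page, so a quick check on the bytes can help before saving. The sketch below runs offline: the `content` value is a fake stand-in for `r.content`, and `looks_like_jpeg` is a hypothetical helper that relies on JPEG files starting with the bytes FF D8 FF.

```python
import os
import tempfile

def looks_like_jpeg(data: bytes) -> bool:
    """JPEG files start with the bytes FF D8 FF."""
    return data[:3] == b'\xff\xd8\xff'

# Stand-in for r.content: a few fake bytes with a JPEG-style header
content = b'\xff\xd8\xff\xe0' + b'\x00' * 16

path = os.path.join(tempfile.mkdtemp(), 'demo.jpg')
with open(path, 'wb') as f:      # 'wb' writes the bytes unchanged
    f.write(content)

with open(path, 'rb') as f:      # 'rb' reads them back as bytes
    saved = f.read()

print(looks_like_jpeg(saved), saved == content)
```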

2. POST requests

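import requests
data = dict()  # form fields to send; empty here
r = requests.post('http://httpbin.org/post', data=data)
print(r.text)
print(r.status_code)  # status code; 200 means success
print(r.headers)      # response headers
print(r.cookies)      # cookies set by the server
print(r.url)          # final URL of the response
print(r.history)      # redirect responses, if any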
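
To see roughly what form data looks like on the wire, here is a sketch that uses the standard-library urllib rather than requests and builds the request without sending it. A dict of form fields is typically serialized as `key=value` pairs joined by `&`; the field values are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

form = {'name': 'lu', 'age': '22'}        # placeholder form fields
body = urlencode(form).encode('utf-8')    # form-encoded bytes

# Supplying a body makes urllib treat this as a POST; nothing is sent here
req = Request('http://httpbin.org/post', data=body)
print(req.get_method())
print(req.data)
```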

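3. Advanced usage

(1) File upload

An uploaded file is identified by its own separate field (file in this example) rather than being mixed in with ordinary form data.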

import requests
with open('immm.jpg', 'rb') as f:  # binary mode; closed after the upload
    r = requests.post('http://httpbin.org/post', files={'file': f})
print(r.text)

(2) Cookies

Getting cookies

import requests
r = requests.post('https://www.baidu.com')
for key, value in r.cookies.items():
    print(key, value)

Setting cookies

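import requests
cookies = 'a=3;b=4'
# Method 1: send the cookie string in the Cookie header
headers = {'Cookie': cookies, 'User-Agent': ''}  # fill in a real User-Agent string
r = requests.post('https://www.baidu.com', headers=headers)
# Method 2: build a RequestsCookieJar and pass it via the cookies parameter
jar = requests.cookies.RequestsCookieJar()
for cookie in cookies.split(';'):
    key, value = cookie.split('=', 1)
    jar.set(key, value)
r = requests.post('https://www.baidu.com', cookies=jar)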
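
Cookie strings copied from a browser usually have spaces after the semicolons, which a bare split would leave inside the keys. The sketch below is one possible way to normalize such a string into a dict; `parse_cookie_string` is a hypothetical helper, not a requests function.

```python
def parse_cookie_string(raw):
    """Turn 'a=3; b=4' into {'a': '3', 'b': '4'} (hypothetical helper)."""
    cookies = {}
    for pair in raw.split(';'):
        pair = pair.strip()
        if not pair or '=' not in pair:
            continue                      # skip empty or malformed pieces
        key, value = pair.split('=', 1)   # split once; values may contain '='
        cookies[key.strip()] = value.strip()
    return cookies

print(parse_cookie_string('a=3; b=4'))
print(parse_cookie_string('token=abc=def;'))
```

The resulting dict can be passed as the `cookies` argument of `requests.get` or `requests.post`, which accept a plain dict as well as a cookie jar.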

(3) Proxy settings

Use the proxies parameter.

import requests
proxies = {'http': 'http://10.10.1.10:3128', 'https': 'http://10.10.1.10:1080'}
r = requests.get('https://www.baidu.com',proxies=proxies)
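
The keys of the proxies dict are URL schemes, so which proxy applies depends on the scheme of the target URL. The sketch below is a deliberately simplified model of that lookup; `pick_proxy` is a hypothetical helper, and the real lookup in requests also supports more specific keys and environment settings.

```python
from urllib.parse import urlsplit

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

def pick_proxy(url, proxies):
    """Return the proxy registered for the URL's scheme, or None (simplified)."""
    return proxies.get(urlsplit(url).scheme)

print(pick_proxy('https://www.baidu.com', proxies))
print(pick_proxy('http://httpbin.org/get', proxies))
```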

(4) Prepared Request

A request can be represented as a data structure: build a Request, prepare it, then send it.

import requests
url = 'http://httpbin.org/get'
data = dict()
headers = dict()
s = requests.Session()
# Build the Request object
req = requests.Request('GET', url, data=data, headers=headers)
# Convert it into a PreparedRequest
preped = s.prepare_request(req)
# Send it through the session
r = s.send(preped)
print(r.text)
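
One benefit of the prepared form is that it can be inspected before anything goes over the network. The sketch below assumes the requests package is installed; the query parameter and the User-Agent value are placeholders chosen for illustration.

```python
import requests

s = requests.Session()
req = requests.Request(
    'GET',
    'http://httpbin.org/get',
    params={'name': 'lu'},                 # placeholder query parameter
    headers={'User-Agent': 'demo-agent'},  # placeholder header value
)
prepped = s.prepare_request(req)

# Nothing has been sent yet; these are plain attributes
print(prepped.method)
print(prepped.url)
print(prepped.headers['User-Agent'])
```

Passing `prepped` to `s.send()` would then perform the actual request.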

urllib

Basic usage

import urllib.request
from lxml import etree
r = urllib.request.urlopen('https://blog.csdn.net/ljq1998/article/details/99423615')
html = etree.HTML(r.read().decode('utf-8'))
x = html.xpath('//h1[@class="title-article"]/text()')
print(x)

1. urlencode()

Serializes a dict into GET request parameters (a query string).
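
A short sketch of that serialization, using only the standard library. The dict contents are sample values; `parse_qs` is included to show the reverse direction.

```python
from urllib.parse import urlencode, parse_qs

query = urlencode({'name': 'lu', 'age': 22})
print(query)

# doseq=True expands a list into repeated keys
multi = urlencode({'tag': ['a', 'b']}, doseq=True)
print(multi)

# parse_qs reverses the encoding; every value comes back as a list of strings
print(parse_qs(query))
```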

2. quote()

Converts Chinese (and other non-ASCII) characters into percent-encoded URL form.

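import urllib.parse
url = 'http://www.baidu.com?'
print(url)
# urlencode() already percent-encodes values, so pass the raw string
data = {'name': '刘嘉强', 'age': 22}
url = url + urllib.parse.urlencode(data)
print(url)
# quote() encodes a single string, for example a path segment
print(urllib.parse.quote('刘嘉强'))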
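
One pitfall worth illustrating: `urlencode()` quotes its values itself, so a value that has already been passed through `quote()` gets encoded a second time and every `%` becomes `%25`. The sketch below compares the two results and also shows `unquote()` as the inverse of `quote()`.

```python
from urllib.parse import quote, unquote, urlencode

name = '刘嘉强'
encoded = quote(name)
print(encoded)
print(unquote(encoded) == name)

# urlencode() quotes values itself, so pre-quoting encodes the '%' again
once = urlencode({'name': name})
twice = urlencode({'name': encoded})
print(once)
print(twice)
```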
