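Using requests, a High-Performance HTTP Client Library for Python

Python has many HTTP clients. The most widely used, and the easiest to use, is requests.

Persistent connections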

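Persistent connections have been standard since HTTP 1.1, although many applications do not use them. When requests is used in its simple mode (for example through the get function), the connection is closed on return. A Session object allows an already opened connection to be reused.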

import requests

session = requests.Session()
session.get("https://china-testing.github.io/")
# Connection is re-used
session.get("https://china-testing.github.io/")

Each connection is stored in a connection pool (10 connections by default), and the pool size can be configured:

import requests

session = requests.Session()
adapter = requests.adapters.HTTPAdapter(
    pool_connections=100,
    pool_maxsize=100)
session.mount('http://', adapter)
response = session.get("http://example.org")
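
The adapter above is mounted only for the http:// prefix, so https:// URLs would still go through the default pool. A minimal sketch of mounting one adapter on both prefixes follows; the helper name make_session and its pool_size parameter are illustrative assumptions, not part of requests.

```python
import requests
from requests.adapters import HTTPAdapter


def make_session(pool_size=100):
    """Hypothetical helper: a Session whose pool size applies to both schemes."""
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=pool_size, pool_maxsize=pool_size)
    # Mount the same adapter for plain and TLS URLs
    for prefix in ("http://", "https://"):
        session.mount(prefix, adapter)
    return session


session = make_session()
# get_adapter() returns the adapter that would handle a given URL
print(session.get_adapter("https://example.org") is
      session.get_adapter("http://example.org"))
```

No request is sent here; the sketch only checks which adapter a URL resolves to.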

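Reusing TCP connections has several performance advantages:

Lower CPU and memory usage (fewer connections are open at the same time).
Reduced latency in subsequent requests (no TCP handshake).
Exceptions can be raised without the penalty of closing the TCP connection.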
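
To make the connection reuse concrete, here is a small sketch that counts how many TCP connections a local test server accepts. It deliberately uses the standard library (http.server and http.client) instead of requests, so that it runs without network access; the names CountingHandler and count_connections are made up for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    connections = 0                # TCP connections accepted so far
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_connections(reuse, n_requests=5):
    """Send n_requests GETs and return how many TCP connections were opened."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    try:
        shared = http.client.HTTPConnection(host, port) if reuse else None
        for _ in range(n_requests):
            conn = shared if reuse else http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the socket can be reused
            if not reuse:
                conn.close()
        if reuse:
            shared.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


print("reused connection:", count_connections(reuse=True))
print("new connection per request:", count_connections(reuse=False))
```

A requests.Session plays the role of the shared connection here, while bare requests.get calls correspond to the one-connection-per-request case.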

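HTTP 1.1 also provides pipelining, which allows several requests to be sent on the same connection without waiting for the replies (think batching). Unfortunately, requests does not support it. Pipelined requests may not be as fast as requests sent in parallel anyway: HTTP 1.1 requires replies to be sent in the same order as the requests were received, first in, first out.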
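
The cost of that first-in, first-out rule can be sketched with a toy model. It assumes, as a simplification, that the server works on all requests concurrently and that each reply has a known "ready" time; the function names and the sample times are hypothetical.

```python
from itertools import accumulate


def pipelined_delivery(ready_times):
    """Delivery time of each reply on one pipelined connection.

    Replies must leave in request order, so a reply cannot be delivered
    before every earlier reply has been delivered (running maximum).
    """
    return list(accumulate(ready_times, max))


def parallel_delivery(ready_times):
    """With one connection per request, each reply arrives when it is ready."""
    return list(ready_times)


# Hypothetical times (seconds) at which the server finishes each reply
ready = [3.0, 1.0, 2.0, 0.5]
print(pipelined_delivery(ready))  # [3.0, 3.0, 3.0, 3.0]
print(parallel_delivery(ready))   # [3.0, 1.0, 2.0, 0.5]
```

In this model a single slow reply at the head of the pipeline delays every reply queued behind it, which is the ordering problem parallel connections avoid.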

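Parallelism

The major drawback of requests is that it is synchronous. Calling requests.get("http://example.org") blocks the program until the HTTP server has replied completely. This can be mitigated with the thread pools provided by concurrent.futures, which make it possible to parallelize HTTP requests very quickly.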

from concurrent import futures

import requests

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    # Named `pending` so the list does not shadow the `futures` module
    pending = [
        executor.submit(
            lambda: requests.get("http://example.org"))
        for _ in range(8)
    ]

results = [
    f.result().status_code
    for f in pending
]

print("Results: %s" % results)
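
The effect of the thread pool can be observed without touching the network. The following sketch replaces requests.get with a stand-in function, fake_fetch, that merely sleeps; both fake_fetch and fetch_all are hypothetical names, and the 0.1 second delay is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, delay=0.1):
    """Stand-in for a blocking call such as requests.get(url)."""
    time.sleep(delay)  # simulates waiting for the server
    return url, 200    # hypothetical (url, status code) result


def fetch_all(urls, max_workers=4):
    # executor.map runs fake_fetch in worker threads and
    # yields the results in the same order as the input
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_fetch, urls))


urls = ["http://example.org/%d" % i for i in range(8)]
start = time.perf_counter()
results = fetch_all(urls)
elapsed = time.perf_counter() - start
print("Fetched %d URLs in %.2fs" % (len(results), elapsed))
```

With 8 calls and 4 workers the calls run in roughly two rounds, so the total should be close to twice the per-call delay rather than eight times.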

The requests-futures library can also be used:

from requests_futures import sessions

session = sessions.FuturesSession()
futures = [session.get("http://example.org") for _ in range(8)]
results = [f.result().status_code for f in futures]
print("Results: %s" % results)

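Using futures with requests

By default a worker pool with two threads is created (the default has varied between requests-futures releases, so check the version you have installed). A program can easily customize this value by passing the max_workers argument, or even its own executor, to the FuturesSession object, for example: FuturesSession(executor=ThreadPoolExecutor(max_workers=10)).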

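Asynchronicity

As explained earlier, requests is entirely synchronous. That blocks the application while it waits for the server to reply, which slows the program down. Making HTTP requests in threads is one solution, but threads have their own overhead and they imply parallelism, which is not something everyone is glad to see in a program.

Starting with version 3.5, Python offers asynchronicity in its core through asyncio and the async/await syntax. The aiohttp library provides an asynchronous HTTP client built on top of asyncio. It allows requests to be sent one after another without waiting for a reply to come back before sending the next one. In contrast to HTTP pipelining, aiohttp sends the requests over multiple connections in parallel, which avoids the ordering issue explained earlier.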

import aiohttp
import asyncio


async def get(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # Return the status code; the response itself is
            # released when this block exits
            return response.status


async def main():
    coroutines = [get("https://china-testing.github.io/") for _ in range(8)]
    return await asyncio.gather(*coroutines)


# asyncio.run() replaces the older get_event_loop()/run_until_complete() pair
results = asyncio.run(main())

print("Results: %s" % results)
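
The ordering behaviour can be checked without a server. In this sketch fake_get is a hypothetical stand-in for an aiohttp request that only sleeps, and the latencies are made-up values; it illustrates that asyncio.gather returns results in submission order even though the replies complete out of order, and that the total time is close to the slowest request rather than the sum.

```python
import asyncio
import time


async def fake_get(name, delay):
    """Stand-in for an aiohttp request: waits without blocking the event loop."""
    await asyncio.sleep(delay)
    return name


async def main():
    # Hypothetical per-request latencies in seconds
    delays = {"a": 0.3, "b": 0.1, "c": 0.2}
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_get(name, delay) for name, delay in delays.items()))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, "%.2fs" % elapsed)
```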

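All of these solutions offer different ways of making an HTTP client faster.

Performance

The code below sends requests to httpbin.org. It implements all of the techniques listed above and times them.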

import contextlib
import time

import aiohttp
import asyncio
import requests
from requests_futures import sessions

URL = "http://httpbin.org/delay/1"
TRIES = 10


@contextlib.contextmanager
def report_time(test):
    t0 = time.time()
    yield
    print("Time needed for `%s' called: %.2fs"
          % (test, time.time() - t0))


with report_time("serialized"):
    for i in range(TRIES):
        requests.get(URL)

session = requests.Session()
with report_time("Session"):
    for i in range(TRIES):
        session.get(URL)

session = sessions.FuturesSession(max_workers=2)
with report_time("FuturesSession w/ 2 workers"):
    futures = [session.get(URL) for i in range(TRIES)]
    for f in futures:
        f.result()

session = sessions.FuturesSession(max_workers=TRIES)
with report_time("FuturesSession w/ max workers"):
    futures = [session.get(URL) for i in range(TRIES)]
    for f in futures:
        f.result()


async def get(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            await response.read()


async def get_all():
    await asyncio.gather(*[get(URL) for i in range(TRIES)])


with report_time("aiohttp"):
    # asyncio.run() replaces the older get_event_loop()/run_until_complete() pair
    asyncio.run(get_all())

Running this program gives the following output:

Time needed for `serialized' called: 12.12s
Time needed for `Session' called: 11.22s
Time needed for `FuturesSession w/ 2 workers' called: 5.65s
Time needed for `FuturesSession w/ max workers' called: 1.25s
Time needed for `aiohttp' called: 1.19s
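
These timings are roughly what a simple back-of-the-envelope model predicts. The sketch below assumes that every request takes exactly the one second delay imposed by the /delay/1 endpoint and ignores connection setup and network latency; estimated_wall_time is a hypothetical helper, not part of any library.

```python
import math


def estimated_wall_time(n_requests, workers, delay_per_request):
    """Lower bound: requests run in ceil(n / workers) rounds of `delay` seconds."""
    rounds = math.ceil(n_requests / workers)
    return rounds * delay_per_request


for workers in (1, 2, 10):
    print(workers, estimated_wall_time(10, workers, 1.0))
# 1 10.0
# 2 5.0
# 10 1.0
```

The measured 12.12s, 5.65s and 1.25s sit a little above these bounds, which is consistent with the per-request overhead that the model leaves out.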

Streaming

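Another efficient speed optimization is streaming the requests. By default, the body of the response is downloaded immediately when a request is made. The stream parameter provided by requests and the content attribute of aiohttp both offer a way to avoid loading the full content into memory as soon as the request is executed.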

import requests

# Use `with` to make sure the response stream is closed and the connection can
# be returned back to the pool.
with requests.get('http://example.org', stream=True) as r:
    print(list(r.iter_content()))

Streaming with aiohttp

import aiohttp
import asyncio


async def get(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.content.read()


# asyncio.run() replaces the older get_event_loop()/ensure_future() pattern
results = [asyncio.run(get("https://china-testing.github.io/"))]
print("Results: %s" % results)

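Not loading the full content is extremely important in order to avoid needlessly allocating what may be hundreds of megabytes of memory. If your program does not need to access the entire content as a whole and can work on chunks instead, it is best to use these methods. For example, if you are going to save the content to a file, reading one chunk at a time and writing it immediately is far more memory-efficient than reading the whole HTTP body (allocating a large amount of memory) and then writing it to disk.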
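
A minimal sketch of the chunk-by-chunk approach is shown below. The save_chunks helper accepts any iterable of byte chunks, such as what r.iter_content(chunk_size=8192) yields on a streamed requests response. To stay runnable offline, the sketch feeds it a fake generator instead; save_chunks, fake_body and the sizes used are illustrative assumptions.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to `path`; return the bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


def fake_body(total=100_000, chunk_size=8192):
    """Stand-in for r.iter_content(chunk_size=8192) on a streamed response."""
    for offset in range(0, total, chunk_size):
        yield b"x" * min(chunk_size, total - offset)


path = os.path.join(tempfile.mkdtemp(), "body.bin")
size = save_chunks(fake_body(), path)
print("Wrote %d bytes to %s" % (size, path))
```

Only one chunk is held in memory at any time, so peak memory use depends on the chunk size rather than on the size of the body.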

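I hope this makes it easier for you to write proper HTTP clients and requests. If you know of any other useful technique or method, feel free to write it down in the comments section below!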
