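HTTPX is a full-featured HTTP client library for Python and can be seen as a modern alternative to requests.
Its key advantage is that it keeps requests' simple, easy-to-use API while also supporting asynchronous requests and HTTP/2. The same coding style therefore covers everything from simple scripts to high-concurrency crawlers.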
Installation
```shell
pip install httpx
# For HTTP/2 support (quoted so shells such as zsh don't expand the brackets)
pip install 'httpx[http2]'
```
Basic Concepts
Synchronous client: httpx.Client(), used in synchronous code
Asynchronous client: httpx.AsyncClient(), used in asynchronous code with async/await
Request methods: GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS
Synchronous Client
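```python
import httpx

# ----------- GET request -----------
params = {'key1': 'value1', 'key2': 'value2'}
response = httpx.get('https://httpbin.org/get', params=params)

# ----------- POST requests -----------
# Only one body encoding is used per request (files/data take precedence
# over json), so each kind of body is sent in its own request.

# Form data (application/x-www-form-urlencoded)
data = {'key': 'value', 'name': 'httpx'}
response = httpx.post('https://httpbin.org/post', data=data)

# JSON data (application/json)
json_data = {'key': 'value', 'number': 123}
response = httpx.post('https://httpbin.org/post', json=json_data)

# File upload (multipart/form-data); form fields can be sent alongside via data=
with open('example.txt', 'rb') as f:
    files = {'file': f}
    response = httpx.post('https://httpbin.org/post', data=data, files=files)

# ----------- Using Client as a context manager -----------
with httpx.Client() as client:
    # Requests made through the same client share connections, which is faster
    response1 = client.get('https://httpbin.org/get')
    response2 = client.post('https://httpbin.org/post', json={'data': 'test'})
print(response1.status_code, response2.status_code)
```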
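When you send several requests to the same site, use a Client. The top-level functions such as httpx.get() open a new connection for every call, whereas a Client keeps a connection pool and reuses the underlying TCP connections, saving repeated TCP (and TLS) handshakes and noticeably improving performance.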
Asynchronous Client
Basic request
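```python
import asyncio
import httpx

async def main():
    # async with / await must run inside a coroutine
    async with httpx.AsyncClient() as client:
        response = await client.get('https://httpbin.org/get')
        print(response.status_code)
        print(response.json())

asyncio.run(main())
```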
Running multiple async requests concurrently
```python
import asyncio
import httpx

async def fetch_url(client, url):
    response = await client.get(url)
    return response.status_code, len(response.content)

async def main():
    urls = [
        'https://httpbin.org/get',
        'https://httpbin.org/uuid',
        'https://httpbin.org/json',
    ]
    async with httpx.AsyncClient() as client:
        # Option 1: asyncio.gather runs all requests concurrently
        tasks = [fetch_url(client, url) for url in urls]
        results = await asyncio.gather(*tasks)

        # Option 2: awaiting inside a list comprehension runs the requests
        # one after another (sequentially), not concurrently
        # results = [await fetch_url(client, url) for url in urls]

    for url, (status, length) in zip(urls, results):
        print(f"{url}: Status {status}, Length {length}")

asyncio.run(main())
```
Advanced Features
1. Timeout configuration
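```python
import httpx

# One timeout (in seconds) for connect, read, write and pool alike
# (httpx's default is 5 seconds; timeout=None disables timeouts)
with httpx.Client(timeout=10.0) as client:
    response = client.get('https://httpbin.org/get')

# Fine-grained timeouts; pool = max wait for a free connection from the pool
timeout = httpx.Timeout(connect=5.0, read=10.0, write=5.0, pool=1.0)
with httpx.Client(timeout=timeout) as client:
    response = client.get('https://httpbin.org/get')
```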
2. Custom headers and authentication
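```python
import httpx

# Custom headers, including a Bearer token
headers = {
    'User-Agent': 'MyApp/1.0',
    'Authorization': 'Bearer token',
}
with httpx.Client(headers=headers) as client:
    response = client.get('https://httpbin.org/headers')

# Basic authentication: auth= writes its own Authorization header,
# so don't combine it with a manually set Authorization header
auth = ('username', 'password')
with httpx.Client(auth=auth) as client:
    response = client.get('https://httpbin.org/basic-auth/username/password')
```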
3. Proxy configuration
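```python
import httpx

# Route all requests through a local HTTP proxy
# (proxy= needs httpx 0.26+; older versions used proxies=)
proxy = "http://127.0.0.1:7890"
with httpx.Client(proxy=proxy) as client:
    response = client.get('https://httpbin.org/ip')
```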
4. Cookie management
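```python
import httpx

# Cookies are handled automatically
with httpx.Client() as client:
    # The server sets a cookie; the client stores it in client.cookies
    client.get('https://httpbin.org/cookies/set?name=value')

    # The stored cookie is sent with later requests
    response = client.get('https://httpbin.org/cookies')
    print(response.json())

    # Set a cookie manually on the client (per-request cookies= is deprecated)
    client.cookies.set('my_cookie', 'cookie_value')
    response = client.get('https://httpbin.org/cookies')
```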
5. Streaming responses
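```python
import asyncio
import httpx

async def stream_response():
    async with httpx.AsyncClient() as client:
        # The body is not loaded up front; it is consumed chunk by chunk
        async with client.stream('GET', 'https://httpbin.org/stream/5') as response:
            async for chunk in response.aiter_bytes():
                print(f"Received: {len(chunk)} bytes")

asyncio.run(stream_response())
```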
6. Event hooks
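```python
import httpx

# Request hook: called just before the request is sent
def log_request(request):
    print(f"[request] {request.method} {request.url} - about to send")

# Response hook: called after the response arrives, before it is returned
# to the caller (the body has not been read yet at this point)
def log_response(response):
    request = response.request
    print(f"[response] {request.method} {request.url} - status {response.status_code}")

# Register the hooks when creating the client
with httpx.Client(event_hooks={'request': [log_request], 'response': [log_response]}) as client:
    response = client.get('https://httpbin.org/get')
```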
7. HTTP/2 support
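```python
import httpx

# Enable HTTP/2 (requires the extra: pip install 'httpx[http2]')
with httpx.Client(http2=True) as client:
    response = client.get('https://httpbin.org/get')
    # "HTTP/2" if the server negotiates it, otherwise "HTTP/1.1"
    print(f"HTTP Version: {response.http_version}")
```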
8. Custom transports
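```python
import ssl
import httpx

# Custom SSL configuration: disables certificate verification (test environments only!)
ssl_context = ssl.create_default_context()
ssl_context.check_hostname = False   # must be turned off before CERT_NONE
ssl_context.verify_mode = ssl.CERT_NONE

# Custom transport configuration
transport = httpx.HTTPTransport(
    retries=3,           # retries connection failures only (ConnectError / ConnectTimeout)
    verify=ssl_context,  # use the SSL context above
)
with httpx.Client(transport=transport) as client:
    response = client.get('https://httpbin.org/get')
```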