There are a huge number of HTTP clients available for Python - a quick search for "Python HTTP clients" on GitHub returns over 1,700 results(!). How do you make sense of all of them and find one which is right for your particular use case?
Do you have a single machine at your disposal, or a collection of them? Do you want to keep things simple, or is raw performance more of a concern? A web application making the odd request to a microservice API has quite different requirements to a script constantly scraping data. And then there's the question of whether the library you choose will still be around six months down the line.
In this article we're going to cover five of the best HTTP clients currently available for Python and detail why each of them might be one for you to consider.
Introduction
For all the examples here, I'll be making GET requests to the Star Wars API (swapi.dev), which returns data about the people, planets and starships of the Star Wars universe. You can see an example of a JSON response from it below:
{
"name": "Death Star",
"model": "DS-1 Orbital Battle Station",
"manufacturer": "Imperial Department of Military Research, Sienar Fleet Systems",
"cost_in_credits": "1000000000000",
...
}
// Now I know which manufacturer I won't be asking to make my own Death Star.
The POST request examples here are made to httpbin.org, a developer testing tool which responds with the content of the request (you could also use requestbin.com if you prefer). We'll be sending the following JSON POST data about Obi-Wan Kenobi:
{
"name": "Obi-Wan Kenobi",
"height": "182",
"mass": "77",
"hair_color": "auburn, white",
"skin_color": "fair",
"eye_color": "blue-gray",
"birth_year": "57BBY",
"gender": "male"
}
The Basics
If you're familiar with Python's standard library, you're probably already aware of the confusing history of the urllib and urllib2 modules within it. In Python 3, urllib2 was split into separate modules, urllib.request and urllib.error.
For comparison purposes with the packages in the rest of this article, let's first take a look at how we'd make a request using nothing but the standard library.
All our examples that follow use Python 3
import json
import urllib.request
response = urllib.request.urlopen('https://swapi.dev/api/starships/9/')
text = response.read()
print(json.loads(text.decode('utf-8')))
Note how we've had to use the json module to parse the response ourselves, as urllib.request returns the body as bytes.
Our POST would look like this:
import json
from urllib import request, parse
data = {"name": "Obi-Wan Kenobi", ...}
encoded_data = json.dumps(data).encode()
req = request.Request('https://httpbin.org/post', data=encoded_data)
req.add_header('Content-Type', 'application/json')
response = request.urlopen(req)
text = response.read()
print(json.loads(text.decode('utf-8')))
We've also had to encode the data we want to send and set the Content-Type header ourselves - something we'd need to change if we were submitting form data, for example.
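As a rough sketch (reusing the same httpbin.org endpoint), submitting urlencoded form data instead might look something like this:
from urllib import request, parse
form_data = parse.urlencode({'name': 'Obi-Wan Kenobi', 'birth_year': '57BBY'}).encode()
req = request.Request('https://httpbin.org/post', data=form_data)
# form submissions use a different content type to our JSON example
req.add_header('Content-Type', 'application/x-www-form-urlencoded')
response = request.urlopen(req)
print(response.read().decode('utf-8'))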
You might be feeling this is clunky - "All I wanted was to get some data!" Well, this is how many other developers felt too, given the number of HTTP clients available as additional packages. In the rest of the article we'll take a look at five of the best choices available.
urllib3
urllib3 is a powerful, user-friendly HTTP client for Python. Much of the Python ecosystem already uses urllib3 and you should too. urllib3 brings many critical features that are missing from the Python standard libraries.
The urllib3 package is, rather confusingly, not in the standard library but a separate, third-party HTTP client package. It provides features missing from urllib such as connection pooling, TLS verification and thread safety. This ultimately results in better performance for applications making many calls, like web scraping, as they reuse connections to hosts rather than creating new ones for every request.
It's actually a dependency of Requests, covered later in this article, and gets over 150M downloads a month. To make a request using urllib3, we'd make a call like the following:
import urllib3
import json
http = urllib3.PoolManager()
r = http.request('GET', 'https://swapi.dev/api/starships/9/')
print(json.loads(r.data.decode('utf-8')))
As with the standard library, we've had to parse the JSON response ourselves, as urllib3 leaves that to us.
For POST requests, we also need to manually encode query parameters or JSON, like so:
import json
import urllib3
data = {"name": "Obi-Wan Kenobi", ...}
http = urllib3.PoolManager()
encoded_data = json.dumps(data).encode('utf-8')
r = http.request(
    'POST',
    'https://httpbin.org/post',
    body=encoded_data,
    headers={'Content-Type': 'application/json'}
)
print(json.loads(r.data.decode('utf-8')))
The PoolManager object in each example handles the connection pooling and thread safety, with each request requiring the HTTP verb to be supplied as a string argument. This extra step is what gives urllib3 many of its additional features. Pools are cached, so subsequent requests to the same host reuse the same HTTP connection instance. This means that if we're making many requests to the same host, it's worth increasing the maximum number of connections the pool keeps open.
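As a minimal sketch (the numbers here are just illustrative), that comes down to passing pool arguments when creating the PoolManager:
import urllib3
# maxsize is the number of connections kept open per host,
# block=True makes requests wait for a free connection instead of opening extra ones
http = urllib3.PoolManager(num_pools=10, maxsize=10, block=True)
r = http.request('GET', 'https://swapi.dev/api/starships/9/')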
urllib3 also offers complex retry behavior. This is a really important consideration - we don't want our connection to time out because of a random one-off overloaded server and then just give up; we'd like to retry a few times before we consider the data unavailable. You can read up on how to use these features in the urllib3 documentation if you're interested.
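As a brief sketch of what that can look like (the retry counts and status codes here are just example values):
import urllib3
from urllib3.util.retry import Retry
retries = Retry(total=5, backoff_factor=0.2, status_forcelist=[500, 502, 503])
http = urllib3.PoolManager(retries=retries)
r = http.request('GET', 'https://swapi.dev/api/starships/9/')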
The downside of urllib3 is that it isn't a stateful client, which makes working with cookies awkward. We have to set them manually as a header value on the request rather than having urllib3 manage them for us, or use something like the http.cookies module. For example:
headers={'Cookie': 'foo=bar; hello=world'}
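That header then has to be passed along on every request we make, something like:
import urllib3
http = urllib3.PoolManager()
r = http.request(
    'GET',
    'https://swapi.dev/api/starships/9/',
    headers={'Cookie': 'foo=bar; hello=world'}
)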
Given that so many other libraries depend on urllib3, it's likely it will exist for some time to come.
Requests
Requests is an elegant and simple HTTP library for Python, built for human beings.
The Requests package is highly favored within the Python community, garnering over 110M downloads a month according to PePy. It's also recommended as a "higher level HTTP client interface" in the main urllib.request documentation. Working with Requests is incredibly simple, and as such the majority of developers in the Python community use it as their HTTP client of choice. It's maintained under the Python Software Foundation, has over 45k stars on GitHub, and is a dependency of many other Python libraries, such as gRPC and pandas.
Let's review how we'd make our requests with Requests(!):
import requests
r = requests.get('https://swapi.dev/api/starships/9/')
print(r.json())
Similarly, posting data is also made simple - we just need to change our get method call to a post:
import requests
data = {"name": "Obi-Wan Kenobi", ...}
r = requests.post('https://httpbin.org/post', json=data)
print(r.json())
Here you can see why Requests is so popular - its design is just so elegant! This is the most concise example so far, requiring the least code of any given. Requests incorporates HTTP verbs as methods (get, post), and we've even been able to convert the response straight to JSON without writing our own decoding. As a developer this means it's dead simple to work with and understand, with only two method calls needed to get the data we want from our API.
Within our POST, we've also not had to bother with encoding our data dictionary or worry about setting the correct content type in the request headers - Requests does all that for us. Thanks, Requests!
It's also easy to modify our POST call to submit form data instead, by simply replacing the json argument with data.
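For instance, reusing our Obi-Wan dictionary:
r = requests.post('https://httpbin.org/post', data=data)
# the body is now sent as application/x-www-form-urlencoded rather than JSON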
Another example of its simplicity is the way we can set cookies which are just an additional argument on the post method. For example:
r = requests.post('https://httpbin.org/post', data=data, cookies={'foo': 'bar', 'hello': 'world'})
Requests also offers a whole host of other advanced features, like sessions, request hooks and custom retry strategies. Sessions allow for statefulness, with cookies being persisted across each request made from a session instance - something urllib3 doesn't provide. An example taken from the Requests documentation:
s = requests.Session()
s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('https://httpbin.org/cookies')
print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'
Additionally, hooks allow you to register common behavior to execute after each call. You may be familiar with this concept if you use git, which lets you do the same. You can check out all the advanced features in the Requests documentation.
All of these advanced features make Requests a solid choice for a variety of applications.
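As a quick sketch, a hook is just a callable registered against the response event (the logging function here is made up for illustration):
import requests

def log_url(response, *args, **kwargs):
    print(f'Got {response.status_code} from {response.url}')

r = requests.get('https://swapi.dev/api/starships/9/', hooks={'response': log_url})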
AIOHTTP
Asynchronous HTTP Client/Server for asyncio and Python.
AIOHTTP is a package containing both a client and a server framework, meaning it might be well suited to an API which also makes requests elsewhere. It has 11k stars on GitHub, and a number of third-party libraries build upon it. Making a GET request with it looks like this:
import aiohttp
import asyncio
async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://swapi.dev/api/starships/9/') as response:
            print(await response.json())

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
and our POST:
import aiohttp
import asyncio
data = {"name": "Obi-Wan Kenobi", ...}
async def main():
    async with aiohttp.ClientSession() as session:
        async with session.post('https://httpbin.org/post', json=data) as response:
            print(await response.json())

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
You can see that the aiohttp.ClientSession() object uses similar syntax to Requests, but the overall code is much longer than in the previous examples, and we now have method calls using async and await, along with an additional import for asyncio.
The AIOHTTP documentation gives a good overview of why all this extra code is necessary compared to, say, Requests. It will take some time to understand the asynchronous programming concepts if you're not familiar with them, but what it ultimately means is that it's possible to make a number of requests at the same time, without waiting for each response before sending the next. For situations where we only make a single request this might not be a concern, but if we need to make tens or even thousands of requests, all the time the CPU spends waiting for a response could be better spent doing something else (like making another request!). We don't want to be paying for CPU cycles when we're just waiting around. As an example, let's take a look at some code looking up data for the first 50 starships from the Star Wars API.
import aiohttp
import asyncio
import time
async def get_starship(ship_id: int):
    async with aiohttp.ClientSession() as session:
        async with session.get(f'https://swapi.dev/api/starships/{ship_id}/') as response:
            print(await response.json())

async def main():
    tasks = []
    for ship_id in range(1, 51):
        tasks.append(get_starship(ship_id))
    await asyncio.gather(*tasks)

asyncio.run(main())
This consistently took under 2 seconds to run on my machine, whereas requesting the same data sequentially using a Requests session takes just over 4 seconds. So we're able to speed up the time it takes to retrieve our data, if we can deal with the additional complexity asynchronous code introduces.
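For reference, the synchronous version used for that comparison looked roughly like this (a sketch - the timing code is omitted):
import requests
with requests.Session() as session:
    for ship_id in range(1, 51):
        print(session.get(f'https://swapi.dev/api/starships/{ship_id}/').json())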
AIOHTTP offers thorough documentation along with a host of advanced features like sessions, cookies, pools, DNS caching and client tracing. Where it falls down, however, is its lack of support for complex retry behavior, which is only available via third-party modules.
GRequests
GRequests brings Gevent - a 'coroutine-based Python networking library' - to Requests, allowing requests to be made asynchronously. It's an older library, first released in 2012, which doesn't use Python's standard asyncio module. Individual requests can be made as we would with Requests, but we can also leverage the Gevent module to make a number of requests at once, like so:
import grequests
reqs = []
for ship_id in range(0, 50):
    reqs.append(grequests.get(f'https://swapi.dev/api/starships/{ship_id}/'))

for r in grequests.map(reqs):
    print(r.json())
GRequests' documentation is sparse, and it even goes as far as to recommend other libraries over itself on its GitHub page. At just 165 lines of code, it doesn't offer any advanced functionality over Requests itself. Over its nine years it has had a total of six releases, so it's probably only worth considering if you find asynchronous programming particularly confusing.
HTTPX
HTTPX is the newest package on this list (first released in 2019) and at the time of writing is still in beta, with a v1 expected sometime in 2021. It offers a "broadly requests-compatible API", is the only example here to offer HTTP/2 support, and also offers async APIs.
Using HTTPX is very similar to requests:
import httpx
r = httpx.get('https://swapi.dev/api/starships/9/')
print(r.json())
and for our POST:
import httpx
data = {"name": "Obi-Wan Kenobi", ...}
r = httpx.post('https://httpbin.org/post', json=data)
print(r.json())
We've simply changed the name of our module and again not needed to manage any JSON conversion. You'll also notice that even though it has async APIs, we can still write concise synchronous code with it. For the asynchronous versions of our examples, we use httpx.AsyncClient with the same Requests-style HTTP verb syntax. This allows us to get all the starship details we could ever desire in next to no time from our API. Here we update our previous get_starship coroutine from the aiohttp example to use HTTPX:
import httpx
import asyncio
async def get_starship(ship_id: int):
    async with httpx.AsyncClient() as client:
        r = await client.get(f'https://swapi.dev/api/starships/{ship_id}/')
        print(r.json())
...
For requests which take some time to return a response, this again means our client doesn't have to wait around. It's definitely worth considering if you have a large number of requests to make simultaneously and want to save on CPU cycles. If you're also looking to refactor requests-based scripts to something asynchronous, HTTPX would seem to be a good replacement.
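As for the HTTP/2 support mentioned earlier, it isn't switched on by default. A rough sketch, assuming the optional extra has been installed with pip install httpx[http2]:
import httpx
# http2=True requires the optional dependencies from httpx[http2]
client = httpx.Client(http2=True)
r = client.get('https://swapi.dev/api/starships/9/')
print(r.http_version)  # reports the negotiated protocol, e.g. 'HTTP/2' if the server supports it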
Comparing Client Results
The benchmarks we've taken aren't exhaustive and serve only as a guide. Each of our examples uses Python's time module to instrument a request call, executes it 100 times within a loop and returns an average execution time. Where possible, we reuse clients and sessions for each package. For our asynchronous requests we measure the total execution time for 100 calls made asynchronously and return the average per request. For the purposes of comparison, and to remove server response time from the benchmarks, both GET and POST requests are made to a local httpbin.org instance. Download statistics are taken from PePy, which uses public PyPI download information.
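As a rough illustration of the approach (not the exact harness used), timing the synchronous Requests calls looked something like this:
import time
import requests
session = requests.Session()
timings = []
for _ in range(100):
    start = time.perf_counter()
    session.get('http://localhost:8080/get')  # local httpbin instance; port is assumed
    timings.append(time.perf_counter() - start)
print(f'average: {sum(timings) / len(timings) * 1000:.2f}ms')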
| Name | Downloads / mo | GitHub stars | Sync GET | Sync POST | Async GET | Async POST |
|---|---|---|---|---|---|---|
| urllib | N/A | N/A | 5.79ms | 6.12ms | N/A | N/A |
| urllib3 | 154M | 2.7k | 4.13ms | 4.39ms | N/A | N/A |
| Requests | 115M | 45.5k | 6.94ms | 7.41ms | N/A | N/A |
| GRequests | 0.37M | 3.8k | 7.11ms | 7.66ms | 4.53ms | 4.95ms |
| AIOHTTP | 25M | 11.3k | 6.07ms | 6.61ms | 3.58ms | 3.92ms |
| HTTPX | 4M | 7k | 5.43ms | 5.72ms | 4.01ms | 4.34ms |
We can see that in terms of pure performance for individual requests, urllib3 is the winner, with both its GETs and POSTs taking less time than the other libraries. The difference in GET times across all of the clients is very small - just under 3ms separates the fastest from the slowest. It's interesting to note that although urllib3 has fewer GitHub stars (sometimes taken as an indication of popularity), it's actually downloaded the most (remember what I said about it being a dependency of Requests?). Requests is clearly the library with the biggest community, and it offers a raft of advanced features we've not benchmarked in these simple tests.
In terms of async clients, AIOHTTP comes out on top, with the lowest time per request for both GETs and POSTs. It also sports the most downloads and stars, but bear in mind this may well be because it offers web framework behavior too. Due to the lack of activity on the GRequests project - and its own advice - you probably shouldn't consider it unless you have very specific needs.
Conclusion
We have seen throughout this article that Requests has inspired the design of many of the libraries shown. It is incredibly popular within the Python community and the default choice for most developers. With the additional features it offers, like sessions and simple retry behavior, it should likely be your pick if you have straightforward needs or want to maintain simple code.
For developers with more complex requirements who need to make many more requests, AIOHTTP would currently be the best choice. Out of all our tests it performed the best asynchronously, it has the largest number of downloads and stars, and it currently offers a stable release. It is, however, more complex, with no retry support out of the box - so it may be worth looking at HTTPX once a v1 release is available, or sooner if it being in beta is not a concern.
There are other Python HTTP clients that we didn't cover in this article. For example, there is a Python binding for cURL, which we cover in a separate article: How to use Python with cURL?
Whatever your client needs, there's a Python package out there for you!