The Python Requests Module
Introduction
Dealing with HTTP requests is not an easy task in any programming language. Python 2 comes with two built-in modules, urllib and urllib2, to handle HTTP-related operations. Both modules come with a different set of functionalities, and many times they need to be used together. The main drawback of using urllib is that it is confusing (a few methods are available in both urllib and urllib2), the documentation is not clear, and we need to write a lot of code to make even a simple HTTP request.
To make these things simpler, an easy-to-use third-party library, known as Requests, is available, and most developers prefer to use it instead of urllib/urllib2. It is an Apache2-licensed HTTP library powered by urllib3 and httplib.
Installing the Requests Module
Installing this package, like most other Python packages, is pretty straightforward. You can either download the Requests source code from GitHub and install it, or use pip:
$ pip install requests
For more information regarding the installation process, refer to the official documentation.
To verify the installation, you can try to import it like below:
import requests
If you don’t receive any errors importing the module, then it was successful.
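You can also print the version of the installed module, which requests exposes through its __version__ attribute:
import requests
print(requests.__version__)  # e.g. 2.9.1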
Making a GET Request
GET is by far the most used HTTP method. We can use a GET request to retrieve data from any destination. Let me start with a simple example first. Suppose we want to fetch the content of a URL and print out the resulting data. Using the Requests module, we can do it like below:
import requests
r = requests.get('https://api.github.com/events')
print(r.content)
It will print the response in an encoded form (as bytes). If you want to see the actual text result, you can read the .text property of this object. Similarly, the status_code property holds the status code of the response:
import requests
r = requests.get('https://api.github.com/events')
print(r.text)
print(r.status_code)
requests will decode the raw content and show you the result. If you want to check what type of encoding is used by requests, you can print this value by reading the .encoding property. The encoding can even be changed by assigning a new value to this property. Now isn’t that simple?
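For example, a quick sketch of inspecting and overriding the encoding (the override value here is just for illustration):
import requests
r = requests.get('https://api.github.com/events')
print(r.encoding)  # the encoding requests detected, e.g. 'utf-8'
r.encoding = 'ISO-8859-1'  # override it; r.text will now be decoded with this encoding
print(r.text[:100])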
Reading the Response
The response of an HTTP request can contain many headers that hold different pieces of information.
httpbin is a popular website for testing different HTTP operations. In this article, we will use httpbin/get to analyze the response to a GET request. First of all, we need to find out the response headers and how they look. You can use any modern web browser to find them, but for this example, we will use Google’s Chrome browser.
- In Chrome, open the URL http://httpbin.org/get, right-click anywhere on the page, and select the “Inspect” option.
- This will open a new window within your browser. Refresh the page and click on the “Network” tab.
- This “Network” tab will show you all the different types of network requests made by the browser. Click on the “get” request in the “Name” column and select the “Headers” tab on the right.
The content of the “Response Headers” section is our required element. You can see the key-value pairs holding various information about the resource and request. Let’s try to parse these values using the requests library:
import requests
r = requests.get('http://httpbin.org/get')
print(r.headers['Access-Control-Allow-Credentials'])
print(r.headers['Access-Control-Allow-Origin'])
print(r.headers['CONNECTION'])
print(r.headers['content-length'])
print(r.headers['Content-Type'])
print(r.headers['Date'])
print(r.headers['server'])
print(r.headers['via'])
We retrieved the header information using r.headers, and we can access each header value using specific keys. Note that the keys are not case-sensitive.
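Since the header keys are case-insensitive, both of the following lookups return the same value:
import requests
r = requests.get('http://httpbin.org/get')
print(r.headers['Content-Type'] == r.headers['content-type'])  # True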
Similarly, let’s try to access the response value. The headers above show that the response is in JSON format (Content-Type: application/json). The Requests library comes with a built-in JSON parser, and we can use requests.get('url').json() to parse the response as a JSON object. Then the value for each key of the response can be parsed easily, like below:
import requests
r = requests.get('http://httpbin.org/get')
response = r.json()
print(r.json())
print(response['args'])
print(response['headers'])
print(response['headers']['Accept'])
print(response['headers']['Accept-Encoding'])
print(response['headers']['Connection'])
print(response['headers']['Host'])
print(response['headers']['User-Agent'])
print(response['origin'])
print(response['url'])
The above code will print the below output:
{'headers': {'Host': 'httpbin.org', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Accept': '*/*', 'User-Agent': 'python-requests/2.9.1'}, 'url': 'http://httpbin.org/get', 'args': {}, 'origin': '103.9.74.222'}
{}
{'Host': 'httpbin.org', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Accept': '*/*', 'User-Agent': 'python-requests/2.9.1'}
*/*
gzip, deflate
close
httpbin.org
python-requests/2.9.1
103.9.74.222
http://httpbin.org/get
The fourth line, i.e. print(r.json()), printed the JSON value of the response. We have stored the JSON value in the variable response and then printed out the value for each key. Note that unlike the previous example, the keys are case-sensitive.
Similar to JSON and text content, we can use requests to read the response content in bytes for non-text requests, using the .content property. This will automatically decode gzip and deflate encoded content for you.
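For instance, even for a JSON response you can inspect the raw bytes:
import requests
r = requests.get('https://api.github.com/events')
print(type(r.content))  # <class 'bytes'>
print(r.content[:60])  # first bytes of the (already decompressed) body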
Passing Parameters in GET
In some cases, you’ll need to pass parameters along with your GET requests, which take the form of query strings. To do this, we need to pass these values in the params parameter, as shown below:
import requests
payload = {'user_name': 'admin', 'password': 'password'}
r = requests.get('http://httpbin.org/get', params=payload)
print(r.url)
print(r.text)
Here, we are assigning our parameter values to the payload variable, and then to the GET request via params. The above code will return the following output:
http://httpbin.org/get?password=password&user_name=admin
{"args":{"password":"password","user_name":"admin"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Host":"httpbin.org","User-Agent":"python-requests/2.9.1"},"origin":"103.9.74.222","url":"http://httpbin.org/get?password=password&user_name=admin"}
As you can see, the Requests library automatically turned our dictionary of parameters into a query string and attached it to the URL.
Note that you need to be careful what kind of data you pass via GET requests since the payload is visible in the URL, as you can see in the output above.
Making POST Requests
HTTP POST requests are the opposite of GET requests, as they are meant for sending data to a server rather than retrieving it. However, POST requests can also receive data within the response, just like GET requests.
Instead of using the get() method, we need to use the post() method. For passing an argument, we can pass it inside the data parameter:
import requests
payload = {'user_name': 'admin', 'password': 'password'}
r = requests.post("http://httpbin.org/post", data=payload)
print(r.url)
print(r.text)
Output:
http://httpbin.org/post
{"args":{},"data":"","files":{},"form":{"password":"password","user_name":"admin"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Content-Length":"33","Content-Type":"application/x-www-form-urlencoded","Host":"httpbin.org","User-Agent":"python-requests/2.9.1"},"json":null,"origin":"103.9.74.222","url":"http://httpbin.org/post"}
The data will be “form-encoded” by default. Instead of a dictionary, you can also pass a list of tuples (useful when multiple values share the same key), a string, or a multipart-encoded file, as shown in the sketch below.
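For example, a minimal sketch of passing a list of tuples so that two values share the same key (the field name item is made up for illustration):
import requests
# Two values for the same key: use a list of tuples instead of a dict
payload = [('item', 'first'), ('item', 'second')]
r = requests.post('http://httpbin.org/post', data=payload)
print(r.json()['form'])  # httpbin echoes: {'item': ['first', 'second']}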
Sending Files with POST
Sometimes we need to send one or more files simultaneously to the server. For example, if a user is submitting a form and the form includes different form fields for uploading files, like a user profile picture, a user resume, etc. Requests can handle multiple files in a single request. This can be achieved by putting the files into a list of tuples, like below:
import requests
url = 'http://httpbin.org/post'
# Each tuple is (field_name, (file_name, file_object, content_type))
file_list = [
    ('image', ('image1.jpg', open('image1.jpg', 'rb'), 'image/jpeg')),
    ('image', ('image2.jpg', open('image2.jpg', 'rb'), 'image/jpeg'))
]
r = requests.post(url, files=file_list)
print(r.text)
The tuples containing the files’ information are in the form (field_name, file_info).
Other HTTP Request Types
Similar to GET and POST, we can perform other HTTP requests like PUT, DELETE, HEAD, and OPTIONS using the requests library, like below:
import requests
requests.put('url', data={'key': 'value'})
requests.delete('url')
requests.head('url')
requests.options('url')
Handling Redirections
Redirection in HTTP means forwarding the network request to a different URL. For example, if we make a request to “http://www.github.com”, it will redirect to “https://github.com” using a 301 redirect.
import requests
r = requests.post("http://www.github.com")
print(r.url)
print(r.history)
print(r.status_code)
Output:
https://github.com/
[<Response [301]>, <Response [301]>]
200
As you can see, the redirection process is automatically handled by requests, so you don’t need to deal with it yourself. The history property contains the list of all Response objects created to complete the redirection. In our example, two Response objects were created with the 301 response code. HTTP 301 and 302 responses are used for permanent and temporary redirection, respectively.
If you don’t want the Requests library to automatically follow redirects, then you can disable this by passing the allow_redirects=False parameter along with the request.
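A minimal sketch: with redirects disabled, the 301 response itself is returned, and its Location header shows where the redirect points:
import requests
r = requests.get('http://www.github.com', allow_redirects=False)
print(r.status_code)  # 301
print(r.headers['Location'])  # https://github.com/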
Handling Timeouts
Another important configuration is telling our library how to handle timeouts, or requests that take too long to return. We can configure requests to stop waiting for a network request using the timeout parameter. By default, requests will not time out. So, if we don’t configure this property, our program may hang indefinitely, which is not the functionality you’d want in a process that keeps a user waiting.
import requests
requests.get('http://www.google.com', timeout=1)
Here, an exception will be thrown if the server does not respond within 1 second (which is still aggressive for a real-world application). To get this to fail more often (for the sake of an example), you can set the timeout limit to a much smaller value, like 0.001.
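The exception raised in that case is requests.exceptions.Timeout, so a minimal sketch of handling it could look like this:
import requests
try:
    r = requests.get('http://www.google.com', timeout=0.001)
except requests.exceptions.Timeout:
    print('The request timed out')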
The timeout can be configured for both the “connect” and “read” operations of the request using a tuple, which allows you to specify both values separately:
import requests
requests.get('http://www.google.com', timeout=(5, 14))
Here, the “connect” timeout is 5 seconds and the “read” timeout is 14 seconds. This will allow your request to fail much more quickly if it can’t connect to the resource, and if it does connect, it will give it more time to download the data.
Cookies and Custom Headers
We have seen previously how to access headers using the headers property. Similarly, we can access cookies from a response using the cookies property. For example, the snippet below shows how to access a cookie with the name cookie_name:
import requests
r = requests.get('http://www.examplesite.com')
r.cookies['cookie_name']
We can also send custom cookies to the server by providing a dictionary to the cookies parameter in our GET request.
import requests
custom_cookie = {'cookie_name': 'cookie_value'}
r = requests.get('http://www.examplesite.com/cookies', cookies=custom_cookie)
Cookies can also be passed in a cookie jar object. This allows you to provide cookies scoped to different paths.
import requests
jar = requests.cookies.RequestsCookieJar()
jar.set('cookie_one', 'one', domain='httpbin.org', path='/cookies')
jar.set('cookie_two', 'two', domain='httpbin.org', path='/other')
r = requests.get('https://httpbin.org/cookies', cookies=jar)
print(r.text)
Output:
{"cookies":{"cookie_one":"one"}}
Similarly, we can send custom headers by assigning a dictionary to the headers parameter of the request.
import requests
custom_header = {'user-agent': 'customUserAgent'}
r = requests.get('https://samplesite.org', headers=custom_header)
The Session Object
The session object is mainly used to persist certain parameters, like cookies, across different HTTP requests. A session object may use a single TCP connection for handling multiple network requests and responses, which results in a performance improvement.
import requests
first_session = requests.Session()
second_session = requests.Session()
first_session.get('http://httpbin.org/cookies/set/cookieone/111')
r = first_session.get('http://httpbin.org/cookies')
print(r.text)
second_session.get('http://httpbin.org/cookies/set/cookietwo/222')
r = second_session.get('http://httpbin.org/cookies')
print(r.text)
r = first_session.get('http://httpbin.org/anything')
print(r.text)
Output:
{"cookies":{"cookieone":"111"}}
{"cookies":{"cookietwo":"222"}}
{"args":{},"data":"","files":{},"form":{},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Cookie":"cookieone=111","Host":"httpbin.org","User-Agent":"python-requests/2.9.1"},"json":null,"method":"GET","origin":"103.9.74.222","url":"http://httpbin.org/anything"}
The httpbin path /cookies/set/{name}/{value} will set a cookie with the given name and value. Here, we set different cookie values for the first_session and second_session objects. You can see that the same cookie is returned in all future network requests for a specific session.
Similarly, we can use the session object to persist certain parameters for all requests.
import requests
first_session = requests.Session()
first_session.cookies.update({'default_cookie': 'default'})
r = first_session.get('http://httpbin.org/cookies', cookies={'first-cookie': '111'})
print(r.text)
r = first_session.get('http://httpbin.org/cookies')
print(r.text)
Output:
{"cookies":{"default_cookie":"default","first-cookie":"111"}}
{"cookies":{"default_cookie":"default"}}
As you can see, the default_cookie is sent with each request of the session. If we pass any extra cookie along with a request, it is merged with the default one: "first-cookie": "111" is appended to the default "default_cookie": "default" for that request only.
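The same pattern works for other persisted parameters, like headers. A minimal sketch, using a made-up header name X-Custom:
import requests
session = requests.Session()
session.headers.update({'X-Custom': 'default-value'})  # sent with every request of this session
r = session.get('http://httpbin.org/headers')
print(r.text)  # the echoed request headers include X-Custom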
Using Proxies
The proxies argument is used to configure a proxy server to use for your requests.
import requests
http = "http://10.10.1.10:1080"
https = "https://10.10.1.11:3128"
ftp = "ftp://10.10.1.10:8080"
proxy_dict = {
    "http": http,
    "https": https,
    "ftp": ftp
}
r = requests.get('http://sampleurl.com', proxies=proxy_dict)
The requests library also supports SOCKS proxies. This is an optional feature, and it requires the requests[socks] dependency to be installed before use. Like before, you can install it using pip:
$ pip install requests[socks]
After the installation, you can use it as shown here:
import requests
proxies = {
    'http': 'socks5://user:pass@host:port',
    'https': 'socks5://user:pass@host:port'
}
r = requests.get('http://sampleurl.com', proxies=proxies)
SSL Handling
We can also use the Requests library to verify the HTTPS certificate of a website by passing verify=True with the request.
import requests
r = requests.get('https://www.github.com', verify=True)
This will throw an error if there is any problem with the SSL certificate of the site. If you don’t want to verify, just pass False instead of True. This parameter is set to True by default.
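For example, a minimal sketch (the CA bundle path is just a placeholder):
import requests
# Skip certificate verification entirely (not recommended for production use)
r = requests.get('https://www.github.com', verify=False)
# Or validate the certificate against a specific CA bundle instead of the default one
r = requests.get('https://www.github.com', verify='/path/to/ca-bundle.crt')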
Downloading a File
For downloading a file using requests, we can either download it by streaming the contents or by downloading the entire thing directly. The stream flag is used to indicate both behaviors.
As you probably guessed, if stream is True, then requests will stream the content. If stream is False, all content will be downloaded to memory before being returned to you.
For streaming content, we can iterate over the content chunk by chunk using the iter_content method, or line by line using iter_lines. Either way, it will download the file part by part.
For example:
import requests
r = requests.get('https://cdn.pixabay.com/photo/2018/07/05/02/50/sun-hat-3517443_1280.jpg', stream=True)
# Stream the image to disk chunk by chunk; the with-block closes the file when done
with open("sun-hat.jpg", "wb") as downloaded_file:
    for chunk in r.iter_content(chunk_size=256):
        if chunk:
            downloaded_file.write(chunk)
The code above will download an image from the Pixabay server and save it to a local file, sun-hat.jpg.
We can also read the raw data using the raw property, along with stream=True in the request.
import requests
r = requests.get("http://exampleurl.com", stream=True)
print(r.raw.read(10))  # read the first 10 bytes of the raw, undecoded stream
For downloading or streaming content, iter_content() is the preferred way.
Errors and Exceptions
requests throws different types of exceptions and errors if there is ever a network problem. All of them are inherited from the requests.exceptions.RequestException class. Here is a short description of the common errors you may run into (a combined handling sketch follows the list):
- ConnectionError: thrown in case of a DNS failure, a refused connection, or any other connection-related issue.
- Timeout: raised if a request times out.
- TooManyRedirects: raised if a request exceeds the maximum number of predefined redirections.
- HTTPError: raised for invalid HTTP responses.
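Since all of these inherit from RequestException, a minimal sketch of handling them together might look like this (raise_for_status() converts 4xx/5xx responses into an HTTPError):
import requests
try:
    r = requests.get('http://httpbin.org/get', timeout=5)
    r.raise_for_status()  # raise HTTPError for 4xx/5xx status codes
except requests.exceptions.HTTPError as e:
    print('HTTP error:', e)
except requests.exceptions.ConnectionError:
    print('Could not connect to the server')
except requests.exceptions.Timeout:
    print('The request timed out')
except requests.exceptions.RequestException as e:
    print('Some other error occurred:', e)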
For a more complete list and descriptions of the exceptions you may run into, check out the documentation.
Conclusion
In this tutorial, I explained many of the features of the requests library and the various ways to use it. You can use the requests library not only for interacting with a REST API; it works equally well for scraping data from a website or downloading files from the web.
Modify and try the above examples, and drop a comment below if you have any questions regarding requests.