Using Python to Check Status Codes

Nnamdi
Sep 19, 2019

In this tutorial, I will show you a simple Python script that can help you check the status codes of all the URLs on your website.

This script is especially useful when you have a large site with lots of URLs. You can check all the status codes in bulk and write them to a CSV file for further analysis.

WHAT IS A STATUS CODE

A status code is a three-digit number that the web server sends back in its response to a request made by a client, telling the client what happened to that request. The most common one is 200, which means the request was successful.
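On the wire, the status code sits in the first line of the server's response, the status line, for example `HTTP/1.1 404 Not Found`. The small sketch below pulls the code and its reason phrase out of such a line. `parse_status_line` is a hypothetical helper written for illustration, not part of any library.

```python
def parse_status_line(line):
    """Split an HTTP/1.x status line into (status code, reason phrase)."""
    # Layout: HTTP-version, space, three-digit status code, space, reason phrase
    version, code, *reason = line.split(" ", 2)
    if not version.startswith("HTTP/") or len(code) != 3 or not code.isdigit():
        raise ValueError(f"not a status line: {line!r}")
    return int(code), reason[0] if reason else ""


print(parse_status_line("HTTP/1.1 200 OK"))         # (200, 'OK')
print(parse_status_line("HTTP/1.1 404 Not Found"))  # (404, 'Not Found')
```

Libraries such as requests do this parsing for you and expose the number directly, as we will see below.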

There are five classes of status codes, identified by their first digit:

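1xx: Informational. The request was received and is being processed.

2xx: Success. The request was received and handled successfully.

3xx: Redirection, both temporary and permanent.

4xx: Client errors, such as a request for a page that does not exist.

5xx: Server errors, where the server failed to handle a valid request.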
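Because the class is just the first digit, you can group any code with integer division by 100. Here is a minimal sketch; `status_class` and the label strings are illustrative names of my own, not taken from a library.

```python
# Labels for each class, keyed by the first digit of the status code
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code):
    """Return the class label for a status code, e.g. 404 -> 'Client error'."""
    if not 100 <= code <= 599:
        raise ValueError(f"status code out of range: {code}")
    return STATUS_CLASSES[code // 100]


for code in (101, 200, 301, 404, 503):
    print(code, status_class(code))
```

This is handy later on, when you want to count how many pages fall into each class rather than looking at individual codes.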

Let us take a closer look at the most common HTTP status codes:

200: This is the status code you want to see. It lets us know that the request succeeded and the server returned the page we asked for. This is the ideal situation for every page.

301: This is used for permanent redirects. When you move your site from one domain to another, it is especially useful for redirecting your users to the new domain.

302: This is used for temporary redirects, for example when you are performing site maintenance or want to send users to a temporary URL.

404: This status code indicates that the server cannot find the page the client is requesting. A 404 is a client error (one of the 4xx codes). It can have several causes: the page may have been removed, or the user may have typed the URL incorrectly.

500: This is an internal server error. The server ran into an unexpected problem and could not complete the request.
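Python's standard library already knows the official name (reason phrase) of every registered code through `http.HTTPStatus`, which is useful for labelling codes in a report. A quick way to look up the codes discussed above:

```python
from http import HTTPStatus

# HTTPStatus is an IntEnum, so each member compares equal to its number
for code in (200, 301, 302, 404, 500):
    print(code, HTTPStatus(code).phrase)
```

This prints `OK`, `Moved Permanently`, `Found`, `Not Found` and `Internal Server Error` next to their codes. Note that the official phrase for 302 is "Found", even though in practice it is used for temporary redirects.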

GETTING STATUS CODES WITH PYTHON

Now that we know what status codes are, we want to check the status code of every page on our website.

The first step is to export all the URLs on your website as a CSV or Excel file. If you are using WordPress, there are plugins that can help with this task. Alternatively, you can use Google Search Console: go to Performance, select Pages and export the URLs as a CSV. Keep in mind that this report only lists pages that have appeared in Google Search results.

Once that is done, we can write the Python script that gets the status code of every URL we exported.

To get status codes in Python, we will use the requests library. It is a super useful library that makes sending HTTP requests easy. You can read more in the official documentation.
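Here is a minimal sketch of checking a single URL with requests. One detail matters for this article: `requests.get` follows redirects by default, so a URL that answers 301 or 302 would be reported with the 200 of the page it redirects to. Passing `allow_redirects=False` reports the status of the URL itself. The helper name `get_status` and the 10-second timeout are my own illustrative choices.

```python
import requests


def get_status(url, timeout=10):
    """Return the status code for url itself, or None if no response arrived."""
    try:
        # allow_redirects=False reports this URL's own 301/302 instead of the
        # status of whatever page the redirect eventually lands on.
        response = requests.get(url, timeout=timeout, allow_redirects=False)
    except requests.RequestException:
        # DNS failures, refused connections and timeouts end up here.
        return None
    return response.status_code
```

Calling `get_status("https://example.com")` should return 200 for a reachable page. If you only need the status and not the page body, `requests.head` is a lighter option, although some servers do not handle HEAD requests the same way they handle GET.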

To get started with requests, we need to install it with pip:

pip install requests

This installs the requests library so we can begin working with it.

The next step is to load our CSV file into Python. I will be using pandas for this (install it with pip install pandas if you don't already have it).

import pandas as pd
import requests

# Replace with the path to your exported CSV of URLs
df = pd.read_csv('path to file/file_name.csv')
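Exported URL lists often contain empty cells, stray spaces or repeated URLs, and each of those costs an extra request. Below is a sketch of tidying the urls column before checking it. The inline sample data stands in for your real file and is made up for illustration.

```python
import io

import pandas as pd

# Made-up sample standing in for 'path to file/file_name.csv'
csv_text = """urls
https://example.com/
https://example.com/about
https://example.com/about

 https://example.com/contact 
"""

# read_csv skips completely blank lines by default
df = pd.read_csv(io.StringIO(csv_text))

# Strip stray whitespace, drop rows with no URL and remove duplicates,
# so each URL is requested only once
df["urls"] = df["urls"].str.strip()
df = df.dropna(subset=["urls"]).drop_duplicates(subset="urls").reset_index(drop=True)
print(df)
```

After cleaning, the sample keeps three unique URLs.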

Now we will loop through the column that holds the URLs (named urls here, so use your own column name) and check the status code of each one.

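statuses = []  # one status code per URL, in row order
for url in df['urls']:
    response = requests.get(url, timeout=10)  # timeout so one slow page can't hang the script
    statuses.append(response.status_code)
df['Status'] = statuses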

The last line of the code adds the ‘Status’ column to our DataFrame. Once we have looped through every URL and checked its status code, let us save the results as a CSV file.

df.to_csv('Status_codes.csv', index=False)  # index=False leaves out the row numbers
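On a large site, the loop above waits for each response before sending the next request, which can take a long time. The sketch below runs the checks in a thread pool from the standard library's `concurrent.futures` instead. `check_all`, `fetch_status` and the default of 8 workers are illustrative choices of my own. Keep the worker count modest so you don't overload your own server.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(urls, fetch_status, max_workers=8):
    """Run fetch_status on every URL in parallel and return results in input order.

    fetch_status is any function that takes a URL and returns its status code.
    If it raises an exception, check_all re-raises it when collecting results.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as urls, so they line up
        # with the DataFrame rows when assigned back to a column.
        return list(pool.map(fetch_status, urls))


# Hypothetical usage with the DataFrame and requests from above:
# df['Status'] = check_all(
#     df['urls'], lambda url: requests.get(url, timeout=10).status_code
# )
```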

You can also analyze the status codes in more depth. For example, what if you want to find the URLs that are broken and return 404 errors?

for url in df['urls']:
    response = requests.get(url, timeout=10)
    # Print only the URLs that return 404 Not Found
    if response.status_code == 404:
        print(url)

This prints every URL that returns a 404 error.
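If you have already filled in the Status column, you can filter the DataFrame instead of looping again, which avoids requesting every URL a second time. Here is a sketch with made-up sample rows standing in for real results:

```python
import pandas as pd

# Made-up results, shaped like the DataFrame built earlier in this article
df = pd.DataFrame({
    "urls": ["https://example.com/", "https://example.com/old",
             "https://example.com/missing", "https://example.com/api"],
    "Status": [200, 301, 404, 500],
})

# How many URLs returned each status code
print(df["Status"].value_counts())

# Only the 404 pages
broken = df.loc[df["Status"] == 404, "urls"].tolist()
print(broken)

# Every URL in the 4xx or 5xx range
errors = df[df["Status"] >= 400]
print(errors["urls"].tolist())
```

From here, `errors.to_csv(...)` would save the problem URLs to their own file for follow-up.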

Thanks for reading this article on how to check status codes with Python. If you enjoyed it, please clap and share the article!

PS: You can follow me on Twitter.
