How to write unit tests using pytest?

Unit testing is all about increasing confidence in the code - one good test is better than 10 bad ones.

I have been writing unit tests for a while now, so I am sharing some of my experiences and challenges here. Hopefully it will help someone somewhere on the planet, or maybe the future me :)

For starters, what exactly is unit testing, and why is it needed?

A unit test is a way of testing a unit - the smallest piece of code that can be logically isolated in a system. Essentially, a unit test is a method that instantiates a small portion of our application and verifies its behavior independently from other parts.

Setting up the environment:

First, run the following commands to install pytest and pandas, which are required:

pip install pytest
pip install pandas

The following is a simple folder structure:

image.png

Heading towards the practical side now, we will see how exactly we should write unit tests.

We have simple data containing the name and age of each person, but for some reason we need a new column holding their birth year. Here is a function to achieve that:

extract_birth_year.py

import datetime

def get_birth_year(df):
    '''
    Calculate the birth year and add it to the dataframe as a new column
    called birth_year. Note that the input dataframe is modified in place.

    parameters:
    df: dataframe having Age column

    returns:
    dataframe with birth_year column.
    '''
    year = datetime.date.today().year
    replace_boolean_values = [True, False]
    if 'Age' in df.columns:
        # Booleans are not valid ages, so treat them as 0
        df['Age'] = df['Age'].replace(replace_boolean_values, 0)
        df['birth_year'] = year - df['Age']
    else:
        raise NotImplementedError('unsupported dataframe')
    return df

You might be wondering why we are replacing the boolean values. Here is the scenario: if a boolean value is present in the Age column, it produces a wrong birth_year. How? Boolean values are treated as 0 and 1 for False and True respectively, so if the current year is, say, 2022, the calculation becomes 2022 - 1 = 2021 or 2022 - 0 = 2022, and both results are wrong. To handle that, every boolean is replaced by 0, so the birth year becomes the current year and we can build the next steps around that.

Now we are going to write test cases for the above function. We will mainly learn the framework for writing test cases, as well as how exceptions are tested using pytest, which is one of the most widely used Python testing libraries.
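The arithmetic behind this is easy to check in plain Python, where bool is a subclass of int. A minimal sketch, using a fixed example year of 2022 rather than the real current year:

```python
# In Python, bool is a subclass of int, so True and False take part in
# arithmetic as 1 and 0.
example_year = 2022  # fixed example year, not the real current year

print(isinstance(True, int))   # True
print(example_year - True)     # 2021
print(example_year - False)    # 2022
```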

Things we need to consider:

  1. We are writing code to check whether the function behaves as expected.

  2. Does our function provide user-friendly error messages that are easy to understand and debug?

  3. Last but not least, test cases are also a key way to understand what the code does, alongside the usual docstrings and type hints.

The first step in writing a test case is creating a test file. In this case, our function is located in a file called extract_birth_year.py,

so by convention we will create a new Python file named test_extract_birth_year.py (by default pytest discovers files named test_*.py or *_test.py).

Inside the Python file we just created, we will import our function from extract_birth_year.py. Please keep both files in the same folder for now; we will discuss ways to call functions from other folders separately, and I'll add a link for that here.

from extract_birth_year import get_birth_year

This imports the intended function. Let's start with the test cases now.

The pytest documentation recommends a simple four-step framework for writing test cases:

  1. Arrange

  2. Act

  3. Assert

  4. Cleanup

You can explore more about it here
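The tests in this article only build dataframes in memory, so they never need a real Cleanup step. As a rough sketch of where Cleanup fits, here is a hypothetical test (count_lines and the temporary file are made up for illustration, not part of this project) that creates a file and removes it afterwards:

```python
import os
import tempfile


def count_lines(path):
    # Hypothetical helper used only for this illustration
    with open(path) as handle:
        return sum(1 for _ in handle)


def test_count_lines():
    # Arrange: create a temporary file with known content
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'w') as handle:
        handle.write('tom\nnick\njuli\n')
    try:
        # Act
        actual = count_lines(path)
        # Assert
        assert actual == 3
    finally:
        # Cleanup: remove the file even if the assertion fails
        os.remove(path)
```

In real pytest code, cleanup is usually handled by fixtures (for example the built-in tmp_path fixture) rather than by a try/finally block, but the order of the four steps stays the same.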

import datetime

import pandas as pd
from extract_birth_year import get_birth_year

def test_get_birth_year():
    # Arrange
    # Expected birth years are computed from the current year so the test
    # keeps passing when the calendar year changes
    year = datetime.date.today().year
    expected_data = [['tom', 10, year - 10], ['nick', 15, year - 15], ['juli', 14, year - 14]]
    df_expected_output = pd.DataFrame(expected_data, columns=['Name', 'Age', 'birth_year'])
    # Act
    data = [['tom', 10], ['nick', 15], ['juli', 14]]
    df_actual = pd.DataFrame(data, columns=['Name', 'Age'])
    df_actual_output = get_birth_year(df_actual)
    # Assert
    assert df_actual_output.equals(df_expected_output)

To run the test cases, use the following command:

pytest test_extract_birth_year.py

In Arrange I have kept the values we expect the function to return. Act is about calling the function under test and saving its output in a variable. Finally, Assert checks whether the actual values match the expected values.

Here we are using .equals, a method provided by pandas that checks whether two dataframes contain the same elements.

If you run the above test, you will see a screen somewhat like this:
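One thing to keep in mind is that .equals also requires matching column dtypes, and on failure it only returns False without saying what differs. As an optional alternative (not used in this article), pandas ships pandas.testing.assert_frame_equal, which raises an AssertionError describing the mismatch. A small sketch with made-up data:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df_int = pd.DataFrame({'Age': [10, 15]})        # int64 column
df_float = pd.DataFrame({'Age': [10.0, 15.0]})  # float64 column, same values

# .equals is strict about dtypes, so these two frames do not count as equal
same = df_int.equals(df_float)

# assert_frame_equal raises AssertionError with a description of the difference
try:
    assert_frame_equal(df_int, df_float)
    message = ''
except AssertionError as err:
    message = str(err)

# check_dtype=False compares the values only; this call does not raise
assert_frame_equal(df_int, df_float, check_dtype=False)
```

Whether to use .equals or assert_frame_equal is a matter of taste; the second mainly helps when a failing test needs to be debugged.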

image.png

Green means your test passed!

image.png

If everything is red, the test failed.

If you just add -v to the command like this:

pytest test_extract_birth_year.py -v

it will show a per-test summary, somewhat like this:

image.png

If you want to run one specific test, there is a way to do that too:

pytest test_extract_birth_year.py::test_get_birth_year

This selects and executes only the specified test function. In this case the output will be somewhat like this:

image.png

Cool, enough with the pytest commands. Let's get back to our function and test cases.

Now for the second part of the function: if the provided dataframe is not a supported one, our function should raise an exception. Let's write a test for that.

Before writing the test, note that we need to check specifically that NotImplementedError is raised. We cannot just check that a general Exception is raised, because the Age column may contain string data, and in that case the function would throw an error like

TypeError: unsupported operand type(s) for -: 'int' and 'str'

which is different from our use case. If we expected a general Exception, the test would pass on this error too, when in fact it should get noticed.

Coming back to exception testing, pytest provides a pretty useful syntax, pytest.raises, to check whether the expected exception occurred.
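To see why a broad Exception check is too loose, here is a small sketch. The birth_year helper and the two check functions are hypothetical, written only to reproduce the TypeError quoted above without needing a dataframe:

```python
import pytest


def birth_year(year, age):
    # Hypothetical helper for illustration: fails with TypeError if age is a string
    return year - age


def broad_check():
    # Passes for ANY exception, so the unexpected TypeError goes unnoticed
    with pytest.raises(Exception):
        birth_year(2022, 'ten')
    return 'passed'


def narrow_check():
    # Expecting NotImplementedError: the unrelated TypeError is not swallowed,
    # so in a real test run it would be reported as a failure
    try:
        with pytest.raises(NotImplementedError):
            birth_year(2022, 'ten')
    except TypeError as err:
        return str(err)
    return 'passed'
```

Calling broad_check() returns 'passed' even though the wrong kind of error occurred, while narrow_check() lets the TypeError surface.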

test_extract_birth_year.py

import pandas as pd
import pytest
from extract_birth_year import get_birth_year

def test_get_birth_year_unsupported_exception():
    # Arrange & Act
    data = [['krish', 1], ['jack', 50], ['elon', 100]]
    df_input = pd.DataFrame(data, columns=['Name', 'Amount'])
    with pytest.raises(NotImplementedError) as exc_info:
        get_birth_year(df_input)
    # Assert
    assert exc_info.type is NotImplementedError
    assert exc_info.value.args[0] == "unsupported dataframe"

Here you will notice that Arrange and Act are clubbed together. We can combine them when that makes the code more readable.

Finally, there is one edge case: if a Boolean value is present in the Age column, the function should return the current year as the birth year.
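As a side note, pytest.raises also accepts a match argument, which checks the error message with a regular-expression search, so the exception type and message can be verified in one place. A minimal sketch, where check_columns is a hypothetical stand-in for the column check inside get_birth_year:

```python
import pytest


def check_columns(columns):
    # Hypothetical stand-in for the column check done by get_birth_year
    if 'Age' not in columns:
        raise NotImplementedError('unsupported dataframe')
    return True


def test_check_columns_unsupported():
    # match is applied as a regular-expression search on the exception message
    with pytest.raises(NotImplementedError, match='unsupported dataframe'):
        check_columns(['Name', 'Amount'])
```

This is only an alternative style; the explicit asserts on exc_info do the same job and make the Assert step easier to spot.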

test_extract_birth_year.py

import datetime
import pandas as pd
import pytest
from extract_birth_year import get_birth_year


def test_get_birth_year_unclean_data():
    # Arrange
    year = datetime.date.today().year
    # The boolean age is expected to become 0, so that birth year is the current year;
    # the other birth years are computed from the current year as well
    expected_data = [['tom', 0, year], ['nick', 15, year - 15], ['juli', 14, year - 14]]
    df_expected_output = pd.DataFrame(expected_data, columns=['Name', 'Age', 'birth_year'])
    # Act
    data = [['tom', True], ['nick', 15], ['juli', 14]]
    df_actual = pd.DataFrame(data, columns=['Name', 'Age'])
    df_actual_output = get_birth_year(df_actual)
    # Assert
    assert df_actual_output.equals(df_expected_output)

Conclusion: we learned how to write test cases for

  1. The behavior of the function when the data provided is correct.

  2. The behavior of the function when the data provided is not clean.

  3. Exception handling when the data provided is unsupported.

Git-repo link to access the folder.

Cheers till the next one!

Feel free to get in touch. Happy learning :)
