Why you should use logging instead of print statements

It is a common but poor practice to add print statements to your code to show messages about what the code or a function is doing on standard output. At least, that is what I used to do until last year. Last year, I got to know about logging and came to understand its benefits when building platforms or working on larger projects. In this post, we will discuss logging. Let us begin by understanding a little about what it is.

What is logging?

Logging, as the name suggests, is a way to show useful messages to the user of your code, giving them more context about what is happening while it runs. An example of useful logging could be this:

Imagine a scenario where you have code for model training in your project. A good logging practice is to add a log message just before model training begins that says 'Beginning model training; Shape of input data: 10000, 50'. This extra line of logging is really useful for whoever is running the project. Likewise, once model training is completed, another log message like 'Finished model training successfully; Time taken: 5 minutes' is super helpful.

Alright, we understand that logging is useful. The next question is: how do we add these log messages? The easiest, and wrong, way of doing this is to add a print() statement with the message you want to display. We won't be doing it via print(); we will use Python's built-in 'logging' module. There are packages for other languages as well, and the basic premise of how they work is very similar.

Let us first try to understand the different levels of logging and then we will continue to understand why logging should be preferred over print statements.

Logging in Python

We will use a built-in Python module called logging. Since the logging module ships with Python, we do not need to install anything to get started. Before we proceed with writing log messages, let us understand a basic thing about logging: the different log levels.

Different levels in logging

When you write log messages there are different use cases, and based on these we classify log messages into levels. The most common kind of log message that I've come across is one where we just want to add an informational message saying that a process has started or has run successfully. These messages are informational and hence come under the INFO log level. In all, there are five standard levels: DEBUG, INFO, WARNING, ERROR and CRITICAL. Depending on the type and severity of the message, you pick the matching level when logging it.

The logging level is set depending on the severity of the messages you want to display. The levels have an inherent order of severity, and setting a level shows messages at that level and at every more severe level. The default logging level is WARNING. Below are the logging levels in increasing order of severity and a rough template of when to use each level.

DEBUG: Set this as your logging level if you want to debug your program. Every level above it is also enabled, i.e. INFO, WARNING, ERROR and CRITICAL messages will all be logged.
INFO: Set this as your logging level when you want to display normal informational logs about the completion of tasks/processes in your program.
WARNING: Set this as your logging level if you are not interested in DEBUG and INFO logs and are only concerned with WARNING, ERROR and CRITICAL logs. WARNING messages flag issues that will not stop your code from executing but may lead to errors in the future, so make sure to check these warnings and understand what they are saying.
ERROR: Set this as your log level if you are only interested in ERROR messages (including logged exceptions) and CRITICAL messages.
CRITICAL: Use this level for messages about something so serious that the program cannot continue running.
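To make the ordering concrete, here is a small sketch that sets a logger to WARNING and then logs one message at each level. The logger name and the messages are made up for illustration.

```python
import logging
import sys

# Send records to the screen, prefixed with their level name.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))

LOG = logging.getLogger('levels_demo')  # hypothetical logger name
LOG.addHandler(handler)
LOG.propagate = False          # avoid duplicates if the root logger is configured
LOG.setLevel(logging.WARNING)  # same as the default level

LOG.debug('Detailed state, useful while debugging')       # filtered out
LOG.info('Data loaded successfully')                      # filtered out
LOG.warning('Column has missing values; filling with 0')  # shown
LOG.error('Could not read the input file')                # shown
LOG.critical('Out of disk space; stopping')               # shown
```

Changing the setLevel call to logging.DEBUG would make all five messages appear.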

Where to log?

Now that we have an understanding of the different log levels and their potential use cases, let us look at where we can send these log messages. You can direct the log messages to:

standard output, i.e. the messages will be displayed on the user's screen
text files, so that they can be used later for RCA (Root Cause Analysis)

I've mostly logged messages to the screen, but as a project grows large we should direct the logs to files so that they can be used later for analysis.
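Both destinations can be used at the same time by attaching two handlers to one logger. The sketch below is one way to do it; the logger name, the file name project.log and the choice of the temp directory are assumptions made for the example.

```python
import logging
import os
import sys
import tempfile

LOG = logging.getLogger('destinations_demo')  # hypothetical logger name
LOG.setLevel(logging.INFO)
LOG.propagate = False  # avoid duplicates if the root logger is configured

formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')

# Destination 1: the screen (standard output).
screen_handler = logging.StreamHandler(sys.stdout)
screen_handler.setFormatter(formatter)
LOG.addHandler(screen_handler)

# Destination 2: a text file; the path is an arbitrary example.
log_path = os.path.join(tempfile.gettempdir(), 'project.log')
file_handler = logging.FileHandler(log_path, mode='w', encoding='utf-8')
file_handler.setFormatter(formatter)
LOG.addHandler(file_handler)

LOG.info('Pipeline started')
LOG.warning('Input file is larger than expected')
```

Every message now shows up on the screen and is also written to the file, so it is still available when you come back later to investigate a problem.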

How to set up logging in Python?

To illustrate this, I will use the code snippet below:

import logging

# logging level set to INFO
logging.basicConfig(format='%(message)s',
                    level=logging.INFO)

LOG = logging.getLogger(__name__)

Note that the logger is assigned to LOG so that we can use it to log messages at various levels. You can set up a basic config for your log messages that takes inputs like:

format: the format in which you want your log messages; you can add the date here as well

level: the logging level you want to set; if set to INFO, DEBUG messages won't be logged even if you have added them in your program, while WARNING, ERROR and CRITICAL messages will still be logged
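As a sketch of both inputs together, the config below adds a timestamp, the level and the logger name to every message. The exact format string and date format are just one possible choice, and force=True (available from Python 3.8) is included only so that the config applies even if logging was already set up earlier in the same session.

```python
import logging

logging.basicConfig(
    format='%(asctime)s | %(levelname)s | %(name)s | %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    level=logging.INFO,
    force=True,  # replace any handlers already attached to the root logger
)

LOG = logging.getLogger(__name__)

LOG.debug('Not shown: DEBUG is below the configured INFO level')
LOG.info('Shown, with a timestamp in front')
LOG.warning('Also shown: WARNING is above INFO')
```

Without force=True, a second call to basicConfig in the same process has no effect, which is a common source of confusion when experimenting in a notebook.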

Here is an example of a method with log messages added at each step:

def run(self):
    with self.input().open('r') as infile:
        company_stats_embeddings = pd.read_csv(infile)
        LOG.info('--- Successfully loaded merged company stats and embeddings data ---')
        LOG.info(f'--- No. of companies in data: {company_stats_embeddings.shape[0]} ---')

    df_normalized = self.scale_df(company_stats_embeddings)

    LOG.info('--- Starting k-means clustering ---')
    kmeans_model = KMeansClustering(n_clusters=self.n_clusters)
    kmeans_model.fit(df_normalized)
    kmeans_cluster = kmeans_model.predict(df_normalized)

    company_cluster = company_stats_embeddings.copy()
    company_cluster['cluster'] = kmeans_cluster

    LOG.info('--- Distribution of clusters ---')
    LOG.info(f'--- \n{company_cluster.cluster.value_counts()} \n ---')

    LOG.info('--- Dumping generated clusters ---')
    with self.output().open('w') as outfile:
        company_cluster.to_csv(outfile, index=False)

As you can see, I've added sufficient log messages so that it is easier for the user to understand what is happening in the code while running it.

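import pandas as pd
import requests as rq                # alias assumed from the rq.get() call below
from bs4 import BeautifulSoup as bs  # alias assumed from the bs(...) call below


def scrape_data(companies_list, type_data='stand_alone'):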
    final_basic_stats_list = []
    for company in companies_list:
        if type_data == 'stand_alone':
            url = f'https://www.screener.in/company/{company}'
        elif type_data == 'consolidated':
            url = f'https://www.screener.in/company/{company}/{type_data}'
        try:
            response = rq.get(url)
            soup = bs(response.text, "html.parser")  # parse the html page
            basic_features_soup = soup.find_all(class_='row-full-width')
            basic_features_list = basic_features_soup[0].find_all(class_='four columns')
            basic_stats = [f.get_text() for f in basic_features_list]

            basic_stats = [f.lower().strip().replace('\n', '').replace('  ', '').replace(' ', '_') for f in basic_stats]
            company_stats_dict = dict()
            company_stats_dict['symbol'] = company
            for f in basic_stats:
                s = f.split(":")
                if len(s) == 2:
                    company_stats_dict[s[0]] = s[1]
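            final_basic_stats_list.append(company_stats_dict)
        except IndexError:
            # LOG.exception logs at ERROR level and includes the traceback
            LOG.exception(f'--- Error in scraping {company} company data. Continue to scrape. ---')

    # Build the frame from a list of dicts so that columns are matched by name,
    # and so that this line still works if no company was scraped successfully
    company_stats_df = pd.DataFrame(final_basic_stats_list)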
    change_col_names = {'stock_p/e': 'stock_pe',
                        'sales_growth_(3yrs)': 'sales_growth_3yrs'
                        }
    company_stats_df.rename(change_col_names, axis=1, inplace=True)

    return company_stats_df

In the above snippet, I've added an exception log for when the code raises an IndexError, and I've included the name of the company that caused the exception.
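The same pattern can be tried out without any scraping. In this stripped-down sketch, first_stat is a hypothetical helper that fails on an empty list, and the logger name is made up. LOG.exception logs at ERROR level and appends the traceback of the exception being handled, so it should only be called from inside an except block.

```python
import logging
import sys

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))

LOG = logging.getLogger('exception_demo')  # hypothetical logger name
LOG.addHandler(handler)
LOG.setLevel(logging.INFO)
LOG.propagate = False  # avoid duplicates if the root logger is configured


def first_stat(stats, company):
    """Return the first entry of stats, or None when the list is empty."""
    try:
        return stats[0]
    except IndexError:
        # The traceback is added to the log output automatically.
        LOG.exception('Error in scraping %s company data. Continue to scrape.',
                      company)
        return None


first_stat([], 'ACME')          # logs the error with a traceback, returns None
first_stat(['p/e: 12'], 'XYZ')  # returns 'p/e: 12' and logs nothing
```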

I hope this helps in understanding how to go about logging in Python. I've just scratched the surface of the topic, but this should get you comfortably started. If you want to learn more, the official Python documentation is a great place to look. Before I wrap up, here is a final reminder of why logging should be preferred over print statements.

Why use logging over print statements?

Logging has different levels of severity, which lets you display log messages according to the level you want. A print statement does not give you that flexibility.
Logging allows you to direct log messages to separate files that can then be used for later analysis, which is not as easy to do with print statements.
You can set different log levels for individual code files as well: some files may log at INFO level while others log at DEBUG level.
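The last point can be sketched as follows. The two logger names are hypothetical; in a real project each file would usually call logging.getLogger(__name__) and get a logger named after its module. As before, force=True (Python 3.8+) is only there so the config applies even if logging was configured earlier.

```python
import logging
import sys

# Project-wide default: INFO and above, printed to the screen.
logging.basicConfig(
    stream=sys.stdout,
    format='%(name)s %(levelname)s %(message)s',
    level=logging.INFO,
    force=True,
)

scraper_log = logging.getLogger('project.scraper')    # hypothetical module
training_log = logging.getLogger('project.training')  # hypothetical module

# Make only the training module verbose.
training_log.setLevel(logging.DEBUG)

scraper_log.debug('Hidden: this logger inherits the INFO default')
scraper_log.info('Shown')
training_log.debug('Shown: this logger was set to DEBUG')
```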

And that is it for now on this post. I will keep visiting this page for more additions as and when I learn more about logging.

Did you find the article useful? If you did, share your thoughts in the comments. Share this post with people who you think would enjoy reading this. Let's talk more of data science.


Manish Barnwal
