How to Make Your Logs Less Noisy in Python
Mar 13, 2024
The logging
module in the Python standard library has an extremely flexible
design. You can use it to configure almost any conceivable logging scheme. If you wanted every log message from your
application read aloud to you, Python’s logging
module can do
that!
The downside of this flexibility is that it can be hard to set things up when
all you want is to log messages to the console. The logging
module attempts
to solve this problem by providing a helper function called basicConfig()
.
This function allows you to set a log level for your application and a
destination for your log messages without having to understand the underlying
object hierarchy of the logging
module.
But basicConfig()
sets things up in a way that doesn’t work so well,
especially if your application has any third-party dependencies. Every time I
set up logging for a new project, all I can remember is that basicConfig()
is
not what I want, even if I’m just trying to log to the console. I then turn with a groan once again to the fairly involved
documentation for the logging
module to re-learn how to get it right.
So, mostly for the benefit of future me, here’s what I’ve found to be the best
way to configure logging in Python while avoiding the problems that come with
relying on basicConfig()
.
Noisy Logs Are The Default
Most people probably first reach for basicConfig()
because they want to set
the log level of their application to INFO
. The problem with basicConfig()
is that it sets a log level on something known as the root logger. You tell
basicConfig()
that you want to set a log level of INFO
, and it says, “Sure,
how about I make everything log at the INFO
level!” The root logger sits at
the root of the logger tree (more on this in a minute), so, in the absence of
any other configuration, its log level determines the log level of every logger
that exists in your application.
If you are writing a script or something with no third-party dependencies, this
is fine. If you are building an application in Python with lots of third-party
dependencies, you’ve just set the log level for all those dependencies too,
so your logs are likely to become cluttered with the log output of the libraries
you depend on. In particular, I find that the DEBUG
log level becomes
unusable—some third-party libraries might not log too many messages at the
INFO
level, but most are very chatty at the DEBUG
log level. If you want to
emit DEBUG
log messages from your own code, good luck spotting them in the
wall of DEBUG
log messages coming from code you didn’t write.
Of course, you could use grep
or a similar tool to filter the log output down
to only the messages you care about. But when I’m developing an application I
assume my dependencies work. I don’t care about the DEBUG
messages emitted by
a third-party library unless I’m actively trying to track down a problem with
that library. Why gum up the log output from my application with messages I
don’t need?
When I’m setting up logging for my application and choosing a log level, what
I’m really trying to do is set a log level for log messages from my own code.
If I’m debugging an issue and want to turn on more logging, I want to set the
log level to DEBUG
and see the extra DEBUG
log messages emitted by my
own application. It might also be useful to control the log level of specific
third-party libraries—but that’s a separate slider that I’ll touch only rarely.
Separate Control of Log Levels
I like this image of a slider controlling a log level. Ideally, we’d have multiple sliders—a control panel of sliders—to control the log levels for different parts of our application. Minimally, we’d like one slider for our application code and another for third-party code, or perhaps a slider for our application code and one separate slider for each third-party library we depend on. If we’re working at a company or organization with internal tools, then maybe we’d also want an additional slider for first-party dependencies.
How can we set up this control panel? There are just two crucial steps.
The first thing we have to do is follow best practices for naming our loggers.
I mentioned before that under the hood the Python logging
module organizes
“loggers” into a tree. The logging
module is designed to be used this way but
it doesn’t happen automatically. If you only ever log messages by calling
logging.info()
(i.e. by calling the info()
method on the logging
module)
then implicitly what you’re doing is only ever making use of the root logger.
You can and should create other loggers with different names. You can do this
using the getLogger()
function in the logging
module and passing __name__
(a special variable containing the name of the current module) as the first
argument. A logger object is just an interface into the whole logging system
that allows you to track where log messages came from. The convention
recommended by the standard library
documentation
is to create one logger for each module in your application using the name of
the module as the logger name. That way it’s easy to trace each log message
back to the module that emitted it.
If you’ve organized your code into packages, then your module names will look
something like toplevelpackage.nextpackage.bar
. If you create a logger with
this name, then the Python logging
module turns the dot-delimited name into a
branch of loggers to add to the logger tree. It will create a logger called
toplevelpackage
that sits just below the root logger, which is a parent to
the nextpackage
logger, which in turn is a parent to the bar
logger.
This tree structure is useful for a few different reasons. What’s most important for us though is that the logging level you set on a parent logger is inherited by all of its descendant loggers (provided the descendant loggers don’t have their own log levels set). If we follow the recommended naming conventions, we then have a way to set log levels for parts of our application module by module or even package by package.
The second thing we have to do is this: When our application starts up, and we
want to say, set the log level of our application based on what has been passed
to a --loglevel
command-line flag (a very common pattern), we should set the
log level of the logger highest in the tree that is not the root logger. If
we are developing a package, this should be the name of the top-level package
in our application. If there is no top-level package because we didn’t organize
our code that way, we can still explicitly create a top-level logger for our
application (perhaps using the name of the application as the logger name) and
ensure our other loggers are children of it.
Any third-party libraries we depend on might create their own loggers, but
these will be children of the root logger and not children of our own
application-specific logger. So when we set a log level on the top-level logger
in our application but leave the root logger alone, then we don’t have to worry
about enabling the DEBUG
log level, since it’s only our own application code
that we’d be affecting!
I’m glossing over some details here. Let’s take a look at how we might implement this and I’ll explain the parts I’ve skipped over so far.
|
|
The above code block contains a single function that we can call to set the log
level of our application on startup. I’ve chosen to format my log messages here
in a way that I like, but otherwise this code configures the root logger in a
manner very similar to how the root logger would be configured by default. We
attach a StreamHandler
to the root logger, which means all of our log
messages from throughout our application (including third-party libraries) will
get logged to the stderr
stream. We set the root logger to have a log level
of WARN
, which means that we will only hear from our third-party libraries if
something bad happens. This is also the default in Python if you don’t
explicitly configure logging.
The important part of the configure_logging()
function is that it takes the
caller-supplied log_level
and sets it on the top-level application logger.
This implementation assumes that the module containing configure_logging()
resides in the top-level package of your application, so it uses the special
variable __package__
as the top-level application logger name. If you’ve
named your other loggers according to the recommended convention, then this
function has set you up so that only your own logger objects are logging at the
given log_level
.
If you later decide you want to see those intrusive third-party logs, all you need to do is lower the log level set on the root logger. Alternatively, you could set a lower log level on the logger for just the particular third-party lib you need log messages from.
As you can see, we’ve had to dip our toes into the logging
object hierarchy
here. But the effort gives us a configuration miles better than basicConfig()
.
First-Party Dependencies
I mentioned before that we might also want to have a slider for first-party
dependencies. Maybe you find yourself needing to debug the behavior of your
colleague’s code more often than you find yourself debugging your third-party
dependencies. In that case, it would be nice to to be able to enable the
DEBUG
log level for that first-party code without affecting the log level of
your third-party dependencies.
One way to accomplish that could be to wrap getLogger()
with a function like
this:
|
|
If you use this function to create loggers across your organization, then all
you need to do is set the log level of the ORG_NAME
logger, which will be a
child of the root logger, to whatever you desire. Only your first-party code
will be affected; third-party code will create and use loggers that are not
children of the ORG_NAME
logger.
Having good logging is critical to understanding what your application is doing as it runs. It might not seem like a big deal to have noisy third-party log messages in your logs, because, after all, that’s just more information than you need, right? The information you do need is still there too. But I’ve found that having to sift through lots of stuff you don’t care about to get to the stuff you do makes logs harder to read and less likely to be relied upon. So do yourself a favor and Marie Kondo those unnecessary third-party log messages out of existence. It will make understanding and debugging your application easier.