The Documentation Dilemma

Every coder will come across the same aggravating issue one day: finding a script or library you think you need, only to discover it has zero instructions. Or the examples don’t work, or there is so much filler and so little substance that you can’t hope to dig through it all to figure out how it’s meant to be used.

Don’t be like that project: document your code smartly, and document to the level of the project you are writing. If you have a little script that’s self-contained in a single file, it would be stupid to add a documentation directory and set up Sphinx. Whereas if you are creating a library for other coders, it would be just as stupid not to have extensive documentation.

Acceptable Documentation

I used to be like every other college grad: “your code should document itself”, “an idiot could decipher my code”, “if they can’t be bothered to read through the code, why would they run it on their system?” That is the sound of inexperience and ignorance. For your code to be useful to other people, it needs, at minimum, a description of what it does and its expected inputs and outputs, which are easily shown in examples.

The reality is, professionals simply do a quick search (either company-internal or web) and determine whether a program or script already exists to do what they want by reading a short blurb about each and moving on. No one stops to read the code; don’t do your project the disservice of hiding it from others who would find it useful. Not only does documentation help others, it’s a much better self-promoter, showing that you care about your own work and want others to use it.

Here is a quick breakdown of what I view as requirements of documentation:

Single script

  • Readme: can be the docstring at top of file
    • Description of what the script is designed to do
    • Examples, including a sample run with common input / output
  • Code should have meaningful messages on failure

Multiple scripts

  • Readme: standalone file
    • Description of project as a whole
    • Description of each script
    • Example of each script
    • Description and examples of any interconnection between scripts
  • Code needs meaningful messages on failure


Library

  • Readme: standalone file
    • Description of project
    • Installation guide
    • Basic examples
  • Changelog
  • License
  • Documentation directory
    • Stored in a human-readable markup language (.md, .rst, .tex) that can be generated into PDF or HTML
    • Python projects should use Sphinx to pull in docstrings
    • Index with contents of Readme, links to other files for each aspect of library
  • Code should use logging
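For the documentation-directory bullet, here is a minimal sketch of the Sphinx side (the project name `my_library` and file paths are hypothetical, and the skeleton would normally come from `sphinx-quickstart`); the autodoc extension is what pulls docstrings into the generated HTML or PDF:

```python
# docs/ -- minimal sketch; names are hypothetical
project = "my_library"
extensions = [
    "sphinx.ext.autodoc",   # pull docstrings into the generated docs
    "sphinx.ext.viewcode",  # link each documented object to highlighted source
]

# A page such as docs/api.rst then only needs:
#
#   .. automodule:: my_library
#      :members:
```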

Library with API

  • Everything in Library
  • API documents for every interface
    • preferably in a common format such as RAML


Program

  • Readme: standalone file
    • Description of project
    • Installation guide
    • Tutorials
      • Including detailed examples / use cases
    • FAQ / troubleshooting
  • License
  • Code should use logging
  • Changelog

Obviously I can’t go into every possible scenario, but I hope this clears up the common expectations for a project. And if you intend for others to help with the project, it’s a good idea to include developer docs for contributors that go over code conventions and any particular styles you want them to adhere to. If other people are helping, make sure to have a section dedicated to thanking them, such as an Authors file.

Writing the Documentation

Here’s a quick baseline of a README in Markdown that you can feel free to copy and reuse.

# Project Name

This is an example project that is designed
to teach documentation. The project will search
IMDb and find a movie based off the input given.

## Supported Systems

* Windows XP or newer
* Linux

## Tested on

* Python 2.6+
* Python 3.3+
* PyPy

## Examples

Find the movie:

    $ python 300

    Title: 300 (2006)
    Summary: King Leonidas of Sparta and a
             force of 300 men fight the Persians at
             Thermopylae in 480 B.C.
    Rating: 7.7 / 10

Get more details about it:

    $ python --director 300

    Title: 300 (2006)
    Director: Zack Snyder

## License

This code is MIT Licensed, see LICENSE

If you have a script and it already has good error messages, you’re good to go! Feel free to add more example cases, an FAQ, or the full usage output (aka the details given when you run my_project -h, and yes, you need that too).
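If the script is written with argparse, that usage output comes nearly for free. A sketch (program name and flags modeled loosely on the README example above; the exact interface is hypothetical):

```python
import argparse

# Hypothetical command-line interface; argparse writes the -h/--help text for us
parser = argparse.ArgumentParser(
    prog="my_project",
    description="Search IMDb and find a movie based off the input given.")
parser.add_argument("title", help="movie title to search for")
parser.add_argument("--director", action="store_true",
                    help="also show the movie's director")

usage_text = parser.format_help()  # the same text printed by `my_project -h`
print(usage_text)
```

Pasting that generated help text into the README keeps the usage section and the actual interface from drifting apart.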

For larger projects, this is not enough, as it would be very difficult to fit enough examples onto a single readable page, and very easy for them to go stale. To avoid that issue in code libraries, I recommend including the usage examples in the Python docstrings themselves and having them auto-included in the documentation with Sphinx. That way, when you update a function, you’re much more likely to remember to update the example beside it.

def now(utc=False, tz=None):
    """
    Get a current DateTime object. By default is local.

    .. code:: python

        # DateTime(2016, 12, 8, 22, 5, 2, 517000)

        reusables.now().format("It's {24-hour}:{min}")
        # "It's 22:05"

    :param utc: bool, default False, UTC time not local
    :param tz: TimeZone as specified by the datetime module
    :return: reusables.DateTime
    """

Here’s what that will look like when rendered with Sphinx (in this case hosted with Read the Docs).

This is very convenient if you have a library and are exposing all your functions, but it isn’t an option for full-fledged programs or APIs. I have seen a few fancy tricks, such as pulling the examples out of Readmes and using them as functional tests, or parsing the RAML API files and testing them. It’s a good way to ensure that all the examples are up to date, but now the documentation is dual-purpose and has to be both machine- and human-readable, possibly taking away too much from the human readability for the good the tests do in keeping things current.

So what’s the answer? Well, there isn’t just one. If it’s just you and you’re not going to be updating the project that often, write the examples when it is near completion and try your best to update them if the project updates. If it’s an ever evolving project with multiple contributors, it would probably be best to have some sort of automation to ensure they are not going stale.

The best thing to do is have a standard across your team / company. Even a stupid standard implemented well can result in better documentation than no standards. That way everyone knows exactly where to look for docs, and what they are expected to contribute.


Your first date with Python logging

Logging vs Print

The difference between logging and using print isn’t simply personal preference; it’s about choosing the proper tool for each use case. Check out the Python docs on when to use logging if you don’t believe me.

If you are simply using Python to write scripts with clear paths and failure on error, it’s perfectly reasonable not to use logging. However, as soon as you have a long-running program, a service, or a complicated script, it’s time to take the plunge.

The Logging Object

In every file you want to log with, you’ll be adding at minimum two lines:

import logging

logger = logging.getLogger("my_program")

This will grab the logger object associated with the name “my_program”. In very basic terms, think of the interpreter as having an internal dictionary of loggers; it knows which one to give you back when you send it a key (the name). This way, you don’t have to pass the logger around yourself. So you should never share or pass logger objects; simply make another call to getLogger.
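You can see that internal-dictionary behavior for yourself: two getLogger calls with the same name, even from totally different files, hand back the very same object:

```python
import logging

first = logging.getLogger("my_program")
second = logging.getLogger("my_program")  # could be in a different file entirely

print(first is second)  # True -- same object from logging's internal registry
```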

A very common mistake I see is a project having a single file with the logger object in it and importing that logger everywhere. Don’t do it.

Let’s repeat that a little louder.

Never share or pass the logger object

# (don't do this)
logger = logging.getLogger("my_program")

# (don't do this)
from .log import logger

# (don't do this)
from .log import logger


It’s bad practice, prevents easy future modification, and adds needless complication. You will always get the same logger back when you call getLogger with the same name.

Therefore, if you are trying to get the logger for an imported project, don’t go searching for project.logger; simply make a call to getLogger("Project"). “But how do I know what they named their logger?” Most projects use the same convention, which I suggest you also follow:

import logging

logger = logging.getLogger(__name__)

The variable __name__ is a predefined global that holds the running module’s name as imported. So if you run the script directly, it will be "__main__". Whereas if you import the module (import my_program), it will be "my_program". If you import a submodule of my_program called submodule (creative, aren’t I?), it would be set to "my_program.submodule", just as you would see it on an import line.

“Is it ok to have dots in the name?” It’s better than ok, it’s pure awesome. Python logging has a built-in dotted hierarchical system, allowing you to assign levels and handlers to non-root loggers. In other words, each module can have its own level and handlers (covered later).
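A small sketch of that hierarchy in action (logger names are made up): a record logged on a child logger propagates up to the parent’s handlers, while the child can still enforce its own level:

```python
import logging
import sys

# The parent gets the handler, the child gets its own level
parent = logging.getLogger("my_program")
parent.setLevel(logging.DEBUG)
parent.addHandler(logging.StreamHandler(stream=sys.stdout))

child = logging.getLogger("my_program.submodule")
child.setLevel(logging.WARNING)  # restrict just this submodule

child.warning("Shown: passes the child's level, handled by the parent's handler")
child.debug("Dropped: below this submodule's level")
```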

You can set the name manually as well, getLogger("my_program.submodule"), which can be very handy if you want to split logging not by module but by shared functionality, or have a different place to log to. Which is one of the main reasons against sharing the logger object: if later on you want everything in a certain type of submodule to log to a file, and you were naughty and shared a logger object, you now have to update every file to use a new object. Whereas if you followed good practice and used getLogger in each file, you would only have to add two lines of code to your main file to add a file handler.

One of the ‘gotchas’ with using the same logger across multiple modules is that you have to have the logger’s handlers and settings set up before you use it.

Sounds like a ‘duh’, but consider a case like this:

# The main file
import logging

from subdir import submodule

logger = logging.getLogger("my_project")
# Required handler code redacted for example

logger.warning("Speedboat crashed")
submodule.sub_func(5)

# subdir/
import logging

logger = logging.getLogger("my_project.submodule")

logger.debug("Submodule loaded")

def sub_func(var):"Got variable {}".format(var))
    return var + 1

Output:

No handlers could be found for logger "my_project.submodule"
2016-12-09 16:14:13,079 - my_project - WARNING - Speedboat crashed
2016-12-09 16:21:13,387 - my_project.submodule - INFO - Got variable 5

You will get an error message of “No handlers could be found for logger "my_project.submodule"”, because the parent logger was not yet set up when the submodule was imported, but it was by the time the submodule’s function was run.


Levels

In the logging module, there are five different preset logging levels:

  • 10 – DEBUG
  • 20 – INFO
  • 30 – WARNING
  • 40 – ERROR
  • 50 – CRITICAL

When you set a logging level on a logger or handler, you are simply setting a number, and anything set to that number or above will be logged.


logger.setLevel(logging.DEBUG)

Is equivalent to:

logger.setLevel(10)

Generally everyone uses the first form because it is easier to read and you usually already have the logging module imported, but it is not required.
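You can poke at the numbers directly to convince yourself (the logger name is made up):

```python
import logging

logger = logging.getLogger("level_demo")
logger.setLevel(logging.INFO)  # really just storing the number 20

print(logger.level)                        # 20
print(logger.isEnabledFor(logging.DEBUG))  # False: 10 < 20
print(logger.isEnabledFor(logging.ERROR))  # True: 40 >= 20
print(logging.getLevelName(30))            # WARNING
```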

By default, the logger itself has an effective logging level of WARNING (30), so it will not display debug or info messages. (The built-in handlers have a level of NOTSET (0), so they won’t stop anything from being written, but it is good practice to set them.)

import logging

logger = logging.getLogger(__name__)
# Required handler code redacted for example

print(f"The logger is set at level: {logger.getEffectiveLevel()}")

logger.debug("Debug message")"Info message")
logger.warning("Something's on fire")

Output:

The logger is set at level: 30
2016-12-09 15:23:53,220 - __main__ - WARNING - Something's on fire

It is also possible to create your own level. This is rarely used; I have personally only seen it done on the extreme ends (i.e. 5 for low-level debugging and 45 for near-critical), so here is an example that doesn’t deal with either of those:

ALMOST_SCARY = 25  # value not in the original; anything between INFO (20) and WARNING (30) works here
logging.addLevelName(ALMOST_SCARY, "ALMOST_SCARY")

logger.setLevel(ALMOST_SCARY)"Info message")  # filtered out, below ALMOST_SCARY
logger.log(ALMOST_SCARY, "Spider in the tub")
logger.warning("House set on fire")

Output:

2016-12-09 15:35:29,413 - __main__ - ALMOST_SCARY - Spider in the tub
2016-12-09 15:35:29,413 - __main__ - WARNING - House set on fire

I find it good practice to set the logger itself as low as possible and to set the levels on the handlers instead. That way, if you eventually want a debug file (e.g. turning on debug output when something fails), you don’t have to change multiple lines, just one.


Handlers

Logging handlers are the brains behind the logger that tell it where to send log messages. The logger object simply keeps a list of handlers, logger.handlers (not to be confused with the available handler classes under logging.handlers). When the logger receives a message to log, it iterates through that list, and every handler whose level is equal to or below the level of the incoming message is sent the message to ‘handle’ (literal wording from the source code; programmers are not an inventive lot).

The two most commonly used handlers are StreamHandler and FileHandler (which is a subclass of StreamHandler). However, many come in the box, from syslog to HTTP handlers, and it is possible to make or modify one for your own needs.

Here is a basic example of creating a StreamHandler and assigning it to our logger:

import logging
import sys 

logger = logging.getLogger(__name__)

my_stream_handler = logging.StreamHandler(stream=sys.stdout)
logger.addHandler(my_stream_handler)

logger.warning("Crazy ex is headed over, has a blow torch")

Output:

Crazy ex is headed over, has a blow torch

“Wait a minute, all the other examples had time, module, and logging level. Where’d they go?” That’s all set up with Formatters, which we’ll get to next. For now, we’re sticking to handlers.

StreamHandler is pretty basic and good to use for sys.stdout and sys.stderr. FileHandler is built on top of it and is very commonly used, and often incorrectly. Any time you are deploying to a production server that you aren’t running logrotate (or similar) on, you will want to use one of the rotating file handlers: either RotatingFileHandler, based on file size, or TimedRotatingFileHandler.

import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)  # let info messages through to the handler

# If either maxBytes or backupCount is set to 0, it will not rotate
my_file_handler = RotatingFileHandler("rotating_example.log",
                                      maxBytes=1024, backupCount=5)
logger.addHandler(my_file_handler)

for a in list(range(1, 100))[::-1]:  # Reversed to start at 99 and count down"{0} bottles of beer on a wall, take one down,"
                " pass it around, {1} bottles of beer on the wall".format(a, a - 1))

This will create six different files: the current log file, rotating_example.log, and the past five backups, rotating_example.log.1 through rotating_example.log.5. The current file always holds the newest messages:
9 bottles of beer on a wall, take one down, pass it around, 8 bottles of beer on the wall
8 bottles of beer on a wall, take one down, pass it around, 7 bottles of beer on the wall
7 bottles of beer on a wall, take one down, pass it around, 6 bottles of beer on the wall
6 bottles of beer on a wall, take one down, pass it around, 5 bottles of beer on the wall
5 bottles of beer on a wall, take one down, pass it around, 4 bottles of beer on the wall
4 bottles of beer on a wall, take one down, pass it around, 3 bottles of beer on the wall
3 bottles of beer on a wall, take one down, pass it around, 2 bottles of beer on the wall
2 bottles of beer on a wall, take one down, pass it around, 1 bottles of beer on the wall
1 bottles of beer on a wall, take one down, pass it around, 0 bottles of beer on the wall
The oldest kept backup, rotating_example.log.5, ends up holding:

54 bottles of beer on a wall, take one down, pass it around, 53 bottles of beer on the wall
53 bottles of beer on a wall, take one down, pass it around, 52 bottles of beer on the wall
52 bottles of beer on a wall, take one down, pass it around, 51 bottles of beer on the wall
51 bottles of beer on a wall, take one down, pass it around, 50 bottles of beer on the wall
50 bottles of beer on a wall, take one down, pass it around, 49 bottles of beer on the wall
49 bottles of beer on a wall, take one down, pass it around, 48 bottles of beer on the wall
48 bottles of beer on a wall, take one down, pass it around, 47 bottles of beer on the wall
47 bottles of beer on a wall, take one down, pass it around, 46 bottles of beer on the wall
46 bottles of beer on a wall, take one down, pass it around, 45 bottles of beer on the wall

Anything older than that is lost, which is a good thing when the alternative is gigabytes of log files after months or years of use.
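As mentioned above, you can also make a handler of your own when none of the built-ins fit; the only required piece is an emit method. A toy sketch (the class and logger name are made up) that ‘handles’ records by keeping them in a list:

```python
import logging

class ListHandler(logging.Handler):
    """Toy custom handler: stores formatted records in a list.

    A real one might POST records to a chat webhook or a database instead.
    """

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.records = []

    def emit(self, record):
        # self.format() applies this handler's Formatter (default: message only)
        self.records.append(self.format(record))

logger = logging.getLogger("custom_handler_demo")
memory = ListHandler()
logger.addHandler(memory)

logger.warning("Kettle left on")
print(memory.records)  # ['Kettle left on']
```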


Formatters

The last critical element of logging is the Formatter. Formatters are often overlooked, as people generally use one copy-pasted from the Python docs, but when customized they can be personalized per project for better debugging. The default, as seen just above, is simply:

"%(message)s"
Each handler has its own formatter, which is best set before the handler is added to the logger. Probably the most commonly used format (other than the default) is:

"%(asctime)s - %(name)s - %(levelname)s - %(message)s"

They are all kind of a mouthful to set up:

my_stream_handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)s - "
                      "%(levelname)s - %(message)s"))
2016-12-09 18:50:56,604 - __main__ - INFO - Info message
2016-12-09 18:50:56,604 - __main__ - WARNING - House set on fire

Here’s one I like better, as it pads everything before the message to the same length, which looks a lot cleaner:

"%(asctime)s - %(name)-12s  %(levelname)-8s %(message)s"
2016-12-09 18:00:02,417 - __main__      INFO     Info message
2016-12-09 18:00:02,417 - __main__      WARNING  House set on fire

Doing a lot of threading? Wonder if you’re logging in a thread, or if it’s a thread in a thread?

"%(asctime)s : %(relativeCreated)1d %(threadName)s : %(name)s %(levelname)s %(message)s"
2016-12-09 18:55:12 : 0 MainThread : __main__ INFO Info message
2016-12-09 18:55:12 : 1 Thread-2 : __main__ WARNING House set on fire

There are plenty of options to choose from; they are all listed in the docs.

Putting it all together

Here is a ‘simple’ logging setup using everything discussed so far.

import logging
import sys

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

my_stream = logging.StreamHandler(stream=sys.stdout)
my_stream.setLevel(logging.INFO)
my_stream.setFormatter(
    logging.Formatter("%(asctime)s - %(name)-12s  "
                      "%(levelname)-8s %(message)s"))
logger.addHandler(my_stream)"We the people")

Output:

2016-12-09 18:00:02,417 - __main__      INFO     We the people

“That’s a lot to remember and write for every program…” I totally agree. I use this type of setup so often, and don’t want to re-create a config file for each program, that I keep an importable version of it on PyPI.

import sys

import reusables

logger = reusables.get_logger(__name__, level=10, stream=sys.stdout)

Produces the same output as above. It also has built in formats for different scenarios and logging helper functions. Check out Reusables.

Realistic Example

One of the true powers behind having multiple handlers is that you can set them at different levels for different goals. In this more realistic example, the main program uses a stream handler outputting to the screen. The submodule then has a child of that logger, which also has a rotating file handler set for errors and above only, meaning errors will log to both the file and the screen.
# The main file
import logging
import sys

from subdir import submodule

logger = logging.getLogger("my_project")
logger.setLevel(logging.INFO)

my_stream = logging.StreamHandler(stream=sys.stdout)
my_stream.setFormatter(
    logging.Formatter("%(levelname)-8s %(message)s"))
logger.addHandler(my_stream)

def main():
    submodule.sub_func("test")

if __name__ == '__main__':"Starting my project")

# subdir/
import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("my_project.submodule")

sub_file_handler = RotatingFileHandler("out.log",
    maxBytes=5000000, backupCount=5)
sub_file_handler.setLevel(logging.ERROR)
sub_file_handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)-12s  "
                      "%(levelname)-8s %(message)s"))
logger.addHandler(sub_file_handler)

def sub_func(var):"Got variable {}".format(var))
        return var / 5.0
    except TypeError:
        logger.exception("Could not divide")

On the screen:

INFO     Starting my project
INFO     Got variable test
ERROR    Could not divide
Traceback (most recent call last):
  File "subdir\", line 16, in sub_func
    return var / 5.0
TypeError: unsupported operand type(s) for /: 'str' and 'float'

In out.log:

2016-12-11 14:50:42,977 - my_project.submodule  ERROR    Could not divide
Traceback (most recent call last):
  File "subdir\", line 16, in sub_func
    return var / 5.0
TypeError: unsupported operand type(s) for /: 'str' and 'float'

Repeating the setup

“Now that I know how to do it right, how can I just cut to the chase so I don’t have to write all that every time?”

  1. Use a logging config (.ini) file
  2. Use Reusables get_logger
  3. Use a dictionary (which can be loaded from markup like YAML or JSON)

It’s mostly personal, if not team or company, preference as to how you set up logging each time. I personally don’t like having to keep track of yet another config file, so I do it in code. Others need to be able to easily modify it and have a standard format across projects that can be enforced through config files. I won’t bother getting into the latter two here, but more details can be found in the standard library docs.

Anything more to learn?

We have barely touched on things like creating custom Handlers, multiprocessing concerns, or Filters (a more detailed way to decide whether something should be logged, beyond just levels). There is always more to explore with the rich Python built-ins!