Reusables – Part 1: Overview and File Management

Reusables 0.8 has just been released, and it’s about time I give it a proper introduction.

I started this project three years ago with a simple goal: keep the code that I inevitably end up reusing grouped into a single library. It’s for the stuff that’s too small to do well as its own library, but common enough that it’s handy to reuse rather than rewrite each time.

It is designed to make the developer’s life easier in a number of ways. First, it requires no external modules; it’s possible to supplement some functionality with the modules specified in the requirements.txt file, but they are only required for specific use cases. For example, rarfile is only used to extract, you guessed it, rar files.

Second, everything is tested on both Python 2.6+ and Python 3.3+, as well as on PyPy. It is cross-platform compatible (Windows and Linux), unless a specific function or class specifies otherwise.

Third, everything is documented via docstrings, so the documentation is available at readthedocs or through the built-in help() command in Python.
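For example, since everything is documented and exposed at the root level, you can pull up any function’s docstring straight from an interactive session (cut is covered later in this post):

import reusables

help(reusables.cut)
# Prints the docstring for cut(), the same text you would find at readthedocs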

Lastly, all functions and classes are available at the root level (except CLI helpers), and can be broadly categorized as follows:

  • File Management
    • Functions that deal with file system operations.
  • Logging
    • Functions to help setup and modify logging capabilities.
  • Multiprocessing
    • Fast and dynamic multiprocessing or threading tools.
  • Web
    • Things related to dealing with networking, urls, downloading, serving, etc.
  • Wrappers
    • Function wrappers.
  • Namespace
    • Custom class to expand the usability of python dictionaries as objects.
  • DateTime
    • Custom datetime class primarily for easier formatting.
  • Browser Cookie Management
    • Find, extract or modify cookies of Firefox and Chrome on a system.
  • Command Line Helpers
    • Bash analogues to help system admins perform faster operations from inside an interactive python shell.

In this overview, we will cover:

  1. Installation
  2. Getting Started
  3. File, Folder and String Management
    1. Find Files Fast
    2. Archives (Extraction and Compression)
    3. Run Command
    4. File Hashing
    5. Finding Duplicate Files
    6. Safe File and Folder Names
    7. Touch(ing a file)
    8. Simple JSON and CSV
    9. Cut(ting a string into equal lengths)
    10. Config to dictionary

Installation

The install is very straightforward: just do a simple pip or easy_install from PyPI.

pip install reusables

OR

easy_install reusables

If you need to install it on an offline computer, grab the appropriate Python 2.x or 3.x wheel from PyPI, and just pip install it directly.

There are no additional modules required for install, so if either of those doesn’t work, please open an issue on github.

Getting Started

import reusables 

reusables.add_stream_handler('reusables', level=10)

The logger’s name is ‘reusables’, and by default it does not have any handlers associated with it. For these examples we will have logging at debug level; if you aren’t familiar with logging, please read my post about logging.

File, Folder and String Management

Everything here deals with managing something on the disk, or strings that relate to files. From checking for safe filenames to saving data files.

I’m going to start the show off with my most reused function, which is also one of the most versatile and powerful: find_files. It is basically an advanced implementation of os.walk.

Find Files Fast

reusables.find_files_list("F:\\Pictures",
                              ext=reusables.exts.pictures, 
                              name="sam", depth=3)

# ['F:\\Pictures\\Family\\SAM.JPG', 
# 'F:\\Pictures\\Family\\Family pictures - assorted\\Sam in 2009.jpg']

With a single line, we are able to search a directory for files by a case-insensitive name, a list (or single string) of extensions, and even specify a depth. It’s also really fast, taking under five seconds to search through 70,000 files and 30,000 folders, just half a second longer than the Windows built-in equivalent dir /s *sam* | findstr /i "\.jpg \.png \.jpeg \.gif \.bmp \.tif \.tiff \.ico \.mng \.tga \.xcf \.svg".

If you don’t need it as a list, use the generator itself.

for pic in reusables.find_files("F:\\Pictures", name="*chris *.jpg"):
    print(pic)

# F:\Pictures\Family\Family pictures - assorted\Chris 1st grade.jpg
# F:\Pictures\Family\Family pictures - assorted\Chris 6th grade.jpg
# F:\Pictures\Family\Family pictures - assorted\Chris at 3.jpg

That’s right, it also supports glob wildcards. It even supports using the external module scandir for older versions of Python that don’t have it natively (only if enable_scandir=True is specified, of course; it’s one of those supplemental modules). Check out the full documentation and more examples at readthedocs.

Archives

Dealing with the idiosyncrasies of the compression libraries provided by Python can be a real pain. I set out to make a super simple and straightforward way to archive and extract folders.

reusables.archive(['reusables',    # Folder with files 
                   'tests',        # Folder with subfolders
                   'AUTHORS.rst'], # Standalone file
                   name="my_archive.bz2")

# 'C:\Users\Me\Reusables\my_archive.bz2'

It will compress everything, store it, and keep folder structure in the archives.

To extract files, the behavior is very similar. Given a ‘wallpapers.zip’ file, it is trivial to extract it to a location without having to specify its archive type.

reusables.extract("wallpapers.zip",
                  path="C:\\Users\\Me\\Desktop\\New Folder 6\\")
# ... DEBUG File wallpapers.zip detected as a zip file
# ... DEBUG Extracting files to C:\Users\Me\Desktop\New Folder 6\
# 'C:\\Users\\Me\\Desktop\\New Folder 6'

We can see that it extracted everything and again kept its folder structure.

The only difference in support between the two is that you can extract rar files if you have installed rarfile and its dependencies (and specified enable_rar=True), but you cannot archive them, due to licensing.

Run Command

Ok, so this may not always deal with the file system, but it’s better here than anywhere else. As you may or may not know, Python 3.5 introduced the excellent subprocess.run, which is a convenient wrapper around Popen that returns a clean CompletedProcess instance. reusables.run is designed to be a version-agnostic clone of it, and will even directly run subprocess.run on Python 3.5 and higher.

reusables.run("cat setup.cfg", shell=True)

# CompletedProcess(args='cat setup.cfg', returncode=0, 
#                 stdout=b'[metadata]\ndescription-file = README.rst')

It does have a few subtle differences that I want to highlight:

  • By default, it sets stdout and stderr to subprocess.PIPE, so the result is always in the returned CompletedProcess instance.
  • Has an additional copy_local_env argument, which will copy your current shell environment to the subprocess if True.
  • Timeout is accepted, but will raise a NotImplementedError if set on Python 2.x.
  • It doesn’t take positional Popen arguments, only keyword args (2.6 limitation).
  • It returns the same output as Popen, so on Python 2.x stdout and stderr are strings, and on 3.x they are bytes.

Here you can see an example of copy_local_env in action, running on Python 2.6.

import os

os.environ['MYVAR'] = 'Butterfly'

reusables.run("echo $MYVAR", copy_local_env=True, shell=True)

# CompletedProcess(args='echo $MYVAR', returncode=0, 
#                 stdout='Butterfly\n')

File Hashing

Python already has nice hashing capabilities through hashlib, but it’s a pain to rewrite the custom code for handling large files without a large memory impact: opening the file and iterating over it in chunks, updating the hash as you go.
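For reference, here is a minimal sketch of that chunked-hashing boilerplate; the helper name and chunk size are my own choices, not part of the library:

import hashlib

def manual_file_hash(path, hash_type="md5", chunk_size=65536):
    hasher = hashlib.new(hash_type)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never sit fully in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

Instead of rewriting that each time, Reusables wraps it up in a convenient function.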

reusables.file_hash("reusables\\reusables.py", hash_type="sha")

# '50c5425f9780d5adb60a137528b916011ed09b06'

By default it returns an md5 hash, but it can be set to anything available on that system, and the result is returned in hexdigest format; if the kwarg hex_digest is set to False, it will be returned as bytes.

reusables.file_hash("reusables\\reusables.py", hex_digest=False)

# b'4\xe6\x03zPs\xf5\xe9\x8dX\x9c/=/<\x94'

Starting with Python 2.7.9, you can quickly view the available hashes directly from hashlib via hashlib.algorithms_available.

# CPython 3.6
import hashlib

print(f"{hashlib.algorithms_available}")
# {'sha3_256', 'MD4', 'sha512', 'sha3_512', 'DSA-SHA', 'md4', ...

reusables.file_hash("wallpapers.zip", "sha3_256")

# 'b7c357d582f8932977d785a24f728b267cef1de87537076aadac5049f4e4fa70'

Duplicate Files

You know you’ve seen this picture before, and you shouldn’t have to save it again, but where did that sucker go? Wonder no more, find it!

list(reusables.dup_finder("F:\\Pictures\\20131005_212718.jpg", 
                          directory="F:\\Pictures"))

# ['F:\\Pictures\\20131005_212718.jpg',
#  'F:\\Pictures\\Me\\20131005_212718.jpg',
#  'F:\\Pictures\\Personal Favorite\\20131005_212718.jpg']

dup_finder is a generator that will search for a given file in a directory and all of its sub-directories. It is also very fast, as it uses a three-step escalation to detect duplicates; if a step does not match, it will not continue with the other checks. The checks are verified in this order (a rough sketch of the idea follows the list):

  1. File size
  2. First twenty bytes
  3. Full SHA256 compare
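To make the escalation concrete, here is a minimal sketch of the idea in plain Python; this is my own illustration, not the library’s internals:

import hashlib
import os

def looks_like_duplicate(original, candidate):
    # Step 1: cheapest check first - the file sizes must match
    if os.path.getsize(original) != os.path.getsize(candidate):
        return False
    # Step 2: compare the first twenty bytes before hashing anything
    with open(original, "rb") as f1, open(candidate, "rb") as f2:
        if f1.read(20) != f2.read(20):
            return False
    # Step 3: only now pay for a full SHA256 of both files
    def full_hash(path):
        sha = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                sha.update(chunk)
        return sha.digest()
    return full_hash(original) == full_hash(candidate)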

That is excellent for finding a single file, but how about all the duplicates in a directory? The traditional option is to create a dictionary of hashes of all the files to compare against. It works, but it is slow. Reusables has a directory_duplicates function, which does a file size comparison first, and only moves on to hash comparisons if the sizes match.

reusables.directory_duplicates(".")

# [['.\\.git\\refs\\heads\\master', '.\\.git\\refs\\tags\\0.5.2'], 
#  ['.\\test\\empty', '.\\test\\fake_dir']]

It returns a list of lists; each internal list is a group of matching files. (To be clear, “empty” and “fake_dir” are both empty files used for testing.)

Just how much faster is it this way? Here’s a benchmark on my system of searching through over sixty-six thousand (66,000) files in thirty thousand (30,000) directories.

The comparison code (the Reusables duplicate finder is referred to as ‘size map’):

import reusables

@reusables.time_it(message="hash map took {seconds:.2f} seconds")
def hash_map(directory):
    hashes = {}
    for file in reusables.find_files(directory):
        file_hash = reusables.file_hash(file)
        hashes.setdefault(file_hash, []).append(file)

    return [v for v in hashes.values() if len(v) > 1]


@reusables.time_it(message="size map took {seconds:.2f} seconds")
def size_map(directory):
    return reusables.directory_duplicates(directory)


if __name__ == '__main__':
    directory = "F:\\Pictures"

    size_map_run = size_map(directory)
    print(f"size map returned {len(size_map_run)} duplicates")

    hash_map_run = hash_map(directory)
    print(f"hash map returned {len(hash_map_run)} duplicates")

The speed-up from checking size first in our scenario is significant: nearly 16 times faster.

size map took 40.23 seconds
size map returned 3511 duplicates

hash map took 642.68 seconds
hash map returned 3511 duplicates

It jumps from under a minute using reusables.directory_duplicates to over ten minutes with a traditional hash map. This is the fastest pure Python method I have found; if you find a faster one, let me know!

Safe File Names

There are plenty of instances where you want to save a meaningful filename supplied by a user, say for a file transfer program or web upload service, but what if they are trying to crash your system?

Reusables has three functions to help you out.

  • check_filename: returns True if safe to use, else False
  • safe_filename: returns a pruned filename
  • safe_path: returns a safe path

These are not based on all the legally allowed characters per system, but on a restricted set: letters, numbers, spaces, hyphens, underscores and periods.

reusables.check_filename("safeFile?.text")
# False

reusables.safe_filename("safeFile?.txt")
# 'safeFile_.txt'

reusables.safe_path("C:\\test'\\%my_file%\\;'1 OR 1\\filename.txt")
# 'C:\\test_\\_my_file_\\__1 OR 1\\filename.txt'

Touch

Designed to be the same as the Linux touch command: it will create the file if it does not exist, and update the access and modified times to now.

import os
import time

time.time()
# 1484450442.2250443

reusables.touch("new_file")

os.path.getmtime("new_file")
# 1484450443.804158
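For the curious, a touch like this only takes a couple of lines of standard library code. A minimal sketch, not necessarily the library’s exact implementation:

import os

def my_touch(path):
    # Create the file if it doesn't exist, without truncating it if it does
    with open(path, "a"):
        pass
    # Update both the access and modified times to now
    os.utime(path, None)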

Simple JSON and CSV save and restore

These are already super simple to implement in pure Python with the standard library; they are just here for the convenience of not having to remember the conventions.

List of lists to CSV file and back

my_list = [["Name", "Location"],
           ["Chris", "South Pole"],
           ["Harry", "Depth of Winter"],
           ["Bob", "Skull"]]

reusables.list_to_csv(my_list, "example.csv")

# example.csv
#
# "Name","Location"
# "Chris","South Pole"
# "Harry","Depth of Winter"
# "Bob","Skull"


reusables.csv_to_list("example.csv")

# [['Name', 'Location'], ['Chris', 'South Pole'], ['Harry', 'Depth of Winter'], ['Bob', 'Skull']]

Save JSON with default indent of 4

my_dict = {"key_1": "val_1",
           "key_for_dict": {"sub_dict_key": 8}}

reusables.save_json(my_dict, "example.json")

# example.json
# 
# {
#     "key_1": "val_1",
#     "key_for_dict": {
#         "sub_dict_key": 8
#     }
# }

reusables.load_json("example.json")

# {'key_1': 'val_1', 'key_for_dict': {'sub_dict_key': 8}}

Cut a string into equal lengths

Ok, I admit, this one has absolutely nothing to do with the file system, but it’s just too handy not to mention right now (and doesn’t really fit anywhere else). One of the features I was most surprised not to find in the standard library was a function that could cut strings into even sections.

I haven’t seen any PEPs about it either way, but I wouldn’t be surprised if one of the reasons is ‘what to do with the leftover characters?’. Instead of forcing you to stick with one answer, Reusables has four different ways it can behave, to fit your requirement.

By default, it will simply cut everything into even segments, and not worry if the last one has a matching length.

reusables.cut("abcdefghi")
# ['ab', 'cd', 'ef', 'gh', 'i']

The other options are to remove the remainder entirely, combine it into the previous grouping (still uneven, but now the last item is longer than the rest instead of shorter), or raise an IndexError exception.

reusables.cut("abcdefghi", 2, "remove")
# ['ab', 'cd', 'ef', 'gh']

reusables.cut("abcdefghi", 2, "combine")
# ['ab', 'cd', 'ef', 'ghi']

reusables.cut("abcdefghi", 2, "error")
# Traceback (most recent call last):
#     ...
# IndexError: String of length 9 not divisible by 2 to splice

Config to Dictionary

Everybody and their co-worker has written a ‘better’ config file handler of some sort; this isn’t trying to add to that pile, I swear. It is simply a very quick converter that uses the built-in parser to go directly to dictionary format, or to a Python object I call a Namespace (more on that in a future post).

Just to be clear, this only reads configs; it does not write any changes. So given an example config.ini file:

[General]
example=A regular string

[Section 2]
my_bool=yes
anint=234
exampleList=234,123,234,543
floatly=4.4

It reads it as is into a dictionary. Notice there is no automatic parsing or anything fancy going on at all.

reusables.config_dict("config.ini")
# {'General': {'example': 'A regular string'},
#  'Section 2': {'anint': '234',
#                'examplelist': '234,123,234,543',
#                'floatly': '4.4',
#                'my_bool': 'yes'}}

You can also take it into a ConfigNamespace.

config = reusables.config_namespace("config.ini")
# <ConfigNamespace: {'General': {'example': 'A regular string'}, 'Section 2': ...

Namespaces are special dictionaries that allow for dot notation, similar to Bunch but recursively convert dictionaries into Namespaces.

config.General
# <ConfigNamespace: {'example': 'A regular string'}>

ConfigNamespace has handy built-in type-specific retrieval. Notice that dot notation will not work if items have spaces in them, but the regular dictionary key notation works as well.

config['Section 2'].bool("my_bool")
# True

config['Section 2'].bool("bool_that_doesn't_exist", default=False)
# False
# If no default specified, will raise AttributeError

config['Section 2'].float('floatly')
# 4.4

It supports booleans, floats, ints and, unlike the default config parser, lists, which even accept a modifier function.

config['Section 2'].list('examplelist', mod=int)
# [234, 123, 234, 543]

Finale

That’s all for this first overview. I hope you found something useful that will make your life easier!

Related links:

Your first date with Python logging

Logging vs Print

The difference between logging and using print isn’t simply personal preference, but rather choosing the proper tool for the use case. Check out the Python docs on when to use logging if you don’t believe me.

If you are simply using Python to write scripts with clear paths and failure on error, it’s perfectly reasonable to not have to use logging. However as soon as you have a long running program, a service, or a complicated script, it’s time to take the plunge.

The Logging Object

In every file you want to log with, you’ll be adding at minimum two lines:

import logging

logger = logging.getLogger("my_program")

This will grab the logger object associated with the name “my_program”. In very basic terms, think of the interpreter as having an internal dictionary of loggers: it knows which one to give you back when you send it a key (the name). This way, you don’t have to pass the logger around yourself, so you should never share or pass logger objects; simply make another call to getLogger.
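You can verify this yourself: calling getLogger twice with the same name hands back the very same object.

import logging

a = logging.getLogger("my_program")
b = logging.getLogger("my_program")

print(a is b)
# True - both names point at one object in logging's internal registry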

A very common mistake I see is projects having a single log.py file with the logger object in it and importing it everywhere. Don’t do it.

Let’s repeat that a little louder.

Never share or pass the logger object

DO NOT DO

log.py
import logging

logger = logging.getLogger("my_program")
main.py
from .log import logger
submodule.py
from .log import logger

END DO NOT DO

It’s bad practice, prevents easy future modifications, and adds needless complication. You will always get the same logger back with a call to getLogger with the same name.

Therefore, if you are trying to get a logger for an imported project, don’t go searching for project.logger; simply make a call to getLogger("Project"). “But how do I know what they named their logger?” Most projects use the same convention, which I suggest you also follow:

import logging

logging.getLogger(__name__)

The variable __name__ is a predefined global that houses the running module’s name as imported. So if you run the script directly, it will be "__main__", whereas if you import the module (my_program.py) it will be "my_program". If you import a submodule of my_program.py called submodule.py (creative, aren’t I?) it will be set to "my_program.submodule", just as you would see it on an import line.
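A quick way to see this for yourself, assuming a file named my_program.py:

# my_program.py
import logging

logger = logging.getLogger(__name__)
print(logger.name)
# "__main__" when run directly:  python my_program.py
# "my_program" when imported:    import my_program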

“Is it ok to have dots in the name?” It’s better than ok, it’s pure awesome. Python logging has a built-in dotted hierarchy system, allowing you to specify levels and handlers on non-root loggers. In other words, each module can be expanded to have its own level and handlers (covered later).

You can set the name manually as well, getLogger("my_program.submodule"), which can be very handy if you want to split logging not by module but by shared functionality, or to have a different place to log to. This is one of the main reasons against sharing the logger object: if later on you want everything in a type of submodule to log to a file, and you were naughty and shared a logger object, you now have to update every file to use a new object. Whereas if you followed good practice and used getLogger in each file, you would only have to add two lines of code to your main file to add a file handler.
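As a quick illustration of that hierarchy, a child logger needs no handlers of its own; its records propagate up to its parent’s handlers:

import logging
import sys

parent = logging.getLogger("my_program")
parent.addHandler(logging.StreamHandler(stream=sys.stdout))
parent.setLevel(logging.DEBUG)

# No handler is ever attached to the child, yet its messages still appear
child = logging.getLogger("my_program.submodule")
child.warning("Reported through the parent's StreamHandler")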

One of the ‘gotchas’ with using the same logger across multiple modules is that you have to have the logger’s handlers and settings set up before you use it.

Sounds like a ‘duh’, but consider a case like:

my_project.py
import logging 
from subdir import submodule

logger = logging.getLogger("my_project") 
# Required handler code redacted for example

logger.warning("Speedboat crashed")


submodule.sub_func(5)
submodule.py
import logging
 
logger = logging.getLogger("my_project.submodule")

logger.debug("Submodule loaded")

def sub_func(var):
    logger.info("Got variable {}".format(var))
    return var + 1
Output
No handlers could be found for logger "my_project.submodule"
2016-12-09 16:14:13,079 - my_project - WARNING - Speedboat crashed
2016-12-09 16:21:13,387 - my_project.submodule - INFO - Got variable 5

You will get the error message “No handlers could be found for logger "my_project.submodule"” because the parent logger was not yet set up when the submodule was imported, but it was by the time its function was run.

Levels

In the logging module, there are five different preset logging levels:

  • 10 – DEBUG
  • 20 – INFO
  • 30 – WARNING
  • 40 – ERROR
  • 50 – CRITICAL

When you set a logging level on a logger or handler, you are simply setting a number, and anything set to that number or above will be logged.

logger.setLevel(logging.DEBUG)

Is equivalent to:

logger.setLevel(10)

Everyone generally uses the first form because it is easier to read, and you usually already have the logging module imported; however, it is not required.

By default, the logger itself has an effective logging level of WARNING (30), so it will not display debug or info messages. (The built-in handlers have a level of NOTSET (0), so they won’t stop anything from being written, but it is good practice to set them.)

import logging

logger = logging.getLogger(__name__)
# Required handler code redacted for example

print(f"The logger is set at level: {logger.getEffectiveLevel()}")

logger.debug("Debug message")
logger.info("Info message")
logger.warning("Something's on fire")
output
The logger is set at level: 30
2016-12-09 15:23:53,220 - __main__ - WARNING - Something's on fire

It is also possible to create your own level. This is rarely used; I have personally only seen it done on the extreme ends (i.e. 5 for low-level debugging and 45 for near-critical), so here is an example that doesn’t deal with either of those:

ALMOST_SCARY = 25
logging.addLevelName(ALMOST_SCARY, "ALMOST_SCARY")

logger.setLevel(ALMOST_SCARY)

logger.info("Info message")
logger.log(ALMOST_SCARY, "Spider in the tub")
logger.warning("House set on fire")
output
2016-12-09 15:35:29,413 - __main__ - ALMOST_SCARY - Spider in the tub
2016-12-09 15:35:29,413 - __main__ - WARNING - House set on fire

I find it good practice to set the logger itself as low as possible, then set the levels on the handlers instead. That way, if you eventually want a debug file when something fails, you don’t have to change multiple lines, just one.
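For example, with the logger itself wide open and the levels set on the handlers, adding a debug file later is only a couple of new lines (a minimal sketch):

import logging
import sys

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)  # the logger itself is as low as possible

console = logging.StreamHandler(stream=sys.stdout)
console.setLevel(logging.INFO)  # the screen stays free of debug chatter
logger.addHandler(console)

# Later, when something starts failing, this handler is the only addition:
debug_file = logging.FileHandler("debug.log")
debug_file.setLevel(logging.DEBUG)  # the file catches everything
logger.addHandler(debug_file)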

Handlers

Logging handlers are the brains behind the logger that tell it where to send the log messages. The logger object simply has a list of handlers, logger.handlers (not to be confused with the available handlers under logging.handlers). When the logger receives a message to log, it iterates through them, and any handler whose logging level is equal to or below the incoming message’s level is sent the message to ‘handle’ (literal wording from the source code; programmers are not an inventive lot).

The two most commonly used handlers are StreamHandler and FileHandler (the latter a subclass of the former). However, many more come in the box, from SysLog to HTTP handlers, and it is possible to make or modify one for your own needs.

Here is a basic example of creating a StreamHandler and assigning it to our logger:

import logging
import sys 

logger = logging.getLogger(__name__)

my_stream_handler = logging.StreamHandler(stream=sys.stdout)
my_stream_handler.setLevel(logging.INFO)

logger.setLevel(logging.DEBUG)
logger.addHandler(my_stream_handler)

logger.warning("Crazy ex is headed over, has a blow torch")
output
Crazy ex is headed over, has a blow torch

“Wait a minute, all the other examples had time, module and logging level, where’d they go?” That’s all set up with Formatters, which we’ll get to next. For now, we’re sticking to handlers.

StreamHandler is pretty basic, good to use for sys.stdout and sys.stderr. FileHandler is built on top of it and is very commonly used, often incorrectly. Any time you are deploying to a production server that you aren’t running logrotate or similar on, you will want to use one of the rotating file handlers: either RotatingFileHandler, based on file size, or TimedRotatingFileHandler.

import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger(__name__)

# If either maxBytes or backupCount is set to 0, it will not rotate 
my_file_handler = RotatingFileHandler("rotating_example.log",
                                      maxBytes=1024, backupCount=5) 
my_file_handler.setLevel(logging.DEBUG)

logger.setLevel(logging.DEBUG)
logger.addHandler(my_file_handler)


for a in list(range(1, 100))[::-1]:  # Reversed to start at 99 and count down
    logger.info("{0} bottles of beer on a wall, take one down, "
                "pass it around, {1} bottles of beer on the wall".format(a, a - 1))

This will create six different files (the current log file and past 5 backups), named:

rotating_example.log
rotating_example.log.1
rotating_example.log.2
rotating_example.log.3
rotating_example.log.4
rotating_example.log.5
rotating_example.log
9 bottles of beer on a wall, take one down, pass it around, 8 bottles of beer on the wall
8 bottles of beer on a wall, take one down, pass it around, 7 bottles of beer on the wall
7 bottles of beer on a wall, take one down, pass it around, 6 bottles of beer on the wall
6 bottles of beer on a wall, take one down, pass it around, 5 bottles of beer on the wall
5 bottles of beer on a wall, take one down, pass it around, 4 bottles of beer on the wall
4 bottles of beer on a wall, take one down, pass it around, 3 bottles of beer on the wall
3 bottles of beer on a wall, take one down, pass it around, 2 bottles of beer on the wall
2 bottles of beer on a wall, take one down, pass it around, 1 bottles of beer on the wall
1 bottles of beer on a wall, take one down, pass it around, 0 bottles of beer on the wall
rotating_example.log.5
54 bottles of beer on a wall, take one down, pass it around, 53 bottles of beer on the wall
53 bottles of beer on a wall, take one down, pass it around, 52 bottles of beer on the wall
52 bottles of beer on a wall, take one down, pass it around, 51 bottles of beer on the wall
51 bottles of beer on a wall, take one down, pass it around, 50 bottles of beer on the wall
50 bottles of beer on a wall, take one down, pass it around, 49 bottles of beer on the wall
49 bottles of beer on a wall, take one down, pass it around, 48 bottles of beer on the wall
48 bottles of beer on a wall, take one down, pass it around, 47 bottles of beer on the wall
47 bottles of beer on a wall, take one down, pass it around, 46 bottles of beer on the wall
46 bottles of beer on a wall, take one down, pass it around, 45 bottles of beer on the wall

Anything older than that is lost, which is a good thing when the alternative is dealing with gigabyte-sized log files after months or years of use.

Formatters

The last critical element of logging is the Formatters. They are often overlooked, as people generally use ones copy-pasted from the Python docs, but when customized they can be personalized per project for better debugging. The default, as seen just above, is simply:

"%(message)s"

Each handler has its own formatter, which is best set before the handler is added to the logger. Probably the most commonly used one (other than the default) is:

"%(asctime)s - %(name)s - %(levelname)s - %(message)s"

They are all kind of a mouthful to set up:

my_stream_handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)s - "
                      "%(levelname)s - %(message)s"))
2016-12-09 18:50:56,604 - __main__ - INFO - Info message
2016-12-09 18:50:56,604 - __main__ - WARNING - House set on fire

Here’s one I like better, as it sets everything before the message to be the same length, looking a lot cleaner:

"%(asctime)s - %(name)-12s  %(levelname)-8s %(message)s"
2016-12-09 18:00:02,417 - __main__      INFO     Info message
2016-12-09 18:00:02,417 - __main__      WARNING  House set on fire

Doing a lot of threading? Wonder if you’re logging in a thread, or if it’s a thread in a thread?

"%(asctime)s : %(relativeCreated)1d %(threadName)s : %(name)s %(levelname)s %(message)s"
2016-12-09 18:55:12 : 0 MainThread : __main__ INFO Info message
2016-12-09 18:55:12 : 1 Thread-2 : __main__ WARNING House set on fire

There are plenty of options to choose from; they are all listed in the docs.

Putting it all together

Here is a ‘simple’ logging setup from everything discussed so far.

my_project.py
import logging
import sys

logger = logging.getLogger(__name__)
my_stream = logging.StreamHandler(stream=sys.stdout)
my_stream.setLevel(logging.DEBUG)
my_stream.setFormatter(
    logging.Formatter("%(asctime)s - %(name)-12s  "
                      "%(levelname)-8s %(message)s"))
logger.addHandler(my_stream)
logger.setLevel(logging.DEBUG)

...

logger.info("We the people")
output
2016-12-09 18:00:02,417 - __main__      INFO    We the people

“That’s a lot to remember and write for every program…” I totally agree. I use this type of setup so often, and don’t want to re-create a config for every program, that I have an importable version of it on PyPI.

import sys

import reusables

logger = reusables.get_logger(__name__, level=10, stream=sys.stdout)

Produces the same output as above. It also has built in formats for different scenarios and logging helper functions. Check out Reusables.

Realistic Example

One of the true powers behind having multiple handlers is that you can set them at different levels for different goals. In this more realistic example, the main program is using a stream handler outputting to the screen. The submodule then has a child of that logger, which also has a rotating file handler set for errors and above only, meaning errors will be logged to both the file and the screen.

my_project.py
import logging
import sys

from subdir import submodule

logger = logging.getLogger("my_project")
my_stream = logging.StreamHandler(stream=sys.stdout)
my_stream.setFormatter(
    logging.Formatter("%(levelname)-8s %(message)s"))
logger.addHandler(my_stream)
logger.setLevel(logging.DEBUG)

def main():
    submodule.sub_func("test")

if __name__ == '__main__':
    logger.info("Starting my project")
    main()

subdir\submodule.py
import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("my_project.submodule")
sub_file_handler = RotatingFileHandler("out.log", 
    maxBytes=5000000, backupCount=5)
sub_file_handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)-12s" 
                      "%(levelname)-8s %(message)s"))
sub_file_handler.setLevel(logging.WARNING)
logger.addHandler(sub_file_handler)

def sub_func(var):
    logger.info("Got variable {}".format(var))
    import time
    time.sleep(1)
    try:
        return var / 5.0
    except TypeError:
        logger.exception("Could not divide")
stdout
INFO     Starting my project
INFO     Got variable test
ERROR    Could not divide
Traceback (most recent call last):
 File "subdir\submodule.py", line 16, in sub_func
 return var / 5.0
TypeError: unsupported operand type(s) for /: 'str' and 'float'
out.log
2016-12-11 14:50:42,977 - my_project.submodule  ERROR Could not divide
Traceback (most recent call last):
  File "subdir\submodule.py", line 16, in sub_func
    return var / 5.0
TypeError: unsupported operand type(s) for /: 'str' and 'float'

Repeating the setup

“Now that I know how to do it right, how can I just cut to the chase so I don’t have to write all that every time?”

  1. Use Reusables get_logger.
  2. Use a logging config (.ini) file
  3. Use a dictionary (which can be loaded from markup like YAML or JSON)

It’s mostly personal (if not team or company) preference as to which way you set up logging each time. I personally don’t like having to keep track of yet another config file, so I do it in code. Others need to be able to easily modify it and have a standard format across projects that can be enforced in config files. I won’t go deep into the second two here beyond the quick taste below; more details can be found in the standard library.
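For that taste of option three, here is a minimal sketch using the standard library’s logging.config.dictConfig, trimmed to the essentials (the "my_project" name and format string are just placeholders):

import logging.config

LOGGING_CONFIG = {
    "version": 1,
    "formatters": {
        "standard": {
            "format": "%(asctime)s - %(name)-12s  %(levelname)-8s %(message)s"
        }
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "standard",
            "level": "DEBUG",
            "stream": "ext://sys.stdout"
        }
    },
    "loggers": {
        "my_project": {"handlers": ["console"], "level": "DEBUG"}
    }
}

logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger("my_project").info("Configured from a dictionary")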

Anything more to learn?

We haven’t even talked about fun stuff like creating custom Handlers, multiprocessing concerns, or Filters (a more detailed way to decide whether something should be logged, beyond just levels). Wait a few weeks and maybe there’ll be a more in-depth post.