Coding

Indisputably immutable

For many of us, as we develop as coders, we want to keep growing our knowledge. We pore over the standard library of this wonderful language, looking for hidden gems. There are many such gems, like all the fun things you can do with sets and itertools. But one of the lesser-used ones (which is a real shame) is found in the collections module: namedtuple.

So… what is a named tuple?

from collections import namedtuple

Address = namedtuple("Address", ["number", "street", "city", "state", "zip_code"])

the_prez = Address("1600", "Pennsylvania Avenue", "Washington", "DC", "20500")

print("{the_prez.number} {the_prez.street}".format(**locals()))

 

Pretty boring stuff all around, right? Who cares that you can get dotted notation from this? Why not just write your own class?  You totally can.  There’s nothing to it.

class Address(object):
    def __init__(self, number, street, city, state, zip_code):
        self.number = number
        self.street = street
        self.city = city
        self.state = state
        self.zip_code = zip_code

Yep. Gotta love the classic Python class. Super explicit, which is in keeping with one of the core tenets of Python. But, depending on your use case, it may have a weakness. Let’s say the type of class you have written is all about data storage and retrieval. That means you’d really like the data to be reliable and, in a word: immutable. If you just write the aforementioned class and someone wants to change the street number where the president lives, it’s pretty simple stuff:

class Address(object):
    def __init__(self, number, street, city, state, zip_code):
        self.number = number
        self.street = street
        self.city = city
        self.state = state
        self.zip_code = zip_code

the_prez = Address("1600", "Pennsylvania Avenue", "Washington", "DC", "20500")
the_prez.number = "1601"

Boom: now the postman will deliver your postcard from the Grand Canyon to the wrong house.

“But,” you say, “I could just override __setattr__ and be done with my class!” You are correct. But maybe, just maybe, you’re a lazy coder (as I believe all coders should be from time to time), and you want immutability without having to rewrite a core behavior of your class. Why, what you’d be talking about would have the immutability of a tuple with the flexibility of a class! Sounds crazy? Sounds impossible? Well, guess what? You can have it.

Named tuples, just as the name implies, are tuples (an immutable data type) that use names instead of numerical indexing. Simple. But why the devil would you use them if you already control the code? Is it simply the immutability?

class Address(namedtuple("Address", ["number", "street", "city", "state", "zip_code"])):

    def __new__(cls, number=None, street=None, city=None, state=None, zip_code=None):
        return super(Address, cls).__new__(
            cls,
            number=number,
            street=street,
            city=city,
            state=state,
            zip_code=zip_code)

So, what happens if you try to redirect that grand canyon postcard by changing the house number now?

Traceback (most recent call last):
  File "namedtuplecraziness.py", line 74, in <module>
    the_prez.number = "1601"
AttributeError: can't set attribute

Nope.  Sorry.  Protected by immutability!
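
If you legitimately need a “changed” address, the tuple-friendly escape hatch is _replace, which builds a brand-new instance rather than mutating the old one. A quick sketch using the plain namedtuple version from earlier:

new_prez = the_prez._replace(number="1601")
print(new_prez.number)  # 1601
print(the_prez.number)  # 1600 -- the original is untouched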

Of course, there are always oddities. If you have a mutable value inside the namedtuple, that value retains its mutability, while the other (immutable) values stay locked down. Here’s an example where we make the ‘number’ field into a list, which means we can now change it through direct index-based assignment or even through append.

from collections import namedtuple

class Address(namedtuple("Address", ["number", "street", "city", "state", "zip_code"])):
    def __new__(cls, number=None, street=None, city=None, state=None, zip_code=None):
        return super(Address, cls).__new__(
            cls,
            number=[number],
            street=street,
            city=city,
            state=state,
            zip_code=zip_code)

a = Address()
print(a)
Address(number=[None], street=None, city=None, state=None, zip_code=None)
a.number[0] = 1600

print(a)
Address(number=[1600], street=None, city=None, state=None, zip_code=None)

a.number.append(1700)

print(a)
Address(number=[1600, 1700], street=None, city=None, state=None, zip_code=None)

Of course, just because it’s a list, and therefore mutable, doesn’t mean you can re-assign it.  That’s when the immutability of the namedtuple asserts itself.

a.number = 1600
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
can't set attribute

 

One of the first things you should know about Python is: EVERYTHING IS AN OBJECT. That means you can inherit from just about anything, including a namedtuple. Boom. I just inherited from my named tuple, and now I can add methods to the class to define how data is retrieved! Super nifty!

Wow. Who cares, you say. Big deal? What are you, some type of weirdo? Yes. Yes, I am.

So, let’s say you have a database of addresses, and you want to get a printable mailing address for an entry. Well, you could easily load the queried records from the database into our Address model (we’ll just use this name as a reference). Now we have a whole list of these entries. Perhaps we’re having a party, and we want to mail invites to everyone. This is where the whole namedtuple inheritance gives you some pretty cool flexibility.

class Address(namedtuple("Address", ["number", "street", "city", "state", "zip_code"])):

    def __new__(cls, number=None, street=None, city=None, state=None, zip_code=None):
        return super(Address, cls).__new__(
            cls,
            number=number,
            street=street,
            city=city,
            state=state,
            zip_code=zip_code)
   
    def mailing_address(self):
      return ("{self.number} {self.street}\n"
              "{self.city}, {self.state} {self.zip_code}"
             ).format(**locals())

Nice and simple. Or perhaps you’re going to export your entire rolodex to csv (for some reason)?

class Address(namedtuple("Address", ["number", "street", "city", "state", "zip_code"])):

    def __new__(cls, number=None, street=None, city=None, state=None, zip_code=None):
        return super(Address, cls).__new__(
            cls,
            number=number,
            street=street,
            city=city,
            state=state,
            zip_code=zip_code)
   
    def mailing_address(self):
      return ("{self.number} {self.street}\n"
              "{self.city}, {self.state} {self.zip_code}"
             ).format(**locals())

    def to_csv(self):
        return ",".join(self._asdict().values()

 

Now you could just iterate over that list of Address entries and write out the to_csv() call!  It would likely look something like:

addresses = []  # all your Address entries would be here
with open("rolodex.csv", "w") as f:
    for a in addresses:
        f.write(a.to_csv() + "\n")

I’m not advocating for a replacement of traditional classes.

My opinion: Python classes should fall into one of three basic types:

  1. A class could contain data (such as our namedtuple example here).
  2. A class could provide encapsulation of action (like a database library).
  3. A class could provide actuation/transformation of data (kind of a mix of #1 and #2).

If you’re dealing with data which is merely going to be acted upon and never changed, then take a swing with these fancy namedtuples. If you’re doing lots of actuation, write a normal class. And at no time should a good coder assume they know the best way to do something. You’d be surprised how often, when I make that assumption, some young, talented coder comes along and corrects my ego.

Exception! Exception! Read all about it!

Exceptions in Python aren’t new, and they’re pretty easy to wrap your head around. Which means you should have no trouble following what this example will output.

import warnings

def bad_func():
    try:
        assert False and True or False
    except AssertionError as outer_err:
        try:
            assert None is False
        except AssertionError as inner_err:
            warnings.warn("None is not False", UserWarning)
            raise inner_err from None
        else:
            return None
    else:
        return True
    finally:
        return False


print(bad_func())

Simple, right? It triggers both except clauses and the warning, so your screen will obviously print False along with the warning.

False
C:/python_file.py:15: UserWarning: None is not False
  warnings.warn("None is not False", UserWarning)

So if you followed that without issue and understand why it’s happening, go ahead and check out another post, because today we are diving into the depths of Python 3 exception handling.

The basic try except block

If you have coded any semi-resilient Python code, you have used at least the basic try except.

try:
    something()
except:
    pass

If something breaks in the try block, the code in the except block is executed instead of allowing the exception to be raised. However, the example code above does something unforgivable: it captures everything. “But that’s what I want it to do!” No, you do not.

Catching everything includes both KeyboardInterrupt and SystemExit. Those exceptions are outside the normal Exception scope; they inherit straight from BaseException. They are special because they are caused by external sources; everything else is caused by something in your code.
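
Don’t take my word for it; a quick issubclass check shows both of those external exceptions live outside the Exception branch of the hierarchy:

print(issubclass(KeyboardInterrupt, Exception))      # False
print(issubclass(SystemExit, Exception))             # False
print(issubclass(KeyboardInterrupt, BaseException))  # True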

That means if you have the following code, it would only exit if it had a SIGKILL sent to it, instead of the regular, and more polite, SIGTERM or a user pressing Ctrl+C.

while True:
    try:
        something()
    except:
        pass

Now, it’s fine, and even recommended, to catch KeyboardInterrupt and SystemExit as needed. A common use case is to do a ‘soft’ exit, where your program tries to finish up any working tasks before exiting. However you use it, it’s better to be explicit. There should almost always be a separate path for internal vs external errors.

try:
    something()
except Exception as err:
    print(err)
    raise  # A bare raise re-raises the exception being handled
except (SystemExit, KeyboardInterrupt) as err:
    print('Shutting down safely, please wait until process exits')
    soft_exit()

This way it is clear to the user when either an internal error happened or if it was a user generated event. This is usually only used on the very external event loop or function of the program. For most of your code, the bare minimum you should use in a try except block is:

try:
    something()
except Exception as err:
    print(err)

That way you are capturing all internal exceptions, but none of the system-level ones. Also very important: adding a message (in this simple example, a print line) will help show what the error was. However, str(err) will only output the error message, not the error type or traceback. We will cover how to get the most out of the exception next.
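
In the meantime, a small sketch: if all you want is the type next to the message, type(err).__name__ has you covered.

try:
    {}["missing"]
except Exception as err:
    print("{}: {}".format(type(err).__name__, err))

# Output: KeyError: 'missing'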

Error messages, custom exceptions and scope

So why does Exception catch everything but the external errors?

Scope

The Exception class is inherited by all the other internal errors, whereas something like FileNotFoundError is at the lowest point on the totem pole of the built-in exception hierarchy. That means it’s as specific as possible, and if you catch that error, anything else will still be raised.

Exception
+-- OSError
    +-- FileNotFoundError
    +-- PermissionError

So the further right you go, the less encapsulation and the more specific an error you get.

try:
    raise FileNotFoundError("These aren't the driods you're looking for")
except FileNotFoundError:
    print("It's locked, let's try the next one")
except OSError:
    print("We've got em")

# Output: It's locked, let's try the next one

It works as expected: the exception is caught, its message is not printed; rather, the code in the except FileNotFoundError block is executed. Now keep in mind that an except clause for one exception will not catch other errors at the same level.

try:
    raise FileNotFoundError("These aren't the driods you're looking for")
except PermissionError:
    print("It's locked, let's try the next one")
except OSError:
    print("We've got em")

# Output: We've got em

Only that specific error or its parent (or parent’s parent, etc.) will capture it. Also keep in mind that with try except blocks, order of operations matters: Python won’t sort through all the possible exceptions, it will execute only the first matching block.

try:
    raise FileNotFoundError("These aren't the driods you're looking for")
except OSError:
    print("We've got em")
except FileNotFoundError as err:
    print(err)

# Output: We've got em

That means even though FileNotFoundError was a more precise match, it won’t be executed if an except block above it also matches.

In practice, it’s best to capture the lowest-level (highest-precision) exceptions you know are possible. That is because if you have a known common failure, there is generally a different path you want to take instead of complete failure. We will use that in this example, where we want to create a file but don’t want to overwrite existing ones.

index = 0 
while True:
    index += 1
    try:
        create_a_file(f'file_{index}')
        break
    except FileExistsError:
        continue
    except Exception as err:
        print(f"Unexpected error occurred while creating a file: {err}")

Here you can see that if the file already exists, and create_a_file raises a FileExistsError, the loop will increase the index and try again for file_2, and that will continue until a file doesn’t exist and it breaks out of the loop.
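
(create_a_file itself isn’t shown here; a minimal sketch of one that actually raises FileExistsError could lean on open’s exclusive-creation 'x' mode:)

def create_a_file(name):
    # 'x' mode refuses to open a file that already exists
    with open(name, "x") as f:
        f.write("contents go here\n")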

Custom Exceptions

What if you have code that needs to raise an exception under certain conditions? Well, it’s good to use the built-ins only when they’re exactly applicable; otherwise it is better to create your own. And you should always start by deriving that error from Exception.

class CustomException(Exception):
    """Something went wrong in my program"""

raise CustomException("It's dead Jim")

“All user-defined exceptions should also be derived from Exception” - Python Standard Library

Don’t be fooled into using the more appropriate-sounding BaseException; that is the parent of all exceptions, including SystemExit.
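
Here’s a tiny demonstration of why: catching BaseException quietly swallows the SystemExit that sys.exit raises, and the program just keeps going.

import sys

try:
    sys.exit(0)
except BaseException:
    print("The SystemExit was swallowed, the program keeps running")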

You can of course create your own hierarchical system of exceptions, which will behave the same way as the built-in.

class CustomException(Exception):
    """Something went wrong in my program"""

class FileRelatedError(CustomException):
    """Couldn't access or update a file"""

class IORelatedError(CustomException):
    """Couldn't read from it"""


try:
    raise FileRelatedError("That file should NOT be there")
except CustomException as err:
    print(err)

# Output: That file should NOT be there

Traceback and Messages

So far we have been printing messages in the except blocks, which is fine for small scripts, and I highly recommend including one instead of just pass (unless that is a planned safe route). However, if there is an exception, you probably want to know a bit more. That’s where the traceback comes in; it lets you know some of the most recent calls leading up to the exception.

import traceback
import reusables

try:
    reusables.extract('not_a_real_file')
except Exception as err:
    traceback.print_exc()

Using the built-in traceback module, we can use traceback.print_exc() to directly print the traceback to the screen. (Use my_var = traceback.format_exc() if you plan to capture it to a variable instead of printing it.)

Traceback (most recent call last):
  File "C:\python_file.py", line 11, in <module>
    reusables.extract('not_a_real_file')
  File "C:\reusables.py", line 594, in extract
    raise OSError("File does not exist or has zero size")
OSError: File does not exist or has zero size

This becomes a lifesaver when trying to figure out what went wrong where. Here we can see which line of our code executed to cause the exception (the reusables.extract call) and where in that library the exception was raised. That way, if we need to dig deeper, or it’s a more complex issue, everything is nicely laid out for us.

“So what, it prints to my screen when it errors out and exits my program.” True, but what if you don’t want your entire program to fail over a trivial section of code, and you want to be able to go back and figure out what erred? An even better example of this is with logging.

The standard library logging module has a beautiful feature built in: .exception, which will log the full traceback of the last exception.

import logging
log = logging.getLogger(__name__)

try:
    assert False
except Exception as err:
    log.exception("Could not assert to True")

This will log the usual message you provide at the ERROR level, but will then include the full traceback of the last exception. Extremely valuable for services or background programs.

2017-01-20 22:28:34,784 - __main__      ERROR    Could not assert to True
Traceback (most recent call last):
  File "C:/python_file.py", line 12, in <module>
    assert False
AssertionError

Traceback and logging both have a lot of wonderful abilities that I won’t go into detail with here, but make sure to take the time to look into them.

Finally

try except blocks actually have two additional optional clauses: else and finally. Finally is very straightforward: no matter what happens in the rest of the try except, that code WILL execute. It supersedes all other returns, and it will suppress other blocks’ raises if it has its own raise or return. (The only case when it won’t run is if the program is terminated during its execution.)

That’s why in the very top example code block, even though there is a raise being called, the False is still returned.

    finally:
        return False

The finally section is invaluable when you are dealing with cleanup that is critical to get right, for example, safely closing database connections.

try:
    something()
except CustomExpectedError:
    relaunch_program()
except Exception:
    log.exception("Something broke")
except (SystemExit, KeyboardInterrupt):
    wait_for_db_transactions()
finally:
    stop_db_transactions()
    db_commit()
    db_close()

Now that is a rather beautiful try except: it catches the lowest case first, will log unexpected errors, has a soft exit if the user presses Ctrl+C, and will safely close the database connection no matter what, even if Ctrl+C is pressed again while wait_for_db_transactions() is being run. (Notice this isn’t a perfect example, as it will still wait to close the database connection after relaunch_program has ended, which means that function should do something like fork off so the finally block isn’t waiting forever to run.)

What Else?

The final possible section of the try except is else, the optional success path, meaning that code will only run if no exceptions are raised. (Remember that code in the finally section will still run as well, if present.)

try:
    a, b, *c = "123,456,789,10,11,12".split(",")
except ValueError:
    a, b, c = '1', '2', '3'
else:
    print(",".join(num for num in a))

# Output: 1,2,3

This helps mainly with flow control in the program. For example, if you have a function that tries multiple ways to interpret a string, you could add an else clause to each of the try except blocks that returns on success. (Dear goodness, don’t actually use this code; it’s just an example for flow.)

import traceback

def what_is_it(incoming):
    try:
        float(incoming)
    except ValueError:
        pass  # This is fine as it's a precise, expected exception
    else:
        return 'number'

    try:
        len(str(incoming).split(",")) > 1
    except Exception:
        traceback.print_exc() # Unsure of possible errors, find out
    else:
        return 'list'

From What?

As you might have noticed in the initial example, there is a line:

raise inner_err from None

This is new in Python 3 and, from my experience, hardly ever used, even though it is very useful. It sets the context of the exception. Ever seen a page-long exception log with plenty of During handling of the above exception, another exception occurred: messages?

from can set the context to an exact exception above it, so only the relevant ones are displayed.

try:
    [] + {}
except TypeError as type_err:
    try:
        assert False
    except AssertionError as assert_err:
        try:
            "" + []
        except TypeError as type_two_err:
            raise type_two_err from type_err

Without the from, the above code would show all three tracebacks; by setting the last exception’s context to the first one, we get only those two, with a new message between them.

Traceback (most recent call last):
  File "C:/python_file.py", line 10, in <module>
    [] + {}
TypeError: can only concatenate list (not "dict") to list

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:/python_file.py", line 18, in <module>
    raise type_two_err from type_err
  File "C:/python_file.py", line 16, in <module>
    "" + []
TypeError: Can't convert 'list' object to str implicitly

Pretty handy stuff. And setting it to from None will remove all other exceptions from the context, so it only displays that exception’s traceback.

try:
    [] + {}
except TypeError as type_err:
    try:
        assert False
    except AssertionError as assert_err:
        try:
            "" + []
        except TypeError as type_two_err:
            raise type_two_err from None
Traceback (most recent call last):
  File "C:/Users/Chris/PycharmProjects/Reusables/ww.py", line 18, in <module>
    raise type_two_err from None
  File "C:/Users/Chris/PycharmProjects/Reusables/ww.py", line 16, in <module>
    "" + []
TypeError: Can't convert 'list' object to str implicitly

Now you and the users of the program know where those exceptions are raised from. 

Warnings

Another often-neglected part of the Python standard library is warnings. I will also not delve too deeply into it, but at least I’m trying to raise awareness.

I find the most useful feature of warnings to be run-once reminders. For example, if you have some code that you think might change in the next release, you can put:

import warnings

def my_func():
    warnings.warn("my_func is changing it's name to master_func",
                  FutureWarning)
    master_func()

This will print the warning to stderr, but by default only the first time. So if someone’s code is using this function a thousand times, it doesn’t blast their terminal with warning messages.
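
That run-once behavior comes from the warnings filter, which you can adjust when needed; a small sketch, handy during testing:

import warnings

warnings.simplefilter("always")  # report every occurrence, not just the first
# warnings.simplefilter("error")  # or escalate warnings into exceptions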

To learn more about them, check out Python Module of the Week‘s post on it.

Best Practices

  1. Never ever have a blank except, always catch Exception or more precise exceptions for internal errors, and have a separate except block for external errors.
  2. Only have the code that may cause a precise error in the try block. It’s better to have multiple try except blocks with limited scope than a long section of code in the try block with a generic Exception (see the sketch after this list).
  3. Have a meaningful message to users (or yourself) in except blocks.
  4. Create your own exceptions instead of just raising Exception.
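
Here’s a quick sketch of what #2 means in practice (load, transform, save, and default_data are hypothetical stand-ins):

# Too broad: one generic handler hides which of the three calls failed
try:
    data = load(path)
    result = transform(data)
    save(result, out_path)
except Exception as err:
    print(err)

# Narrow: only the risky call is guarded, with a precise exception
try:
    data = load(path)
except FileNotFoundError:
    data = default_data()
result = transform(data)
save(result, out_path)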

 

Book Review: Violent Python

I had the great pleasure of learning directly from Mark Baggett, the Technical Editor of Violent Python, during his Python for Penetration Testers SANS class. The Violent Python book was the basis for many of the things done during the week-long training. I totally don’t mean to brag, but I aced that training, beat all the challenges, and won a snazzy coin.

SEC573 Python for Penetration Testers Coin

So I might be biased, but I really liked the layout of the class and working through the book exercises, and I have wanted to share my thoughts on it. Now I have finally gotten time to give it a proper go-over.

Overview

There is no wishy-washiness to this book’s intentions: it’s for the Vader in you. If there were a commercial for this book, it would play out like Jaguar’s “It’s good to be bad” campaign.

 In writing this book, we really set out to write an evil cookbook of examples for the darker side of Python

The book is intended for the professional penetration tester more than true black-hats; however, the code in the book itself is not biased towards good or evil. Like a gun, it’s who and how people use it that makes the difference. So why not make the biggest baddest gun around!

The Good

Violent Python is designed for anyone’s skill level. The first chapter is an introduction to Python, while the rest of the chapters are independent, designed as equal part tutorial and reference material. I wouldn’t use it as the very base to start learning how to code, but it may be enough for someone coming from other languages.

The book is laid out really well. Each chapter is self-contained, so they don’t need to be read in order, and each one starts with an excellent summary page. The summary page has a bullet list of what information is in the chapter, an index of its contents, an inspirational quote, and a real-life example of that type of code in action.

The book serves a dual purpose: an intro to common attack surfaces and a guide to thinking like a black hat. Each tutorial starts out with an explanation of why you would do something and what it would accomplish. From there it expands into multiple scenarios with provided examples.

The examples are functionally well written, with each line of code executing a very clear and specific action, using purposeful variable and function names. This helps make it very clear to anyone exactly what the scripts are doing.

They show external tools that are everyday friends of the penetration tester, such as tcpdump. This is important because you need to realize that Python is only a single tool in a hacker’s toolbox. Just because you can do it with Python doesn’t mean you should. Sure, you can write a TCP connect port scanner, but why not just use the excellent and already written nmap?

The Bad

What bugs me the most is their code style choices. PEP8 is technically only guidelines, but I can’t help but wish they didn’t use camelCaseForEverything in SCRIPTS_Named_Weird.py that have VERYLOUD globals. There are a few annoying inconsistencies as well, such as switching between optparse and directly using sys.argv.

A lot of the recipes use a blank try except:

try:
    someCode()
except:
    pass

Don’t do this in real life. It’s fine for quick script prototyping, but you should never have a blank except; at minimum have except Exception: so that it does not capture SystemExit or KeyboardInterrupt. (If you do plan to capture those, do so explicitly and log why you are doing so.)

It is becoming outdated, which is only partially the fault of the authors, as it is over four years old. When they say ‘Python’ they are referring to Python 2.6 in the book, even though 2.7 and 3.3 were available before its initial release in late 2012. There are several best practices they miss out on, such as context managers, and they lean on outdated libraries, like optparse. I would love to see them update and expand it with a ‘Violent Python 3’ (get it?) at some point.

Verdict

The book overall is very good and I highly recommend it. It’s a fun way to learn or better yourself with Python. Even after reading through it multiple times, I will continue to keep it on my shelf as a valuable reference guide.

Check it out at amazon.com

The Documentation Dilemma

Every coder will come across the same aggravating issue one day: finding a script or library you think you need, only it has zero instructions. Or the examples don’t work, or there is so much writing and so little substance that you can’t hope to dig through it all to discover how it’s meant to be used.

Don’t be like that project: document your code smartly, and document to the level of the project you are writing. If you have a little script that’s self-contained as a single file, it would be stupid to add a documentation directory and set up Sphinx. Whereas if you were creating a library for other coders, it would be stupid if you didn’t have extensive documentation.

Acceptable Documentation

I used to be like every other college grad: “your code should document itself,” “an idiot could decipher my code,” “if they can’t be bothered to read through the code, why would they run it on their system?” That is the sound of inexperience and ignorance. For your code to be useful to other people, it needs, at minimum, a description of what it does and its expected inputs and outputs, which are easily shown in examples.

The reality is, professionals simply do a quick search (either company-internal or web) and determine whether a program or script already exists to do what they want by reading a short blurb about it, then move on. No one stops to read the code; don’t do your project the disservice of hiding it from others who would find it useful. Not only does documentation help others, it’s a much better self-promoter, showing that you care about your own work and want others to use it.

Here is a quick breakdown of what I view as requirements of documentation:

Single script

  • Readme: can be the docstring at top of file
    • Description of what the script is designed to do
    • Examples, including a sample run with common input / output
  • Code should have meaningful messages on failure

Multiple scripts

  • Readme: standalone file
    • Description of project as a whole
    • Description of each script
    • Example of each script
    • Description and examples of any interconnection between scripts
  • Code needs meaningful messages on failure

Library

  • Readme: standalone file
    • Description of project
    • Installation guide
    • Basic examples
  • Changelog
  • License
  • Documentation directory
    • Stored in a human readable markup language (.md, .rst, .latex) that can be generated into PDF or HTML
    • Python projects should use Sphinx that pulls in docstrings
    • Index with contents of Readme, links to other files for each aspect of library
  • Code should use logging

Library with API

  • Everything in Library
  • API documents for every interface
    • preferably in a common format such as RAML

Program

  • Readme: standalone file
    • Description of project
    • Installation guide
    • Tutorials
      • Including detailed examples / use cases
    • FAQ / troubleshooting
  • License
  • Code should use logging
  • Changelog

Obviously I can’t go into every possible scenario, but I hope this clears up the common expectations of a project. And if you intend for others to help with the project, it’s a good idea to include developer docs for contributors that go over code conventions and any particular styles you want to adhere to. If other people are helping, make sure to have a section dedicated to thanking them, such as an Authors file.

Writing the Documentation

Here’s a quick baseline of a README in Markdown you can feel free to copy and reuse.

Project Name
============

Description
-----------

This is an example project that is designed
to teach documentation. The project will search
IMDb and find a movie based off the input given.

Supported Systems:

* Windows XP or newer
* Linux 
 
Tested on:

* Python 2.6+
* Python 3.3+
* Pypy


Examples
--------

Find the movie

```
$ python find_imdb_movie.py 300 

Title: 300 (2006)
Summary: King Leonidas of Sparta and a
         force of 300 men fight the Persians at 
         Thermopylae in 480 B.C.
Rating: 7.7 / 10
```

Get more details about it

```
$ python find_imdb_movie.py --director 300 

Title: 300 (2006)
Director: Zack Snyder
```

License
-------

This code is MIT Licensed, see LICENSE 

If you have a script, and it already has good error messages, you’re good to go! Feel free to add more example cases, an FAQ, or the full usage output (aka the details given when you do my_project -h, and yes, you need that too).

For larger projects, this is not enough, as it would be very difficult to fit enough examples onto a single readable page, and very easy for them to go stale. To avoid that issue in code libraries, I recommend including the examples of usage in the Python docstrings themselves, and having them auto-included in the documentation with Sphinx. That way, when you update a function, you’re much more likely to remember to update the example beside it.

def now(utc=False, tz=None):
    """
    Get a current DateTime object. By default is local.

    .. code:: python

        reusables.now()
        # DateTime(2016, 12, 8, 22, 5, 2, 517000)

        reusables.now().format("It's {24-hour}:{min}")
        # "It's 22:05"

    :param utc: bool, default False, UTC time not local
    :param tz: TimeZone as specified by the datetime module
    :return: reusables.DateTime
    """

Here’s what that will look like when rendered with Sphinx (in this case hosted with Read the Docs):

Reusables.now() documentation example

This is very convenient if you have a library and are exposing all your functions, but it isn’t an option for full-fledged programs or APIs. I have seen a few fancy tricks, such as pulling the examples out of the Readmes and using them as functional tests, or parsing the RAML API files and testing them. It’s a good way to ensure that all the examples are up to date, but then the documentation is dual-purpose, and has to be both machine and human readable, possibly taking too much away from human readability to justify the good the tests do in keeping examples current.

So what’s the answer? Well, there isn’t just one. If it’s just you and you’re not going to be updating the project that often, write the examples when it is near completion and try your best to update them if the project updates. If it’s an ever evolving project with multiple contributors, it would probably be best to have some sort of automation to ensure they are not going stale.

The best thing to do is have a standard across your team / company. Even a stupid standard implemented well can result in better documentation than no standards. That way everyone knows exactly where to look for docs, and what they are expected to contribute.

 

Your first date with Python logging

Logging vs Print

The difference between logging and using print isn’t simply personal preference; it’s choosing the proper tool for the use case. Check out the Python docs on when to use logging if you don’t believe me.

If you are simply using Python to write scripts with clear paths and failure on error, it’s perfectly reasonable not to use logging. However, as soon as you have a long-running program, a service, or a complicated script, it’s time to take the plunge.

The Logging Object

In every file you want to log with, you’ll be adding at minimum two lines:

import logging

logger = logging.getLogger("my_program")

This will grab the logger object associated with the name “my_program”. In very basic terms, think of the interpreter as having an internal dictionary of loggers: it knows which one to give you back when you send it a key (the name). This way, you don’t have to pass the logger around yourself, so you should never share or pass logger objects; simply make another call to getLogger.
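
A two-line sketch proves the point:

import logging

a = logging.getLogger("my_program")
b = logging.getLogger("my_program")
print(a is b)  # True -- the exact same object comes back every time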

A very common mistake I see is projects having a single log.py file with the logger object in it and importing it everywhere. Don’t do it.

Let’s repeat that a little louder.

Never share or pass the logger object

DO NOT DO

log.py
logger = logging.getLogger("my_program")
main.py
from .log import logger
submodule.py
from .log import logger

END DO NOT DO

It’s bad practice, prevents easy future modifications, and adds needless complication. You will always get the same logger back when you call getLogger with the same name.

Therefore, if you are trying to get the logger for an imported project, don’t go searching for project.logger; simply make a call to getLogger("Project"). “But how do I know what they named their logger?” Most projects use the same convention, which I suggest you also follow:

import logging

logging.getLogger(__name__)

The variable __name__ is a predefined global that holds the running module’s name as imported. So if you run the script directly, it will be "__main__", whereas if you import the module (my_program.py) it will be "my_program". If you import a submodule of my_program.py called submodule.py (creative, aren’t I?) it will be set to "my_program.submodule", just as you would see it on an import line.

“Is it ok to have dots in the name?” It’s better than ok, it’s pure awesome. Python logging has a built-in dotted hierarchy system, allowing you to specify levels and handlers for non-root loggers. In other words, each module can have its own level and handlers (covered later).
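
A minimal sketch of that hierarchy in action: a child logger with no handlers of its own still propagates its records up to the dotted parent’s handler.

import logging
import sys

parent = logging.getLogger("my_program")
parent.addHandler(logging.StreamHandler(stream=sys.stdout))
parent.setLevel(logging.DEBUG)

child = logging.getLogger("my_program.submodule")
child.warning("I bubble up to my_program's handler")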

You can set the name manually as well, getLogger("my_program.submodule"), which can be very handy if you want to split logging not by module but rather by shared functionality, or if you have a different place to log to. This is one of the main reasons against sharing the logger object: if later on you want everything in a type of submodule to log to a file, and you were naughty and shared a logger object, you now have to update every file to use a new object. Whereas if you followed good practices and used getLogger in each file, you would only have to add two lines of code to your main file to add a file handler.

One of the ‘gotchas’ with using the same logger across multiple modules is that you have to have the logger’s handlers and settings set up before you use it.

Sounds like a ‘duh’, but consider a case like:

my_project.py
import logging 
from subdir import submodule

logger = logging.getLogger("my_project") 
# Required handler code redacted for example

logger.warning("Speedboat crashed")


submodule.sub_func(5)
submodule.py
import logging
 
logger = logging.getLogger("my_project.submodule")

logger.debug("Submodule loaded")

def sub_func(var):
    logger.info("Got variable {}".format(var))
    return var + 1
Output
No handlers could be found for logger "my_project.submodule"
2016-12-09 16:14:13,079 - my_project - WARNING - Speedboat crashed
2016-12-09 16:21:13,387 - my_project.submodule - INFO - Got variable 5

You will get an error message of “No handlers could be found for logger "my_project.submodule"”, because the parent logger was not yet set up when the submodule was imported, but it was by the time the submodule’s function was run.

Levels

In the logging module, there are five different preset logging levels:

  • 10 – DEBUG
  • 20 – INFO
  • 30 – WARNING
  • 40 – ERROR
  • 50 – CRITICAL

When you set a logging level on a logger or handler, you are simply setting a number, and anything set to that number or above will be logged.

logger.setLevel(logging.DEBUG)

Is equivalent to:

logger.setLevel(10)

Generally, everyone uses the first one because it is easier to read, and you already have the logging module imported anyway; however, it is not required.

By default, the logger itself has an effective logging level of WARNING (30), so it will not display debug or info messages. (The built-in handlers have a level of NOTSET (0), so they won’t stop anything from being written, but it is good practice to set them.)

import logging

logger = logging.getLogger(__name__)
# Required handler code redacted for example

print(f"The logger is set at level: {logger.getEffectiveLevel()}")

logger.debug("Debug message")
logger.info("Info message")
logger.warning("Something's on fire")
output
The logger is set at level: 30
2016-12-09 15:23:53,220 - __main__ - WARNING - Something's on fire

It is also possible to create your own level. This is rarely used; I have personally only seen it done on the extreme ends (i.e. 5 for low-level debugging and 45 for near-critical), so here is an example that doesn’t deal with either of those:

ALMOST_SCARY = 25
logging.addLevelName(ALMOST_SCARY, "ALMOST_SCARY")

logger.setLevel(ALMOST_SCARY)

logger.info("Info message")
logger.log(ALMOST_SCARY, "Spider in the tub")
logger.warning("House set on fire")
output
2016-12-09 15:35:29,413 - __main__ - ALMOST_SCARY - Spider in the tub
2016-12-09 15:35:29,413 - __main__ - WARNING - House set on fire

I find it good practice to set the logger itself as low as possible, then set the levels on the handlers instead. That way, if you eventually want a debug file, you don’t have to change multiple lines, just one.
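
As a sketch of that practice (the debug.log file name is just illustrative), the logger sits at DEBUG while each handler picks its own threshold:

import logging
import sys

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)  # the logger itself is wide open

screen = logging.StreamHandler(stream=sys.stdout)
screen.setLevel(logging.INFO)  # the screen skips the debug chatter
logger.addHandler(screen)

debug_file = logging.FileHandler("debug.log")
debug_file.setLevel(logging.DEBUG)  # the file catches everything
logger.addHandler(debug_file)

logger.debug("Only lands in debug.log")
logger.info("Shows on screen and in the file")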

Handlers

Logging handlers are the brains behind the logger that tell it where to send the log messages. The logger object simply has a list of handlers, logger.handlers (not to be confused with the available handlers under logging.handlers). When the logger receives a message to log, it will iterate through its handlers, and any handler whose level is at or below the level of the incoming message will be sent the message to ‘handle’ (literal wording from the source code; programmers are not an inventive lot).

The two most commonly used handlers are StreamHandler and FileHandler (which is a subclass of StreamHandler). However, many come in the box, from SysLog to HTTP handlers, and it is possible to make or modify one for your own needs.

Here is a basic example of creating a StreamHandler and assigning it to our logger:

import logging
import sys 

logger = logging.getLogger(__name__)

my_stream_handler = logging.StreamHandler(stream=sys.stdout)
my_stream_handler.setLevel(logging.INFO)

logger.setLevel(logging.DEBUG)
logger.addHandler(my_stream_handler)

logger.warning("Crazy ex is headed over, has a blow torch")
output
Crazy ex is headed over, has a blow torch

“Wait a minute, all the other examples had time, module, and logging level, where’d they go?” That’s all set up with Formatters, which we’ll get to next. For now, we’re sticking to handlers.

StreamHandler is pretty basic, good to use for sys.stdout and sys.stderr. FileHandler is built on top of it and is very commonly used, and often incorrectly. Any time you are deploying to a production server where you aren’t running logrotate or similar, you will want to use one of the rotating file handlers: either RotatingFileHandler, based on file size, or TimedRotatingFileHandler.
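
The time-based flavor looks nearly identical; a quick sketch (file name illustrative) that rolls the log at midnight and keeps a week of history:

import logging
from logging.handlers import TimedRotatingFileHandler

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

timed_handler = TimedRotatingFileHandler("timed_example.log",
                                         when="midnight", backupCount=7)
logger.addHandler(timed_handler)

logger.info("This ends up in timed_example.log")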

import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger(__name__)

# If either maxBytes or backupCount is set to 0, it will not rotate 
my_file_handler = RotatingFileHandler("rotating_example.log",
                                      maxBytes=1024, backupCount=5) 
my_file_handler.setLevel(logging.DEBUG)

logger.setLevel(logging.DEBUG)
logger.addHandler(my_file_handler)


for a in list(range(1, 100))[::-1]:  # Reversed to start at 99 and count down
    logger.info("{0} bottles of beer on a wall, take one down, "
                "pass it around, {1} bottles of beer on the wall".format(a, a - 1))

This will create six different files (the current log file and past 5 backups), named:

rotating_example.log
rotating_example.log.1
rotating_example.log.2
rotating_example.log.3
rotating_example.log.4
rotating_example.log.5
rotating_example.log
9 bottles of beer on a wall, take one down, pass it around, 8 bottles of beer on the wall
8 bottles of beer on a wall, take one down, pass it around, 7 bottles of beer on the wall
7 bottles of beer on a wall, take one down, pass it around, 6 bottles of beer on the wall
6 bottles of beer on a wall, take one down, pass it around, 5 bottles of beer on the wall
5 bottles of beer on a wall, take one down, pass it around, 4 bottles of beer on the wall
4 bottles of beer on a wall, take one down, pass it around, 3 bottles of beer on the wall
3 bottles of beer on a wall, take one down, pass it around, 2 bottles of beer on the wall
2 bottles of beer on a wall, take one down, pass it around, 1 bottles of beer on the wall
1 bottles of beer on a wall, take one down, pass it around, 0 bottles of beer on the wall
rotating_example.log.5
54 bottles of beer on a wall, take one down, pass it around, 53 bottles of beer on the wall
53 bottles of beer on a wall, take one down, pass it around, 52 bottles of beer on the wall
52 bottles of beer on a wall, take one down, pass it around, 51 bottles of beer on the wall
51 bottles of beer on a wall, take one down, pass it around, 50 bottles of beer on the wall
50 bottles of beer on a wall, take one down, pass it around, 49 bottles of beer on the wall
49 bottles of beer on a wall, take one down, pass it around, 48 bottles of beer on the wall
48 bottles of beer on a wall, take one down, pass it around, 47 bottles of beer on the wall
47 bottles of beer on a wall, take one down, pass it around, 46 bottles of beer on the wall
46 bottles of beer on a wall, take one down, pass it around, 45 bottles of beer on the wall

Anything older than that is lost, which is a good thing when the alternative is gigabytes of log files after months or years of use.

Formatters

The last critical element of logging is the Formatters. They are often overlooked, as people generally use ones copy-pasted from the Python docs, but when customized they can be personalized to the project for better debugging. The default, as seen just above, is simply:

"%(message)s"

Each handler has its own formatter, which is best set before the handler is added to the logger. Probably the most commonly used one (other than the default) is:

"%(asctime)s - %(name)s - %(levelname)s - %(message)s"

They are all kind of a mouthful to set up:

my_stream_handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)s - "
                      "%(levelname)s - %(message)s"))
2016-12-09 18:50:56,604 - __main__ - INFO - Info message
2016-12-09 18:50:56,604 - __main__ - WARNING - House set on fire

Here’s one I like better, as it sets everything before the message to be the same length, looking a lot cleaner:

"%(asctime)s - %(name)-12s  %(levelname)-8s %(message)s"
2016-12-09 18:00:02,417 - __main__      INFO     Info message
2016-12-09 18:00:02,417 - __main__      WARNING  House set on fire

Doing a lot of threading? Wonder if you’re logging in a thread, or if it’s a thread in a thread?

"%(asctime)s : %(relativeCreated)1d %(threadName)s : %(name)s %(levelname)s %(message)s"
2016-12-09 18:55:12 : 0 MainThread : __main__ INFO Info message
2016-12-09 18:55:12 : 1 Thread-2 : __main__ WARNING House set on fire

There are plenty of options to choose from; they are all listed in the docs.

Putting it all together

Here is a ‘simple’ logging setup from everything discussed so far.

my_project.py
import logging
import sys

logger = logging.getLogger(__name__)
my_stream = logging.StreamHandler(stream=sys.stdout)
my_stream.setLevel(logging.DEBUG)
my_stream.setFormatter(
    logging.Formatter("%(asctime)s - %(name)-12s  "
                      "%(levelname)-8s %(message)s"))
logger.addHandler(my_stream)
logger.setLevel(logging.DEBUG)

...

logger.info("We the people")
output
2016-12-09 18:00:02,417 - __main__      INFO    We the people

“That’s a lot to remember and write for every program…” I totally agree. I use this type of setup so often, and don’t want to re-create a config file for each project, so I have an importable version of this on PyPI.

import sys

import reusables

logger = reusables.get_logger(__name__, level=10, stream=sys.stdout)

Produces the same output as above. It also has built in formats for different scenarios and logging helper functions. Check out Reusables.

Realistic Example

One of the true powers behind having multiple handlers is that you can set them at different levels for different goals. In this more realistic example, the main program uses a stream handler outputting to the screen. The submodule then has a child of that logger, which also has a rotating file handler set for errors and above only, meaning errors will log to both the file and the screen.

my_project.py
import logging
import sys

from subdir import submodule

logger = logging.getLogger("my_project")
my_stream = logging.StreamHandler(stream=sys.stdout)
my_stream.setFormatter(
    logging.Formatter("%(levelname)-8s %(message)s"))
logger.addHandler(my_stream)
logger.setLevel(logging.DEBUG)

def main():
    submodule.sub_func("test")

if __name__ == '__main__':
    logger.info("Starting my project")
    main()

subdir\submodule.py
import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("my_project.submodule")
sub_file_handler = RotatingFileHandler("out.log", 
    maxBytes=5000000, backupCount=5)
sub_file_handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)-12s" 
                      "%(levelname)-8s %(message)s"))
sub_file_handler.setLevel(logging.WARNING)
logger.addHandler(sub_file_handler)

def sub_func(var):
    logger.info("Got variable {}".format(var))
    import time
    time.sleep(1)
    try:
        return var / 5.0
    except TypeError:
        logger.exception("Could not divide")
stdout
INFO     Starting my project
INFO     Got variable test
ERROR    Could not divide
Traceback (most recent call last):
 File "subdir\submodule.py", line 16, in sub_func
 return var / 5.0
TypeError: unsupported operand type(s) for /: 'str' and 'float'
out.log
2016-12-11 14:50:42,977 - my_project.submodule  ERROR Could not divide
Traceback (most recent call last):
  File "subdir\submodule.py", line 16, in sub_func
    return var / 5.0
TypeError: unsupported operand type(s) for /: 'str' and 'float'

Repeating the setup

“Now that I know how to do it right, how can I just cut to the chase so I don’t have to write all that every time?”

  1. Use Reusables get_logger.
  2. Use a logging config (.ini) file
  3. Use a dictionary (which can be loaded from a format like YAML or JSON)

It’s mostly personal, if not team or company, preference as to how you set up logging each time. I personally don’t like having to keep track of yet another config file, so I do it in code. Others need to be able to easily modify it and want a standard format across projects that can be enforced in config files. I won’t bother getting into the details of the latter two here; more can be found in the standard library docs.
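
That said, just to show the shape of option 3, here is a minimal dictConfig sketch roughly equivalent to the stream handler setup from earlier:

import logging
from logging.config import dictConfig

dictConfig({
    "version": 1,
    "formatters": {
        "standard": {
            "format": "%(asctime)s - %(name)-12s  %(levelname)-8s %(message)s"
        }
    },
    "handlers": {
        "stream": {
            "class": "logging.StreamHandler",
            "formatter": "standard",
            "level": "DEBUG",
            "stream": "ext://sys.stdout",
        }
    },
    "loggers": {
        "my_project": {"handlers": ["stream"], "level": "DEBUG"}
    },
})

logging.getLogger("my_project").info("Configured from a dictionary")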

Anything more to learn?

We haven’t even talked about fun stuff like creating custom Handlers, multiprocessing concerns, or Filters (a more detailed way to decide whether something should be logged, beyond just levels). Wait a few weeks and maybe there’ll be a more in-depth post.