basic

Reusables – Part 2: Wrappers

Spice up your code with wrappers! In Python, a wrapper, also known as a decorator, is simply encapsulating a function within other functions.

@wrapper
def my_func(a, b=2):
    print(b)

@meta_decorator(foo="bar")
def my_other_func(**kwargs):
    print(kwargs)

In Reusables, all the wrappers take arguments, aka meta decorators, so you will at minimum have to end them with parens ()s. Let’s take a look at the one I use the most first.

time_it

import reusables

@reusables.time_it()
def test_func():
    import time, random
    time.sleep(random.randint(2, 5))

test_func()
# Function 'test_func' took a total of 5.000145769345911 seconds

time_it documentation
The message by default is printed. However it can also be sent to a log with a customized message. If log=True it will log to the Reusables logger, however you can also specify either a logger by name (string) or pass in a logger object directly.

reusables.add_stream_handler('reusables')

@reusables.time_it(log=True, message="{seconds:.2f} seconds")
def test_time(length):
    time.sleep(length)
    return "slept {0}".format(length)

result = test_time(5)
# 2016-11-09 16:59:39,935 - reusables.wrappers  INFO      5.01 seconds

print(result)
# slept 5

It’s also possible to capture time taken in a list.

@reusables.time_it(log='no_log', append=my_times)
def test_func():
    import time, random
    length = random.random()
    time.sleep(length)
    
for _ in range(10):
    test_func()
    
my_times
# [0.4791555858872698, 0.8963652232890809, 0.05607090172793505, 
# 0.03099917658380491,0.6567622821214627, 0.4333975642063024, 
# 0.21456404424395714, 0.5723555061358638, 0.0734819056269771, 
# 0.13208268856499217]

unique

The oldest wrapper in the Reusables library forces the output of a function to be unique, or else it will raise an exception or if specified, return an alternative output.

import reusables
import random


@reusables.unique(max_retries=100)
def poor_uuid():
    return random.randint(0, 10)


print([poor_uuid() for _ in range(10)])
# [8, 9, 6, 3, 0, 7, 2, 5, 4, 10]

print([poor_uuid() for _ in range(1000)])
# Exception: No result was unique

unique documentation

lock_it

A very simple wrapper to use while threading. Makes sure a function is only being run once at a time.

import reusables
import time


def func_one(_):
    time.sleep(5)


@reusables.lock_it()
def func_two(_):
    time.sleep(5)


@reusables.time_it(message="test_1 took {0:.2f} seconds")
def test_1():
    reusables.run_in_pool(func_one, (1, 2, 3), threaded=True)


@reusables.time_it(message="test_2 took {0:.2f} seconds")
def test_2():
    reusables.run_in_pool(func_two, (1, 2, 3), threaded=True)


test_1()
test_2()

# test_1 took 5.04 seconds
# test_2 took 15.07 seconds

log_exception

It’s good practice to catch and explicitly log exceptions, but sometimes it’s just easier to let it fail naturally at any point and log it for later refinement or debugging.

@reusables.log_exception()
def test():
    raise Exception("Bad")

# 2016-12-26 12:38:01,381 - reusables   ERROR  Exception in test - Bad
# Traceback (most recent call last):
#     File "<input>", line 1, in <module>
#     File "reusables\wrappers.py", line 200, in wrapper
#     raise err
# Exception: Bad

queue_it

Add the result of the function to the specified queue instead of returning it normally.

import reusables
import queue

my_queue = queue.Queue()


@reusables.queue_it(my_queue)
def func(a):
    return a


func(10)

print(my_queue.get())
# 10

New in 0.9

catch_it

Catch specified exceptions and return an alternative output or send it to an exception handler.

def handle_error(exception, func, *args, **kwargs):
    print(f"{func.__name__} raised {exception} when called with {args}")

@reusables.catch_it(handler=handle_error)
def will_raise(message="Hello"):
    raise Exception(message)


will_raise("Oh no!")
# will_raise raised Oh no! when called with ('Oh no!',)

retry_it

Once is never enough, keep trying until it works!

@reusables.retry_it(exceptions=(Exception, ), tries=10, wait=1)
def may_fail(dont_do=[]):
    dont_do.append("x")
    if len(dont_do) < 6:
        raise OSError("So silly")
    print("Much success!")

may_fail()
# Much success!

 

This post is a follow-up of Reusables – Part 1: Overview and File Management.

Python Decorators

Often times in python there comes a need to have multiple functions act in a similar manner. It could be anything from making sure that similar functions output the right type of result, they all log when they are called or all report exceptions in the same way.

An easy and repeatable way to accomplish this is decorators. They look like:

@decorator
def my_function(print_string="Hello World"):
    print(print_string)

my_function()
# Hello World

what that is actually doing is passing the lower function into another function called decorator. In other words it is exactly the same as doing:

def my_function(print_string="Hello World"):
    print(print_string)

decorator(my_function)("My message")
# My message

This post will go over the basics of how to create them, if you want to understand better how and why they work as they do, check out Guide to: Learning Python Decorators by Matt Harrison.

So what does a decorator look like?

Decorator Template

from functools import wraps

def decorator(func):
    """This is an example decorators"""
    @wraps(func)
    def container(*args, **kwargs):
        # perform actions before running contained function
        output = func(*args, **kwargs)
        # actions to run after contained function
        return output
    return container

Running the code in this sate would look exactly the same as if there were no decorator.

@decorator
def my_function():
    print "Hello World"

my_function()
# "Hello World"

line by line breakdown

def decorator(func):

The name of the decorator function, which takes the wrapped function as its single argument.

@wraps(func)

Wraps is a built-in method that replaces the decorators name and docstring with that of the wrapped function, the section after the line by line breakdown explains why this is necessary.

def container(*args, **kwargs):

This inner function collects the parameters and keyword parameters that are going to be passed to the original function. This allows the decorator access to incoming arguments to verify or modify before the function is ever run.

output = func(*args, **kwargs)

this runs the function with the original arguments and captures the output. This output could then be checked or changed before being returned, or even return an alternative output entirely.

return output

Don’t forget to actually return the original function’s output (or a custom one) or else the function will simply return None.

return container

The container function is the actual function being called,  hence why *args, **kwargs are passed to it. It is necessary to return it from the outside decorator so it can be called.

The importance of Wraps

We need to incorporate wraps so that the function name and docstring appear to be from the wrapped function, and not those of the wrapper itself.

@decorator
def my_func():
    """Example function"""
    return "Hello"

@decorator_no_wrap
def second_func():
    """Some awesome docstring"""
    return "World"

help(my_func)
# Help on function my_func:

# my_func()
#    Example function

help(second_func)
# Help on function container:

# container(*args, **kwargs)

It is possible, though more work to accomplish the same thing yourself.

def decorator(func):
    """This is an example decorators"""
    def container(*args, **kwargs):
        return func(*args, **kwargs)
    container.__name__ = func.__name__
    container.__doc__ = func.__doc__
    return container

Useful Example

Now lets turn it into something useful. In this example we will make sure that the function returns the expected type of result, or else it will raise an exception for us so there are not hidden complications down the road.

from functools import wraps

def isint(func):
         """ 
         This decorator will make sure the resulting value 
         is a integer or else it will raise an exception.
         """
    @wraps(func)
    def container(*args, **kwargs):
        output = func(*args, **kwargs)
        if not isinstance(output, int):
            raise TypeError("function did not return integer")
        return output
    return container

@isint
def add(num1, num2):
    """Add two numbers together and return the results"""
    return num1 + num2

print add(1, 2) 
# 3

print add("this", "that")
# Type Error: function did not return integer

Regular decorators are already called on execution, so you do not need to add ()s after their name, such as @isint. However, if the decorator accepts arguments, aka a meta-decorator, it will require () even if nothing additional is passed to it.

Meta-decorators

Meta-decorators can be created with either yet another function around the decorator, or built as a class.

from functools import wraps

def istype(instance_type=int):
    def decorator(func):
        """ 
        This decorator will make sure the resulting value is the 
        type specified or else it will raise an exception. 
        """
        @wraps(func)
        def container(*args, **kwargs):
            output = func(*args, **kwargs)
            if not isinstance(output, instance_type):
                raise TypeError("function did not return proper type")
            return output
        return container
    return decorator

@istype()
def add(num1, num2):
    """Add two numbers together and return the results"""
    return num1 + num2

@istype(str)
def reverse(forward_string):
    """Reverse and return incoming string"""
    return forward_string[::-1]


print add(1, 2) 
# 3

print reverse("Hello") 
# "olleH"

print add("this", "that")
# Type Error: function did not return proper type

Remember running a decorator is equivalent to:

decorator(my_function)("My message")

Running a meta-decorator adds an additional layer.

reversed_string = istype(str)(reverse)("Reverse Me")

Hence why @decorator doesn’t require to be called when put above a function, but @istype() does.

You can also create this meta-decorator as a class instead of another function.

class IsType: # Instead of 'def istype'

    def __init__(self, inc_type=int):
        self.inc_type = inc_type

    def __call__(self, func): # Replaces 'def decorator(func)'
        @wraps(func)
        def container(*args, **kwargs):
            output = func(*args, **kwargs)
            if not isinstance(output, self.inc_type):
                raise TypeError("function did not return proper type")
            return output
        return container

In functionality they are the same, but be aware they are technically different types. This will really only impact code inspectors and  those trying to manually navigate code, so it is not a huge issue, but is something to be aware of.

type(IsType)
<class 'type'>

type(istype)
<class 'function'>

 Things to keep in mind

  1. Order of operation does matter. If you have multiple decorators around a single function keep in mind they are run from top to bottom
  2. Once a function is wrapped, it cannot be run without the wrapper.
  3.  Meta-decorators require the additional parentheses, regular decorators do not.

This content is an updated version of what was original posted on my old personal site Cyber Curios, which has been shut down.

Reusables – Part 1: Overview and File Management

Reusables 0.8 has just been released, and it’s about time I give it a proper introduction.

I started this project three years ago, with a simple goal of keeping code that I inevitably end up reusing grouped into a single library. It’s for the stuff that’s too small to do well as it’s own library, but common enough it’s handy to reuse rather than rewrite each time.

It is designed to make the developer’s life easier in a number of ways. First, it requires no external modules, it’s possible to supplement some functionality with the modules specified in the requreiments.txt file, but are only required for specific use cases; for example: rarfile is only used to extract, you guessed it, rar files.

Second, everything is tested on both Python 2.6+ and Python 3.3+, also tested on pypy. It is cross platform compatible Windows/Linux, unless a specific function or class specifies otherwise.

Third, everything is documented via docstrings, so they are available at readthedocs, or through the built-in help() command in python.

Lastly, all functions and classes are all available at the root level (except CLI helpers), and can be broadly categorized as follows:

  • File Management
    • Functions that deal with file system operations.
  • Logging
    • Functions to help setup and modify logging capabilities.
  • Multiprocessing
    • Fast and dynamic multiprocessing or threading tools.
  • Web
    • Things related to dealing with networking, urls, downloading, serving, etc.
  • Wrappers
    • Function wrappers.
  • Namespace
    • Custom class to expand the usability of python dictionaries as objects.
  • DateTime
    • Custom datetime class primarily for easier formatting.
  • Browser Cookie Management
    • Find, extract or modify cookies of Firefox and Chrome on a system.
  • Command Line Helpers
    • Bash analogues to help system admins perform faster operations from inside an interactive python shell.

In this overview, we will cover:

  1. Installation
  2. Getting Started
  3. File, Folder and String Management
    1. Find Files Fast
    2. Archives (Extraction and Compression)
    3. Run Command
    4. File Hashing
    5. Finding Duplicate Files
    6. Safe File and Folder Names
    7. Touch (ing a file)
    8. Simple JSON and CSV
    9. Cut (ing a string into equal lengths)
    10. Config to dictionary

Installation

Very straightforward install, just do a simple pip or easy_install from PyPI.

pip install reusables

OR

easy_install reusables

If you need to install it on an offline computer, grab the appropriate Python 2.x or 3.x wheel from PyPI, and just pip install it directly.

There are no additional modules required for install, so if either of those don’t work, please open an issue at github.

Getting Started

import reusables 

reusables.add_stream_handler('reusables', level=10)

The logger’s name is ‘reusables’, and by default does not have any handlers associated with it. For these examples we will have logging on debug level, if you aren’t familiar with logging, please read my post about logging.

File, Folder and String Management

Everything here deals with managing something on the disk, or strings that relate to files. From checking for safe filenames to saving data files.

I’m going to start the show off with my most reused function, that is also one of the most versatile and powerful, find_files. It is basically an advanced implementation of os.walk.

Find Files Fast

reusables.find_files_list("F:\\Pictures",
                              ext=reusables.exts.pictures, 
                              name="sam", depth=3)

# ['F:\\Pictures\\Family\\SAM.JPG', 
# 'F:\\Pictures\\Family\\Family pictures - assorted\\Sam in 2009.jpg']

With a single line, we are able to search a directory for files by a case insensitive name, a list (or single string) of extensions and even specify a depth.  It’s also really fast, taking under five seconds to search through 70,000 files and 30,000 folders, taking just half a second longer than using the windows built in equivalent dir /s *sam* | findstr /i "\.jpg \.png \.jpeg \.gif \.bmp \.tif \.tiff \.ico \.mng \.tga \.xcf \.svg".

If you don’t need it as a list, use the generator itself.

for pic in reusables.find_files("F:\\Pictures", name="*chris *.jpg"):
    print(pic)

# F:\Pictures\Family\Family pictures - assorted\Chris 1st grade.jpg
# F:\Pictures\Family\Family pictures - assorted\Chris 6th grade.jpg
# F:\Pictures\Family\Family pictures - assorted\Chris at 3.jpg

That’s right, it also supports glob wildcards. It even supports using the external module scandir for older versions of Python that don’t have it nativity (only if enable_scandir=True is specified of course, its one of those supplemental modules). Check out the full documentation and more examples at readthedocs.

Archives

Dealing with the idiosyncrasies between the compression libraries provided by Python can be a real pain. I set out to make a super simple and straight forward way to archive and extract folders.

reusables.archive(['reusables',    # Folder with files 
                   'tests',        # Folder with subfolders
                   'AUTHORS.rst'], # Standalone file
                   name="my_archive.bz2")

# 'C:\Users\Me\Reusables\my_archive.bz2'

It will compress everything, store it, and keep folder structure in the archives.

To extract files, it is very similar behavior. Given a ‘wallpapers.zip’ file like this:

It is trivial to extract it to a location without having to specify it’s archive type.

reusables.extract("wallpapers.zip",
                  path="C:\\Users\\Me\\Desktop\\New Folder 6\\")
# ... DEBUG File wallpapers.zip detected as a zip file
# ... DEBUG Extracting files to C:\Users\Me\Desktop\New Folder 6\
# 'C:\\Users\\Me\\Desktop\\New Folder 6'

We can see that it extracted everything and again kept it’s folder structure.

The only support difference between the two is that you can extract rar files if you have installed rarfile and dependencies (and specified enable_rar=True), but cannot archive them due to licensing.

Run Command

Ok, so it many not always deal with the file system, but it’s better here than anywhere else. As you may or may not know, in Python 3.5 they released the excellent subprocess.run which is a convenient wrapper around Popen that returns a clean CompletedProcess class instance. reusables.run is designed to be a version agnostic clone, and will even directly run subprocess.run on Python 3.5 and higher.

reusables.run("cat setup.cfg", shell=True)

# CompletedProcess(args='cat setup.cfg', returncode=0, 
#                 stdout=b'[metadata]\ndescription-file = README.rst')

It does have a few subtle differences that I want to highlight:

  • By default, sets stdout and stderr to subprocess.PIPE, that way the result is always is in the returned CompletedProcess instance.
  • Has an additional copy_local_env argument, which will copy your current shell environment to the subprocess if True.
  • Timeout is accepted, buy will raise a NotImplimentedError if set on Python 2.x.
  • It doesn’t take positional Popen arguments, only keyword args (2.6 limitation).
  • It returns the same output as Popen, so on Python 2.x stdout and stderr are strings, and on 3.x they are bytes.

Here you can see an example of copy_local_env  in action running on Python 2.6.

import os

os.environ['MYVAR'] = 'Butterfly'

reusables.run("echo $MYVAR", copy_local_env=True, shell=True)

# CompletedProcess(args='echo $MYVAR', returncode=0, 
#                 stdout='Butterfly\n')

File Hashing

Python already has nice hashing capabilities through hashlib, but it’s a pain to rewrite the custom code for being able to handle large files without a large memory impact.  Consisting of opening a file and iterating over it in chunks and updating the hash. Instead, here is a convenient function.

reusables.file_hash("reusables\\reusables.py", hash_type="sha")

# '50c5425f9780d5adb60a137528b916011ed09b06'

By default it returns an md5 hash, but can be set to anything available on that system, and returns it in the hexdigest format, if the kwargs hex_digest is set to false, it will be returned as bytes.

reusables.file_hash("reusables\\reusables.py", hex_digest=False)

# b'4\xe6\x03zPs\xf5\xe9\x8dX\x9c/=/<\x94'

Starting with python 2.7.9, you can quickly view the available hashes directly from hashlib via hashlib.algorithms_available.

# CPython 3.6
import hashlib

print(f"{hashlib.algorithms_available}")
# {'sha3_256', 'MD4', 'sha512', 'sha3_512', 'DSA-SHA', 'md4', ...

reusables.file_hash("wallpapers.zip", "sha3_256")

# 'b7c357d582f8932977d785a24f728b267cef1de87537076aadac5049f4e4fa70'

Duplicate Files

You know you’ve seen this picture  before, you shouldn’t have to safe it again, where did that sucker go? Wonder no more, find it!

list(reusables.dup_finder("F:\\Pictures\\20131005_212718.jpg", 
                          directory="F:\\Pictures"))

# ['F:\\Pictures\\20131005_212718.jpg',
#  'F:\\Pictures\\Me\\20131005_212718.jpg',
#  'F:\\Pictures\\Personal Favorite\\20131005_212718.jpg']

dup_finder is a generator that will search for a given file at a directory, and all sub-directories. This is a very fast function, as it does a three step escalation to detect duplicates, if a step does not match, it will not continue with the other checks, they are verified in this order:

  1. File size
  2. First twenty bytes
  3. Full SHA256 compare

That is excellent for finding a single file, but how about all duplicates in a directory? The traditional option is to create a dictionary of hashes of all the files to compares against. It works, but is slow. Reusables has directory_duplicates function, which first does a file size comparison first, and only moves onto hash comparisons if the size matches.

reusables.directory_duplicates(".")

# [['.\\.git\\refs\\heads\\master', '.\\.git\\refs\\tags\\0.5.2'], 
#  ['.\\test\\empty', '.\\test\\fake_dir']]

It returns a list of lists, each internal list is a group of matching files.  (To be clear “empty” and “fake_dir” are both empty files used for testing.)

Just how much faster is it this way? Here’s a benchmark on my system of searching through over sixty-six thousand (66,000)  files in thirty thousand (30,000) directories.

The comparison code (the Reusables duplicate finder is refereed to as ‘size map’)

import reusables

@reusables.time_it(message="hash map took {seconds:.2f} seconds")
def hash_map(directory):
    hashes = {}
    for file in reusables.find_files(directory):
        file_hash = reusables.file_hash(file)
        hashes.setdefault(file_hash, []).append(file)

    return [v for v in hashes.values() if len(v) > 1]


@reusables.time_it(message="size map took {seconds:.2f} seconds")
def size_map(directory):
    return reusables.directory_duplicates(directory)


if __name__ == '__main__':
    directory = "F:\\Pictures"

    size_map_run = size_map(directory)
    print(f"size map returned {len(size_map_run)} duplicates")

    hash_map_run = hash_map(directory)
    print(f"hash map returned {len(hash_map_run)} duplicates")

The speed up of checking size first in our scenario is significant, over 16 times faster.

size map took 40.23 seconds
size map returned 3511 duplicates

hash map took 642.68 seconds
hash map returned 3511 duplicates

It jumps from under a minute for using reusables.directory_duplicates to over ten minutes when using a traditional hash map. This is the fastest pure Python method I have found, if you find faster, let me know!

Safe File Names

There are plenty of instances that you want to save a meaningful filename supplied by a user, say for a file transfer program or web upload service, but what if they are trying to crash your system?

Reusables has three functions to help you out.

  • check_filename: returns true if safe to use, else false
  • safe_filename: returns a pruned filename
  • safe_path: returns a safe path

These are designed not off of all legally allowed characters per system, but a restricted set of letters, numbers, spaces, hyphens, underscores and periods.

reusables.check_filename("safeFile?.text")
# False

reusables.safe_filename("safeFile?.txt")
# 'safeFile_.txt'

reusables.safe_path("C:\\test'\\%my_file%\\;'1 OR 1\\filename.txt")
# 'C:\\test_\\_my_file_\\__1 OR 1\\filename.txt'

Touch

Designed to be same as Linux touch command. It will create the file if it does not exist, and updates the access and modified times to now.

time.time()
# 1484450442.2250443

reusables.touch("new_file")

os.path.getmtime("new_file")
# 1484450443.804158

Simple JSON and CSV save and restore

These are already super simple to implement in pure python with the standard library, and are just here for convince of not having to remember conventions.

List of lists to CSV file and back

my_list = [["Name", "Location"],
           ["Chris", "South Pole"],
           ["Harry", "Depth of Winter"],
           ["Bob", "Skull"]]

reusables.list_to_csv(my_list, "example.csv")

# example.csv
#
# "Name","Location"
# "Chris","South Pole"
# "Harry","Depth of Winter"
# "Bob","Skull"


reusables.csv_to_list("example.csv")

# [['Name', 'Location'], ['Chris', 'South Pole'], ['Harry', 'Depth of Winter'], ['Bob', 'Skull']]

Save JSON with default indent of 4

my_dict = {"key_1": "val_1",
           "key_for_dict": {"sub_dict_key": 8}}

reusables.save_json(my_dict,"example.json")

# example.json
# 
# {
#     "key_1": "val_1",
#     "key_for_dict": {
#         "sub_dict_key": 8
#     }
# }

reusables.load_json("example.json")

# {'key_1': 'val_1', 'key_for_dict': {'sub_dict_key': 8}}

Cut a string into equal lengths

Ok, I admit, this one has absolutely nothing to do with the file system, but it’s just to handy to not mention right now (and doesn’t really fit anywhere else). One of the features I was most surprised to not be included in the standard library was to a have a function that could cut strings into even sections.

I haven’t seen any PEPs about it either way, but I wouldn’t be surprised if one of the reasons is ‘why do to with leftover characters?’. Instead of forcing you to stick with one, Reusables has four different ways it can behave for your requirement.

By default, it will simply cut everything into even segments, and not worry if the last one has matching length.

reusables.cut("abcdefghi")
# ['ab', 'cd', 'ef', 'gh', 'i']

The other options are to remove it entirely, combine it into the previous grouping (still uneven but now last item is longer than rest instead of shorter) or raise an IndexError exception.

reusables.cut("abcdefghi", 2, "remove")
# ['ab', 'cd', 'ef', 'gh']

reusables.cut("abcdefghi", 2, "combine")
# ['ab', 'cd', 'ef', 'ghi']

reusables.cut("abcdefghi", 2, "error")
# Traceback (most recent call last):
#     ...
# IndexError: String of length 9 not divisible by 2 to splice

Config to Dictionary

Everybody and their co-worker has written a ‘better’ config file handler of some sort, this isn’t trying to add to that pile, I swear. This is simply a very quick converter using the built in parser directly to dictionary format, or to a python object  I call a Namespace (more on that in future post.)

Just to make clear, this only reads configs, not writes any changes. So given an example config.ini file:

[General]
example=A regular string

[Section 2]
my_bool=yes
anint=234
exampleList=234,123,234,543
floatly=4.4

It reads it as is into a dictionary. Notice there is no automatic parsing or anything fancy going on at all.

reusables.config_dict("config.ini")
# {'General': {'example': 'A regular string'},
#  'Section 2': {'anint': '234',
#                'examplelist': '234,123,234,543',
#                'floatly': '4.4',
#                'my_bool': 'yes'}}

You can also take it into a ConfigNamespace.

config = reusables.config_namespace("config.ini")
# <ConfigNamespace: {'General': {'example': 'A regular string'}, 'Section 2': ...

Namespaces are special dictionaries that allow for dot notation, similar to Bunch but recursively convert dictionaries into Namespaces.

config.General
# <ConfigNamespace: {'example': 'A regular string'}>

ConfigNamespace has handy built-in type specific retrieval.  Notice that dot notation will not work if item have spaces in them, but the regular dictionary key notation works as well.

config['Section 2'].bool("my_bool")
# True

config['Section 2'].bool("bool_that_doesn't_exist", default=False)
# False
# If no default specified, will raise AttributeError

config['Section 2'].float('floatly')
# 4.4

It supports booleans, floats, ints, and unlike the default config parser, lists. Which even accepts a modifier function.

config['Section 2'].list('examplelist', mod=int)
# [234, 123, 234, 543]

Finale

That’s all for this first overview,. hope you found something useful and will make your life easier!

Related links:

Exception! Exception! Read all about it!

Exceptions in Python aren’t new, and pretty easy to wrap your head around. Which means you should have no trouble following what this example will output.

import warnings

def bad_func():
    try:
        assert False and True or False
    except AssertionError as outer_err:
        try:
            assert None is False
        except AssertionError as inner_err:
            warnings.warn("None is not False", UserWarning)
            raise inner_err from None
        else:
            return None
    else:
        return True
    finally:
        return False


print(bad_func())

Simple, right?  It raises both except clauses, triggering the warning and so your screen will obviously print False and the  warning.

False
C:/python_file.py:15: UserWarning: None is not False
  warnings.warn("None is not False", UserWarning)

So if you followed that without issue and understand why its happening, go ahead and check out another post, because today we are diving into the depths of Python 3 exception handling.

The basic try except block

If you have coded any semi-resilient Python code, you have used at least the basic try except.

try:
    something()
except:
    pass

If something breaks in the try block, the code in the except block is execute instead of allowing the exception to be raised. However the example code above does something unforgivable, it captures everything. “But that’s what I want it to do!” No, you do not.

Catching everything includes both KeyboardInterrupt and SystemExit. Those Exceptions are outside the normal Exception scope. They are special, because they are cased from external sources. Everything else is caused by something in your code.

That means if you have the following code, it would only exit if it had a SIGKILL sent to it, instead of the regular, and more polite SIGTERM or a user pressing Ctrl+C .

while True:
    try:
        something()
    except:
        pass

Now it’s fine, and even recommend to catch KeyboardInterrupt and SystemExit as needed. A common use case is to do a ‘soft’ exit, where your program tries to finish up any working tasks before exiting. However, in any case you use it, it’s better to be explicit. There should almost always be a separate path for internal vs external errors.

try:
    something()
except Exception as err:
    print(err)
    raise  # A blank except rasies the last exception
except (SystemError, KeyboardInterrupt) as err:
    print('Shutting down safely, please wait until process exits')
    soft_exit()

This way it is clear to the user when either an internal error happened or if it was a user generated event. This is usually only used on the very external event loop or function of the program. For most of your code, the bare minimum you should use in a try except block is:

try:
    something()
except Exception as err:
    print(err)

That way you are capturing all internal exceptions, but none of the system level ones. Also very important, adding a message (in this simple example a print line) will help show what the error was. However, str(err) will only output the error message, not the error type or traceback. We will cover how to get the most of the exception next.

Error messages, custom exceptions and scope

So why does Exception catch everything but the external errors?

Scope

The Exception class is inherited by all the other internal errors. Whereas something like FileNotFoundError is at the lowest point on the totem pole of built-in exception hierarchy. That means it’s as specific as possible, and if you catch that error, anything else will still be raised. 

| Exception
+--| OSError
   +-- FileNotFoundError
   +-- PermissionError

So the further –> you go, the less encapsulation and more specific of error you get.

try:
    raise FileNotFoundError("These aren't the driods you're looking for")
except FileNotFoundError:
    print("It's locked, let's try the next one")
except OSError:
    print("We've got em")

# Output: It's locked, let's try the next one

It works as expected, the exception is caught, it’s message is not printed, rather the code in the except FileNotFoundError is executed. Now keep in mind that other exceptions at the same level will not catch errors on the same level.

try:
    raise FileNotFoundError("These aren't the driods you're looking for")
except PermissionError:
    print("It's locked, let's try the next one")
except OSError:
    print("We've got em")

# Output: We've got em

Only that specific error or it’s parent (or parent’s parent, etc..) will work to capture it. Also keep in mind, with try except blocks, order of operations matter, it won’t sort through all the possible exceptions, it will execute only the first matching block.

try:
    raise FileNotFoundError("These aren't the driods you're looking for")
except OSError:
    print("We've got em")
except FileNotFoundError as err:
    print(err)

# Output: We've got em

That means even though FileNotFoundError was a more precise match, it won’t be executed if an except block above it also matches.

In practice, it’s best to capture the known possible lowest level (highest precision) exceptions. That is because if you have a known common failure, there is generally a different path you want to take instead of complete failure. We will use that in this example, where we want to create a file, but don’t want to overwrite existing ones.

index = 0 
while True:
    index += 1
    try:
        create_a_file(f'file_{index}')
        break
    except FileExistsError:
        continue
    except Exception as err:
        print(f"Unexpected error occurred while creating a file: {err}")

Here you can see, if that file already exists, and create_a_file raises a FileExistsError the loop will increase the index, and try again for file_2 and that will continue until a file doesn’t exist and it break out of the loop.

Custom Exceptions

What if you have code that needs to raise an exception under certain conditions? Well, it’s good to use the built-ins only when exactly applicable,  otherwise it is better to create your own. And you should always start with creating that error off of Exception.

class CustomException(Exception):
    """Something went wrong in my program"""

raise CustomException("It's dead Jim")

All user-defined exceptions should also be derived from [Exception] -Python Standard Library

Don’t be fooled into using the more appropriate sounding BaseException, that is the parent of all exceptions, including SystemExit.

You can of course create your own hierarchical system of exceptions, which will behave the same way as the built-in.

class CustomException(Exception):
    """Something went wrong in my program"""

class FileRelatedError(CustomException):
    """Couldn't access or update a file"""

class IORelatedError(CustomException):
    """Couldn't read from it"""


try:
    raise FileRelatedError("That file should NOT be there")
except CustomException as err:
    print(err)

# Output: That file should NOT be there

Traceback and Messages

So far we have been printing messages in the except blocks. Which is fine for small scripts, and I highly recommended including instead of just pass (Unless that is a planned safe route). However, if there is an exception, you probably want to know a bit more. That’s where the traceback comes in, it lets you know some of the most recent calls leading to the exception.

import traceback
import reusables

try:
    reusables.extract('not_a_real_file')
except Exception as err:
    traceback.print_exc()

Using the built-in traceback module, we can use traceback.print_exc() to directly print the traceback to the screen. (Use my_var = traceback.format_exc() if you plan to capture it to a variable instead of printing it.)

Traceback (most recent call last):
  File "C:\python_file.py", line 11, in <module>
    reusables.extract('not_a_real_file')
  File "C:\reusables.py", line 594, in extract
    raise OSError("File does not exist or has zero size")
OSError: File does not exist or has zero size

This becomes a lifesaver when trying to figure out what went wrong where. Here we can see what line of our code executed to cause the exception reusables.extract and where in that library the exception was raised. That way, if we need to dig deeper, or it’s a more complex issue, everything is nicely laid out for us.

“So what, it prints to my screen when it errors out and exits my program.” True, but what if you don’t want your entire program to fail over a trivial section of code, and want to be able to go back and figure out what erred. An even better example of this is with logging.

The standard library logging module logger has a beautiful feature built-in, .exception. Where it will log the full traceback of the last exception.

import logging
log = logging.getLogger(__name__)

try:
    assert False
except Exception as err:
    log.exception("Could not assert to True")

This will log the usual message you provide at the ERROR level, but will then include the full traceback of the last exception. Extremely valuable for services or background programs.

2017-01-20 22:28:34,784 - __main__      ERROR    Could not assert to True
Traceback (most recent call last):
  File "C:/python_file.py", line 12, in <module>
    assert False
AssertionError

Traceback and logging both have a lot of wonderful abilities that I won’t go into detail with here, but make sure to take the time to look into them.

Finally

try except blocks actually have two additional optional clauses, else and finally sections. Finally is very straightforward, no matter what happens in the rest of try except, that code WILL execute. It supersedes all other returns, it will suppress other block’s raises if it has it’s own raise or return.  (The only case when it won’t run is because the program is terminated during it’s execution.)

That’s why in the very top example code block, even though there is a raise being called, the False is still returned.

    finally:
        return False

The Finally section is invaluable when you are dealing with stuff that is critical to handle for clean up.  For example, safely closing database connections.

try:
    something()
except CustomExpectedError:
    relaunch_program()
except Exception:
    log.exception("Something broke")
except (SystemError, KeyboardInterrupt):
    wait_for_db_transactions()
finally:
    stop_db_transactions()
    db_commit()
    db_close()

Now that is a rather beautiful try except, it catches lowest case first, will log unexpected errors, has a soft_exit if the user presses Ctrl+C, and will safely close the database connection not matter what, even if Ctrl+C is pressed again while wait_for_db_transactions() is being run. (Notice this isn’t a perfect example, as it will still wait to close the database connection after relauch_program is ended, which means that function should do something like fork off so the finally block isn’t wanting  forever to run.)

What Else?

The final possible section of the try except is else, which is the optional success path. Meaning that code will only run if no exceptions are raised. (Remember that code in the finally section will still run as well if present.)

try:
    a, b, *c = "123,456,789,10,11,12".split(",")
except ValueError:
    a, b, c = '1', '2', '3'
else:
    print(",".join(num for num in a))

# Output: 1,2,3

This helps mainly with flow control in the program. For example, if you have a function that tries multiple ways to interpret a string, you could add an else clause to each of the try except blocks that returns on success.  (Dear goodness don’t actually use this code it’s just an example for flow.)

def what_is_it(incoming):
    try:
        float(incoming)
    except ValueError:
        pass  # This is fine as it's a precise, expected exception
    else:
        return 'number'

    try:
        len(str(incoming).split(",")) > 1
    except Exception:
        traceback.print_exc() # Unsure of possible errors, find out
    else:
        return 'list'

From What?

As you might have noticed in the initial example, there  is a line:

raise inner_err from None

This is new in Python 3 and from my experience, hardly ever used, even though very useful. It sets the context of the exception. Ever seen a page log exception with plenty of During handling of the above exception, another exception occurred: 

This can set it to an exact exception above it, so it only displays the relevant ones.

try:
    [] + {}
except TypeError as type_err:
    try:
        assert False
    except AssertionError as assert_err:
        try:
            "" + []
        except TypeError as type_two_err:
            raise type_two_err from type_err

Without the from the above code would show all three tracebacks, but by setting the last exception’s context to the first one, we get only those two, with a new message between them.

Traceback (most recent call last):
  File "C:/python_file.py", line 10, in <module>
    [] + {}
TypeError: can only concatenate list (not "dict") to list

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:/python_file.py", line 18, in <module>
    raise type_two_err from type_err
  File "C:/python_file.py", line 16, in <module>
    "" + []
TypeError: Can't convert 'list' object to str implicitly

Pretty handy stuff, and setting it to from None will remove all other exceptions from the context, so it only displays that exception’s traceback.

try:
    [] + {}
except TypeError as type_err:
    try:
        assert False
    except AssertionError as assert_err:
        try:
            "" + []
        except TypeError as type_two_err:
            raise type_two_err from None
Traceback (most recent call last):
  File "C:/Users/Chris/PycharmProjects/Reusables/ww.py", line 18, in <module>
    raise type_two_err from None
  File "C:/Users/Chris/PycharmProjects/Reusables/ww.py", line 16, in <module>
    "" + []
TypeError: Can't convert 'list' object to str implicitly

Now you and the users of the program know where those exceptions are raised from. 

Warnings

Another often neglected part of the python standard library is warnings. I will also not delve too deeply into it, but at least I’m trying to raise awareness.

I find the most useful feature of warnings as run once reminders. For example, if you have some code that you think might change in the next release, you can put:

import warnings

def my_func():
    warnings.warn("my_func is changing it's name to master_func",
                  FutureWarning)
    master_func()

This will print the warning to stderr, but only the first time by default. So if someone’s code is using this function a thousand times, it doesn’t blast their terminal with warning messages.

To learn more about them, check out Python Module of the Week‘s post on it.

Best Practices

  1. Never ever have a blank except, always catch Exception or more precise exceptions for internal errors, and have a separate except block for external errors.
  2. Only have the code that may cause a precise error in the try block. It’s better to have multiple try except blocks with limited scope rather than a long section of code in the try block with a generic Exception.
  3. Have a meaningful message to users (or yourself) in except blocks.
  4. Create your own exceptions instead of just raising Exception.

 

The Documentation Dilemma

Every coder will come across the same aggravating issue one day, finding a script or library you think you need, and it has zero instructions. Or the examples don’t work, or there is so much writing and so little substance you can’t hope to dig through it all to discover how it’s meant to be used.

Don’t be like that project, document your code smartly. And document to the level of project you are writing. If you have a little script that’s self-contained as a single file, it would be stupid to add a documentation directory and set up Sphinx. Whereas if you were creating a library for other coders, it would be stupid if you didn’t have extensive documentation.

Acceptable Documentation

I used to be like every other college grad, “your code should document itself”, “An idiot could decipher my code”, “if they can’t bother reading through the code, why would they run it on their system?” That is the sound of inexperience and ignorance. For your code to be useful to other people, it needs at minimum to describe what it does and have expected input and outputs, which is easily shown in examples.

The reality is,  professionals simply do a quick search (either company internal or web) and determine if a program or script already exists to do what they want, by reading a short blurb about them and moving on. No one stops to read the code, don’t do your project the disservice of hiding it from others that would find it useful. Not only does it help others, it’s a much better self promoter, showing that you care about your own work and want others to use it.

Here is a quick breakdown of what I view as requirements of documentation:

Single script

  • Readme: can be the docstring at top of file
    • Description of what the script is designed to do
    • Examples, including a sample run with common input / output
  • Code should have meaningful messages on failure

Multiple scripts

  • Readme: standalone file
    • Description of project as a whole
    • Description of each script
    • Example of each script
    • Description and examples of any interconnection between scripts
  • Code needs meaningful messages on failure

Library

  • Readme: standalone file
    • Description of project
    • Installation guide
    • Basic examples
  • Changelog
  • License
  • Documentation directory
    • Stored in a human readable markup language (.md, .rst, .latex) that can be generated into PDF or HTML
    • Python projects should use Sphinx that pulls in docstrings
    • Index with contents of Readme, links to other files for each aspect of library
  • Code should use logging

Library with API

  • Everything in Library
  • API documents for every interface
    • preferably in a common format such as RAML

Program

  • Readme: standalone file
    • Description of project
    • Installation guide
    • Tutorials
      • Including detailed examples / use cases
    • FAQ / troubleshooting
  • License
  • Code should use logging
  • Changelog

Obviously I can’t go into every possible scenario, but I hope this clears up what is a common expectation of a project. And if you intend for others to help with the project, it’s a good idea to include developer docs for contributors that go over code conventions and any particular styles you want to adhere too. If other people are helping, make sure to have a section dedicated thanks to them, such as an Authors file.

Writing the Documentation

Here’s a quick baseline of a README in Markdown you can feel free to copy and reuse.

Project Name
============

Description
-----------

The is an example project that is designed 
to teach documentation. The project will search
imdb and find a movie based off the input given.

Supported Systems:

* Windows XP or newer
* Linux 
 
Tested on:

* Python 2.6+
* Python 3.3+
* Pypy


Examples
--------

Find the movie

```
$ python find_imdb_movie.py 300 

Title: 300 (2006)
Summary: King Leonidas of Sparta and a
         force of 300 men fight the Persians at 
         Thermopylae in 480 B.C.
Rating: 7.7 / 10
```

Get more details about it

```
$ python find_imdb_movie.py --director 300 

Title: 300 (2006)
Director: Zack Snyder
```

License
-------

This code is MIT Licensed, see LICENSE 

If you have a script, and it already has good error messages, you’re good to go! Feel free to add more example cases, FAQ or the full usage output (aka the details given when do you my_project -h, and yes you need that too.)

For larger projects, this is not enough. As it would be very difficult to fit enough examples onto a single readable page, and very easy for them to go stale. To avoid that issue in code libraries, I recommend including the examples of usage in the Python docstrings themselves, and have them auto included in the documentation with Sphinx. That way, when you update a function, you’re much more likely to remember to update the example beside it.

def now(utc=False, tz=None):
"""
Get a current DateTime object. By default is local.

.. code:: python

reusables.now()
# DateTime(2016, 12, 8, 22, 5, 2, 517000)

reusables.now().format("It's {24-hour}:{min}")
# "It's 22:05"

:param utc: bool, default False, UTC time not local
:param tz: TimeZone as specified by the datetime module
:return: reusables.DateTime
"""

Here’s what that will look like when rendered with Sphinx (in this case hosted with Read the Docs )

Reusables.now() documentation example

This is very convenient if you have a library and are exposing all your functions, but isn’t an option for full-fledged programs or APIs. I have seen a few fancy tricks, such as pulling the examples out of the Readme’s and using them as functional tests, or parsing the RAML API files and testing them.  It’s a good way to ensure that all the examples are up to date, but now the documentation is dual purpose, and has to be both machine and human readable, possibly taking too much away from the human readability for the good that the tests do to make sure they are up to date.

So what’s the answer? Well, there isn’t just one. If it’s just you and you’re not going to be updating the project that often, write the examples when it is near completion and try your best to update them if the project updates. If it’s an ever evolving project with multiple contributors, it would probably be best to have some sort of automation to ensure they are not going stale.

The best thing to do is have a standard across your team / company. Even a stupid standard implemented well can result in better documentation than no standards. That way everyone knows exactly where to look for docs, and what they are expected to contribute.