
Creating a successful project – Part 3: Development Tools/Equipment

Every single year that I’ve been doing this, I hear about the next “totally awesome” way to write code.  And more often than not, the new thing is certainly very shiny.

When it comes to projects, with the exception of coding standards (which will be part 4 of this series) I am not a fan of telling developers how to write code.  If you’ve got someone who likes to write code using Notepad on a Microsoft Windows machine, more power to them.  Oh, you like coding in SublimeText3 on Mac – go for it.

If you work on one of my projects there are only a few rules I have about how you write your code:

  1. It must maintain the agreed-upon standard (such as PEP8)
  2. Your code – under penalty of my ire – must work on the designated system.  If it only “Works for Me” (WFM), then you must get it working on the chosen system. (More on this topic in the test and build posts.)  And trust me, there are plenty of people out there – including other contributors to this site – who would shudder to think of my ire directed singly upon them.
  3. Use whatever source code control system has been agreed upon (preferably Git).
  4. Use whatever build system is in play.  Usually, this is done via a Jenkins server, but I’m not picky.  I want consistency, and I want to make sure that the output of the project is reliable.  More on build systems in the CI/CD section.

Notice something odd in there: nowhere did I say you had to use a particular editor or debugger.  I honestly couldn’t care less if you like to write your code using Comic Sans or SourceCodePro.  I really don’t care if you like to code using EMACS or Sublime.  The tools one uses to write code should be selected through a similar vetting process to purchasing a good chef’s knife: use what you feel most comfortable using.

But, in the interest of showing what a rather more seasoned coder uses, here’s my setup:

Keyboard – Microsoft Natural Ergonomic Keyboard – I spend 8-16 hours a day on a keyboard, so I want my keyboard to be comfortable and able to handle heavy use.  The good thing (besides that this is a great keyboard) is that they’re nice and cheap.  So when one dies, I just buy another.

Mouse – ROCCAT Kone Pure Color – This is just a really great mouse.

Editor – Vim or, more recently, Neovim – I’ve used Vi/Vim for decades, so I’m a bit of an old hand at using them.

Operating System – Debian Linux – When you want the best and you don’t want extra crap getting in your way, accept only the best.

I use that same setup at work as well as home.  I am not endorsed by any of the product manufacturers; I just know what works for me.  If I find a keyboard in the same form-factor as the one I’m using with Cherry MX Browns, I’ll buy two of them in a heartbeat.

I have also made use of PyCharm and Atom, both of which I still use with Vim keybindings.

Creating a successful project – Part 2: Project Organization

Ok, so you’ve familiarized yourself with the rules.  Good.  This isn’t Fight Club, but they are important rules.  You want to be professional in all the projects in which you participate.  If properly managed, your GitHub account should be a functional part of your ever-evolving resume.

Code organization:

So, code itself really isn’t hard to organize, but you’d be surprised how many people just don’t get it.  Since I usually develop in Python, here is my standard project directory structure:

./new_project
 ├── CONTRIBUTING.md
 ├── doc
 ├── LICENSE.md
 ├── README.md
 ├── scripts
 ├── setup.py
 ├── src
 └── tests

If you code in a different language with different structural requirements, adapt accordingly.  This is simply how I do it.
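If you find yourself creating this same skeleton over and over, it’s easy to script. Here’s a quick sketch using Python’s pathlib; the project name passed in is just a placeholder:

```python
from pathlib import Path

def scaffold(project_name):
    """Create the standard project skeleton described above."""
    root = Path(project_name)
    # The four standard directories.
    for d in ("doc", "scripts", "src", "tests"):
        (root / d).mkdir(parents=True, exist_ok=True)
    # The top-level files, created empty for you to fill in.
    for f in ("CONTRIBUTING.md", "LICENSE.md", "README.md", "setup.py"):
        (root / f).touch()

scaffold("new_project")
```

Nothing fancy, but it keeps every new project starting from the same organized baseline.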

First, the markdown files:

I’m not going to worry over the preference for ReStructuredText, Markdown, LaTeX, or basic text for these files.  Feel free to do your own research and come to your own conclusions.

CONTRIBUTING.md

This document is optional for simple projects.  It is possible to have this as part of the README.  In this document, you should spell out how people can contribute to your project.  Should they fork and submit pull requests?  How should they report issues and/or request new features?  These are important questions.  Not just for potential compatriots, but for you.

LICENSE.md

This should have the text of the license under which you are placing the project.  You can cut and paste this from any number of reputable licensing sources.  I recommend reading about the different licenses here. I will also be going over the primary players in a future post.

README.md

I feel that the project README is non-optional.  This single document should have key information about the project.  It should (ideally) address the following:

  1. What is the purpose of this project?  Think of it as the two-to-three sentence description of the project.  What problem were you trying to solve?
  2. Where to find more documentation.
  3. Installation instructions
  4. Simple configuration settings – if appropriate.
  5. Possibly simple use-cases and examples.

Optional Docs:

DONATIONS.md/SPONSOR.md – Where/how to send any monetary donations.  A good example of what might be found in a donations/sponsorship doc can be found here.

THANKS.md – If you want to thank anyone for help or encouragement, this would be alright to add.  Completely optional.

Now the directories:

doc

This is where your project documentation should reside.  If you do this correctly, you can easily get them posted to readthedocs.org.

scripts

This is optional for projects which have no need for command-line scripts.  But, if your project has a command-line tool associated with it, you should really consider having them in this directory.

src

This is where your code should reside.

tests

This is where the tests for your code should be found.  Having tests is a big indicator of the health and maturity of the project.  The tests here should show that you take the writing and maintaining of the code very seriously.
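To give you an idea of what belongs here, below is a minimal pytest-style test file. The slugify function is purely invented for illustration (in a real project you would import the code under test from src rather than defining a stand-in):

```python
# tests/test_slugify.py -- hypothetical example; slugify is a stand-in
# for whatever function your project actually exposes.

def slugify(text):
    """Stand-in for the function under test."""
    return text.strip().lower().replace(" ", "-")

def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"

def test_slugify_strips_outer_whitespace():
    assert slugify("  Padded  Name ") == "padded--name"
```

Run it with a plain `pytest` from the project root; a test suite this simple still tells contributors the project takes correctness seriously.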

Optional Directories:

examples – Some people like to house some usage examples here.  If appropriate, feel free.

Creating a successful project – Part 1: The Rules

So, you want to create a project.  That’s great!  Welcome to the party.  I’d like to pass along some general knowledge about how best to approach creating/running a project.  Also, I’m not going to say that I know everything there is to know about running a successful project.  I have run a few projects in my time, some successful, some less so;  I also have some hard-fought lessons learned that I am going to share.  Perhaps you’ll use some of this knowledge to improve your projects.

One side note on what I am teaching/advocating here:  I am not telling you some secret to making your project the next docker or requests.  I’m showing you best practices for success.  These ideas and concepts are good for any size or purpose of a software project.  They have been successfully used on single-person projects all the way up to projects with 70+ contributors.  Good ideas and buy-in from your team are critical to success, but they are never going to be enough to replace plain old elbow grease to write the code, docs, and tests for a project.  That stuff takes discipline and time.

General Project Structure Rules

  1. If you want you (and your project) to be taken seriously, be organized.
  2. Tests are non-optional.
  3. Documentation is also non-optional.
  4. Documentation should read like it was written by someone with knowledge who cares about informing a newcomer to the project.
  5. You should understand contributor roles if any.
  6. You MUST understand the licenses that are in regular use and decide which license to use.  This is more about long-term CYA than anything else.

Well, there’s the first entry in this series.  I hope people find it interesting.


Reusables – Part 1: Overview and File Management

Reusables 0.8 has just been released, and it’s about time I give it a proper introduction.

I started this project three years ago with a simple goal: keeping code that I inevitably end up reusing grouped into a single library. It’s for the stuff that’s too small to stand well as its own library, but common enough that it’s handy to reuse rather than rewrite each time.

It is designed to make the developer’s life easier in a number of ways. First, it requires no external modules. It is possible to supplement some functionality with the modules specified in the requirements.txt file, but they are only required for specific use cases; for example, rarfile is only used to extract, you guessed it, rar files.

Second, everything is tested on both Python 2.6+ and Python 3.3+, as well as PyPy. It is cross-platform compatible (Windows/Linux), unless a specific function or class specifies otherwise.

Third, everything is documented via docstrings, so the docs are available at readthedocs or through the built-in help() command in Python.

Lastly, all functions and classes are available at the root level (except CLI helpers), and can be broadly categorized as follows:

  • File Management
    • Functions that deal with file system operations.
  • Logging
    • Functions to help setup and modify logging capabilities.
  • Multiprocessing
    • Fast and dynamic multiprocessing or threading tools.
  • Web
    • Things related to dealing with networking, urls, downloading, serving, etc.
  • Wrappers
    • Function wrappers.
  • Namespace
    • Custom class to expand the usability of python dictionaries as objects.
  • DateTime
    • Custom datetime class primarily for easier formatting.
  • Browser Cookie Management
    • Find, extract or modify cookies of Firefox and Chrome on a system.
  • Command Line Helpers
    • Bash analogues to help system admins perform faster operations from inside an interactive python shell.

In this overview, we will cover:

  1. Installation
  2. Getting Started
  3. File, Folder and String Management
    1. Find Files Fast
    2. Archives (Extraction and Compression)
    3. Run Command
    4. File Hashing
    5. Finding Duplicate Files
    6. Safe File and Folder Names
    7. Touch (ing a file)
    8. Simple JSON and CSV
    9. Cut (ing a string into equal lengths)
    10. Config to dictionary

Installation

Very straightforward install: just do a simple pip or easy_install from PyPI.

pip install reusables

OR

easy_install reusables

If you need to install it on an offline computer, grab the appropriate Python 2.x or 3.x wheel from PyPI, and just pip install it directly.

There are no additional modules required for install, so if either of those doesn’t work, please open an issue on GitHub.

Getting Started

import reusables 

reusables.add_stream_handler('reusables', level=10)

The logger’s name is ‘reusables’, and by default it does not have any handlers associated with it. For these examples we will have logging at debug level; if you aren’t familiar with logging, please read my post about logging.

File, Folder and String Management

Everything here deals with managing something on the disk, or strings that relate to files. From checking for safe filenames to saving data files.

I’m going to start the show off with my most reused function, which is also one of the most versatile and powerful: find_files. It is basically an advanced implementation of os.walk.

Find Files Fast

reusables.find_files_list("F:\\Pictures",
                              ext=reusables.exts.pictures, 
                              name="sam", depth=3)

# ['F:\\Pictures\\Family\\SAM.JPG', 
# 'F:\\Pictures\\Family\\Family pictures - assorted\\Sam in 2009.jpg']

With a single line, we are able to search a directory for files by a case-insensitive name, by a list (or single string) of extensions, and even specify a depth.  It’s also really fast, taking under five seconds to search through 70,000 files and 30,000 folders, just half a second longer than the Windows built-in equivalent dir /s *sam* | findstr /i "\.jpg \.png \.jpeg \.gif \.bmp \.tif \.tiff \.ico \.mng \.tga \.xcf \.svg".

If you don’t need it as a list, use the generator itself.

for pic in reusables.find_files("F:\\Pictures", name="*chris *.jpg"):
    print(pic)

# F:\Pictures\Family\Family pictures - assorted\Chris 1st grade.jpg
# F:\Pictures\Family\Family pictures - assorted\Chris 6th grade.jpg
# F:\Pictures\Family\Family pictures - assorted\Chris at 3.jpg

That’s right, it also supports glob wildcards. It even supports using the external module scandir for older versions of Python that don’t have it natively (only if enable_scandir=True is specified, of course; it’s one of those supplemental modules). Check out the full documentation and more examples at readthedocs.

Archives

Dealing with the idiosyncrasies of the compression libraries provided by Python can be a real pain. I set out to make a super simple and straightforward way to archive and extract folders.

reusables.archive(['reusables',    # Folder with files 
                   'tests',        # Folder with subfolders
                   'AUTHORS.rst'], # Standalone file
                   name="my_archive.bz2")

# 'C:\Users\Me\Reusables\my_archive.bz2'

It will compress everything, store it, and keep folder structure in the archives.

To extract files, the behavior is very similar. Given a ‘wallpapers.zip’ file, it is trivial to extract it to a location without having to specify its archive type.

reusables.extract("wallpapers.zip",
                  path="C:\\Users\\Me\\Desktop\\New Folder 6\\")
# ... DEBUG File wallpapers.zip detected as a zip file
# ... DEBUG Extracting files to C:\Users\Me\Desktop\New Folder 6\
# 'C:\\Users\\Me\\Desktop\\New Folder 6'

We can see that it extracted everything and again kept its folder structure.

The only support difference between the two is that you can extract rar files if you have installed rarfile and its dependencies (and specified enable_rar=True), but you cannot archive them due to licensing.

Run Command

Ok, so it may not always deal with the file system, but it fits better here than anywhere else. As you may or may not know, Python 3.5 introduced the excellent subprocess.run, which is a convenient wrapper around Popen that returns a clean CompletedProcess instance. reusables.run is designed to be a version-agnostic clone, and will even directly run subprocess.run on Python 3.5 and higher.

reusables.run("cat setup.cfg", shell=True)

# CompletedProcess(args='cat setup.cfg', returncode=0, 
#                 stdout=b'[metadata]\ndescription-file = README.rst')

It does have a few subtle differences that I want to highlight:

  • By default, it sets stdout and stderr to subprocess.PIPE, so the result is always in the returned CompletedProcess instance.
  • Has an additional copy_local_env argument, which will copy your current shell environment to the subprocess if True.
  • A timeout is accepted, but it will raise a NotImplementedError if set on Python 2.x.
  • It doesn’t take positional Popen arguments, only keyword args (2.6 limitation).
  • It returns the same output as Popen, so on Python 2.x stdout and stderr are strings, and on 3.x they are bytes.

Here you can see an example of copy_local_env in action running on Python 2.6.

import os

os.environ['MYVAR'] = 'Butterfly'

reusables.run("echo $MYVAR", copy_local_env=True, shell=True)

# CompletedProcess(args='echo $MYVAR', returncode=0, 
#                 stdout='Butterfly\n')

File Hashing

Python already has nice hashing capabilities through hashlib, but it’s a pain to rewrite the custom code for handling large files without a large memory impact: opening the file and iterating over it in chunks, updating the hash as you go. Instead, here is a convenient function.

reusables.file_hash("reusables\\reusables.py", hash_type="sha")

# '50c5425f9780d5adb60a137528b916011ed09b06'
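For comparison, here’s roughly the boilerplate that file_hash saves you from rewriting, using only hashlib. This is my own sketch of the pattern, and the chunk size is arbitrary:

```python
import hashlib

def chunked_file_hash(path, hash_type="md5", chunk_size=65536):
    """Hash a file of any size with a flat memory footprint."""
    hasher = hashlib.new(hash_type)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading fixed-size chunks
        # until read() returns the empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Memory use stays constant no matter how big the file is, because only one chunk is ever held at a time.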

By default it returns an MD5 hash, but it can be set to anything available on that system, and it returns the hexdigest format. If the keyword argument hex_digest is set to False, it will be returned as bytes.

reusables.file_hash("reusables\\reusables.py", hex_digest=False)

# b'4\xe6\x03zPs\xf5\xe9\x8dX\x9c/=/<\x94'

Starting with Python 2.7.9, you can quickly view the available hashes directly from hashlib via hashlib.algorithms_available.

# CPython 3.6
import hashlib

print(hashlib.algorithms_available)
# {'sha3_256', 'MD4', 'sha512', 'sha3_512', 'DSA-SHA', 'md4', ...

reusables.file_hash("wallpapers.zip", "sha3_256")

# 'b7c357d582f8932977d785a24f728b267cef1de87537076aadac5049f4e4fa70'

Duplicate Files

You know you’ve seen this picture before, and you shouldn’t have to save it again, but where did that sucker go? Wonder no more, find it!

list(reusables.dup_finder("F:\\Pictures\\20131005_212718.jpg", 
                          directory="F:\\Pictures"))

# ['F:\\Pictures\\20131005_212718.jpg',
#  'F:\\Pictures\\Me\\20131005_212718.jpg',
#  'F:\\Pictures\\Personal Favorite\\20131005_212718.jpg']

dup_finder is a generator that will search for a given file in a directory and all its sub-directories. It is very fast, as it uses a three-step escalation to detect duplicates; if a step does not match, it will not continue with the remaining checks. They are verified in this order:

  1. File size
  2. First twenty bytes
  3. Full SHA256 compare
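That escalation is easy to picture in plain Python. The following is my own simplified sketch of the idea, not the library’s actual code:

```python
import hashlib
import os

def probably_duplicates(file_a, file_b, header_bytes=20):
    """Three-step duplicate check: size, first bytes, then full SHA-256."""
    # Step 1: cheapest check first -- different sizes can never match.
    if os.path.getsize(file_a) != os.path.getsize(file_b):
        return False
    # Step 2: compare the first few bytes before hashing everything.
    with open(file_a, "rb") as fa, open(file_b, "rb") as fb:
        if fa.read(header_bytes) != fb.read(header_bytes):
            return False
    # Step 3: full-content hash comparison as the final arbiter.
    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.digest()
    return sha256(file_a) == sha256(file_b)
```

The expensive full-file hash only ever runs on files that already passed the two cheap checks, which is where the speed comes from.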

That is excellent for finding copies of a single file, but how about all the duplicates in a directory? The traditional option is to create a dictionary of hashes of all the files to compare against. It works, but it is slow. Reusables has the directory_duplicates function, which does a file size comparison first, and only moves on to hash comparisons if the sizes match.

reusables.directory_duplicates(".")

# [['.\\.git\\refs\\heads\\master', '.\\.git\\refs\\tags\\0.5.2'], 
#  ['.\\test\\empty', '.\\test\\fake_dir']]

It returns a list of lists, each internal list is a group of matching files.  (To be clear “empty” and “fake_dir” are both empty files used for testing.)

Just how much faster is it this way? Here’s a benchmark on my system of searching through over sixty-six thousand (66,000) files in thirty thousand (30,000) directories.

The comparison code (the Reusables duplicate finder is referred to as ‘size map’):

import reusables

@reusables.time_it(message="hash map took {seconds:.2f} seconds")
def hash_map(directory):
    hashes = {}
    for file in reusables.find_files(directory):
        file_hash = reusables.file_hash(file)
        hashes.setdefault(file_hash, []).append(file)

    return [v for v in hashes.values() if len(v) > 1]


@reusables.time_it(message="size map took {seconds:.2f} seconds")
def size_map(directory):
    return reusables.directory_duplicates(directory)


if __name__ == '__main__':
    directory = "F:\\Pictures"

    size_map_run = size_map(directory)
    print(f"size map returned {len(size_map_run)} duplicates")

    hash_map_run = hash_map(directory)
    print(f"hash map returned {len(hash_map_run)} duplicates")

The speed-up from checking size first in our scenario is significant: nearly 16 times faster.

size map took 40.23 seconds
size map returned 3511 duplicates

hash map took 642.68 seconds
hash map returned 3511 duplicates

It jumps from under a minute using reusables.directory_duplicates to over ten minutes using a traditional hash map. This is the fastest pure-Python method I have found; if you find a faster one, let me know!

Safe File Names

There are plenty of instances where you want to save a meaningful filename supplied by a user, say for a file transfer program or web upload service. But what if they are trying to crash your system?

Reusables has three functions to help you out.

  • check_filename: returns true if safe to use, else false
  • safe_filename: returns a pruned filename
  • safe_path: returns a safe path

These are designed not around all the legally allowed characters per system, but around a restricted set of letters, numbers, spaces, hyphens, underscores and periods.

reusables.check_filename("safeFile?.text")
# False

reusables.safe_filename("safeFile?.txt")
# 'safeFile_.txt'

reusables.safe_path("C:\\test'\\%my_file%\\;'1 OR 1\\filename.txt")
# 'C:\\test_\\_my_file_\\__1 OR 1\\filename.txt'

Touch

Designed to be the same as the Linux touch command: it will create the file if it does not exist, and it updates the access and modified times to now.

import os
import time

time.time()
# 1484450442.2250443

reusables.touch("new_file")

os.path.getmtime("new_file")
# 1484450443.804158

Simple JSON and CSV save and restore

These are already super simple to implement in pure Python with the standard library, and are just here for the convenience of not having to remember the conventions.
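For reference, here’s a sketch of what those standard-library equivalents look like using csv and json directly. The function names here are mine, not part of any library:

```python
import csv
import json

def write_csv(rows, path):
    # newline="" is the csv module's documented requirement on open().
    with open(path, "w", newline="") as f:
        csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)

def read_csv(path):
    with open(path, newline="") as f:
        return [row for row in csv.reader(f)]

def write_json(data, path):
    with open(path, "w") as f:
        json.dump(data, f, indent=4)
```

It’s only a few lines either way; the Reusables versions just save you from re-deriving details like the newline="" and quoting conventions each time.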

List of lists to CSV file and back

my_list = [["Name", "Location"],
           ["Chris", "South Pole"],
           ["Harry", "Depth of Winter"],
           ["Bob", "Skull"]]

reusables.list_to_csv(my_list, "example.csv")

# example.csv
#
# "Name","Location"
# "Chris","South Pole"
# "Harry","Depth of Winter"
# "Bob","Skull"


reusables.csv_to_list("example.csv")

# [['Name', 'Location'], ['Chris', 'South Pole'], ['Harry', 'Depth of Winter'], ['Bob', 'Skull']]

Save JSON with default indent of 4

my_dict = {"key_1": "val_1",
           "key_for_dict": {"sub_dict_key": 8}}

reusables.save_json(my_dict, "example.json")

# example.json
# 
# {
#     "key_1": "val_1",
#     "key_for_dict": {
#         "sub_dict_key": 8
#     }
# }

reusables.load_json("example.json")

# {'key_1': 'val_1', 'key_for_dict': {'sub_dict_key': 8}}

Cut a string into equal lengths

Ok, I admit, this one has absolutely nothing to do with the file system, but it’s just too handy not to mention right now (and it doesn’t really fit anywhere else). One of the features I was most surprised to find missing from the standard library was a function that could cut strings into even sections.

I haven’t seen any PEPs about it either way, but I wouldn’t be surprised if one of the reasons is ‘what to do with the leftover characters?’. Instead of forcing you to stick with one answer, Reusables has four different ways it can behave to suit your requirements.

By default, it will simply cut everything into even segments, and not worry if the last one is shorter than the rest.

reusables.cut("abcdefghi")
# ['ab', 'cd', 'ef', 'gh', 'i']

The other options are to remove the remainder entirely, combine it into the previous grouping (still uneven, but now the last item is longer than the rest instead of shorter), or raise an IndexError exception.

reusables.cut("abcdefghi", 2, "remove")
# ['ab', 'cd', 'ef', 'gh']

reusables.cut("abcdefghi", 2, "combine")
# ['ab', 'cd', 'ef', 'ghi']

reusables.cut("abcdefghi", 2, "error")
# Traceback (most recent call last):
#     ...
# IndexError: String of length 9 not divisible by 2 to splice
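If you’re curious how the default behavior works under the hood, a minimal pure-Python version could look like this (my own sketch, not the library’s implementation):

```python
def cut(string, step=2):
    # Slice the string into step-sized pieces; the final piece
    # may be shorter when the length isn't evenly divisible.
    return [string[i:i + step] for i in range(0, len(string), step)]

print(cut("abcdefghi"))
# ['ab', 'cd', 'ef', 'gh', 'i']
```

The other three modes are just post-processing of that list: drop the last item, merge it into its neighbor, or raise if the lengths don’t divide evenly.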

Config to Dictionary

Everybody and their co-worker has written a ‘better’ config file handler of some sort; this isn’t trying to add to that pile, I swear. This is simply a very quick converter that uses the built-in parser directly, producing a dictionary or a Python object I call a Namespace (more on that in a future post).

Just to be clear, this only reads configs; it does not write any changes. So given an example config.ini file:

[General]
example=A regular string

[Section 2]
my_bool=yes
anint=234
exampleList=234,123,234,543
floatly=4.4

It reads it as is into a dictionary. Notice there is no automatic parsing or anything fancy going on at all.

reusables.config_dict("config.ini")
# {'General': {'example': 'A regular string'},
#  'Section 2': {'anint': '234',
#                'examplelist': '234,123,234,543',
#                'floatly': '4.4',
#                'my_bool': 'yes'}}

You can also read it into a ConfigNamespace.

config = reusables.config_namespace("config.ini")
# <ConfigNamespace: {'General': {'example': 'A regular string'}, 'Section 2': ...

Namespaces are special dictionaries that allow for dot notation; they are similar to Bunch, but recursively convert dictionaries into Namespaces.

config.General
# <ConfigNamespace: {'example': 'A regular string'}>
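The core trick behind a recursive Namespace is small enough to sketch here. This is a simplification of the idea, not the actual reusables code:

```python
class Namespace(dict):
    """Dictionary with attribute access that nests recursively."""

    def __init__(self, mapping=None):
        super(Namespace, self).__init__()
        for key, value in (mapping or {}).items():
            # Convert nested dicts so dot notation works all the way down.
            self[key] = Namespace(value) if isinstance(value, dict) else value

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails,
        # so real dict methods are unaffected.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

ns = Namespace({"General": {"example": "A regular string"}})
print(ns.General.example)
# A regular string
```

Because it subclasses dict, everything else (iteration, key access, len) keeps working exactly as you’d expect.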

ConfigNamespace has handy built-in type-specific retrieval.  Notice that dot notation will not work if item names have spaces in them, but regular dictionary key notation works as well.

config['Section 2'].bool("my_bool")
# True

config['Section 2'].bool("bool_that_doesn't_exist", default=False)
# False
# If no default specified, will raise AttributeError

config['Section 2'].float('floatly')
# 4.4

It supports booleans, floats, ints and, unlike the default config parser, lists, which even accept a modifier function.

config['Section 2'].list('examplelist', mod=int)
# [234, 123, 234, 543]

Finale

That’s all for this first overview. I hope you found something useful that will make your life easier!

Indisputably immutable

For many of us, as we develop as coders, we want to continue to grow our knowledge. We pore over the standard library of this wonderful language, looking for hidden gems.  There are many such gems in the Python standard library, such as all the fun things you can do with sets and itertools.  But one of the lesser used (which is a real shame) is found in the collections module: namedtuple.

So… what is a named tuple?

from collections import namedtuple

Address = namedtuple("Address", ["number", "street", "city", "state", "zip_code"])

the_prez = Address("1600", "Pennsylvania Avenue", "Washington", "DC", "20500")

print("{the_prez.number} {the_prez.street}".format(**locals()))


Pretty boring stuff all around, right? Who cares that you can get dotted notation from this? Why not just write your own class?  You totally can.  There’s nothing to it.

class Address(object):
    def __init__(self, number, street, city, state, zip_code):
        self.number = number
        self.street = street
        self.city = city
        self.state = state
        self.zip_code = zip_code

Yep. Gotta love the classic Python class. Super explicit, which is in keeping with one of the core tenets of Python. But, depending on your use-case, it may have a weakness.  Let’s say that the type of class you have written is all about data storage and retrieval.  That means you’d really like the data to be reliable and, in a word: immutable.  If you just write the aforementioned class and someone wants to change the street number where the president lives, it’s pretty simple stuff:

class Address(object):
    def __init__(self, number, street, city, state, zip_code):
        self.number = number
        self.street = street
        self.city = city
        self.state = state
        self.zip_code = zip_code

the_prez = Address("1600", "Pennsylvania Avenue", "Washington", "DC", "20500")
the_prez.number = "1601"

Boom, now the postman will deliver your postcard from the Grand Canyon to the wrong house.

“But,” you say, “I could just override __setattr__ and be done with my class!” You are correct. But maybe, just maybe, you’re a lazy coder (as I believe all coders should be from time to time), and you want immutability without having to rewrite a core behavior of your class.  Why, what you’d be talking about would have the immutability of a tuple, with the flexibility of a class! Sounds crazy?  Sounds impossible?  Well, guess what?  You can have it.
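For the record, that manual override isn’t hard either. Here’s a sketch of a hypothetical class (names are mine) that overrides __setattr__ to lock itself after construction:

```python
class FrozenAddress(object):
    def __init__(self, number, street):
        # Bypass our own guard with object.__setattr__ during construction.
        object.__setattr__(self, "number", number)
        object.__setattr__(self, "street", street)

    def __setattr__(self, name, value):
        # Any normal assignment after __init__ is rejected.
        raise AttributeError("can't set attribute")

home = FrozenAddress("1600", "Pennsylvania Avenue")
try:
    home.number = "1601"
except AttributeError as err:
    print(err)
# can't set attribute
```

It works, but now you own that guard code forever, which is exactly the chore the namedtuple approach below avoids.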

Named tuples, just as the name implies, are tuples (an immutable data type) which use names instead of numerical indexing. Simple. But why the devil would you use them if you already control the code? Is it simply the immutability?

class Address(namedtuple("Address", ["number", "street", "city", "state", "zip_code"])):

    def __new__(cls, number=None, street=None, city=None, state=None, zip_code=None):
        return super(Address, cls).__new__(
            cls,
            number=number,
            street=street,
            city=city,
            state=state,
            zip_code=zip_code)

So, what happens if you try to redirect that grand canyon postcard by changing the house number now?

Traceback (most recent call last):
 File "namedtuplecraziness.py", line 74, in
 the_prez.number = "1601"
 AttributeError: can't set attribute

Nope.  Sorry.  Protected by immutability!

Of course, there are always oddities.  If you were to have a mutable variable in the namedtuple, that variable retains its mutability, while the other (non-mutable) fields remain immutable.  Here’s an example where we make the ‘number’ field into a list, which means we can now change it through direct index-based assignment or even through append.

from collections import namedtuple

class Address(namedtuple("Address", ["number", "street", "city", "state", "zip_code"])):
    def __new__(cls, number=None, street=None, city=None, state=None, zip_code=None):
        return super(Address, cls).__new__(
            cls,
            number=[number],
            street=street,
            city=city,
            state=state,
            zip_code=zip_code)

a = Address()
print(a)
# Address(number=[None], street=None, city=None, state=None, zip_code=None)

a.number[0] = 1600
print(a)
# Address(number=[1600], street=None, city=None, state=None, zip_code=None)

a.number.append(1700)
print(a)
# Address(number=[1600, 1700], street=None, city=None, state=None, zip_code=None)

Of course, just because it’s a list, and therefore mutable, doesn’t mean you can re-assign it.  That’s when the immutability of the namedtuple asserts itself.

a.number = 1600
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute


One of the first things you should know about Python is: EVERYTHING IS AN OBJECT. That means you can inherit from it. Boom. I just inherited from my named tuple, and now I can add methods to the class to define how data is retrieved! Super nifty!

Wow.  Who cares, you say.  Big deal?  What are you, some type of weirdo?  Yes.  Yes, I am.

So, let’s say that you have a database of addresses, and you want to get a printable mailing address for an entry. Well, you could easily load the queried records from the database into our Address model (we’ll just use this name as a reference). Now we have a whole list of these entries. Perhaps we’re having a party, and we want to mail invites to everyone.  This is where namedtuple inheritance gives you some pretty cool flexibility.

class Address(namedtuple("Address", ["number", "street", "city", "state", "zip_code"])):

    def __new__(cls, number=None, street=None, city=None, state=None, zip_code=None):
        return super(Address, cls).__new__(
            cls,
            number=number,
            street=street,
            city=city,
            state=state,
            zip_code=zip_code)
   
    def mailing_address(self):
      return ("{self.number} {self.street}\n"
              "{self.city}, {self.state} {self.zip_code}"
             ).format(**locals())

Nice and simple. Or perhaps you’re going to export your entire rolodex to csv (for some reason)?

class Address(namedtuple("Address", ["number", "street", "city", "state", "zip_code"])):

    def __new__(cls, number=None, street=None, city=None, state=None, zip_code=None):
        return super(Address, cls).__new__(
            cls,
            number=number,
            street=street,
            city=city,
            state=state,
            zip_code=zip_code)
   
    def mailing_address(self):
      return ("{self.number} {self.street}\n"
              "{self.city}, {self.state} {self.zip_code}"
             ).format(**locals())

    def to_csv(self):
        return ",".join(self._asdict().values())


Now you could just iterate over that list of Address entries and write out the to_csv() call.  It would likely look something like:

addresses = []  # all your Address entries would go here
with open("rolodex.csv", "w") as f:
    for a in addresses:
        f.write(a.to_csv() + "\n")

I’m not advocating for a replacement of traditional classes.

My opinion: Python classes should fall into one of three basic types:

  1. A class can contain data (such as our namedtuple example here).
  2. A class can provide encapsulation of action (like a database library).
  3. A class can provide actuation/transformation of data (kind of a mix of #1 and #2).

If you’re dealing with data which is merely going to be acted upon and never changed, then take a swing with these fancy namedtuples.  If you’re doing lots of actuation, write a normal class.  And a good coder should never assume they know the best way to do something; you’d be surprised how often, when I make that assumption, some young, talented coder comes along and corrects my ego.