Stop using plus signs to concatenate strings!

In Python, using plus signs to concatenate strings together is one of the first things you learn, i.e. print("hello" + "world!"), and it should be one of the first things you stop using. Using plus signs to “add” strings together is inherently more error-prone, messier, and less professional. Instead you should be using .format() or f-strings.

Hunter – Artwork by Clara Griffith

Before diving into what’s really wrong with + plus sign concatenation, we are going to take a quick step back and look at the possible different ways to merge strings together in Python, so we can get a better understanding of where to use what.

Concatenating strings

Method      When to use                                  When to avoid
%           Legacy code, logging module                  Python 3+
f-string    Python 3.6+                                  When you need to escape characters inside the {}s
join        An iterable (list, tuple, etc) of strings

Here is a quick demo of each of those methods in action using the same tuple of strings. For an already existing iterable of strings, join makes the most sense if you want the same character(s) between all of them. However, in most other cases join won’t be applicable, so we are going to ignore it for the rest of this post.

variables = ("these", "are", "strings")

print(" ".join(variables))
print("%s %s %s" % variables)
print("{} {} {}".format(*variables))
print(f"{variables[0]} {variables[1]} {variables[2]}")
print(variables[0] + " " + variables[1] + " " + variables[2])

# They all print "these are strings"

In many cases the words or strings you are concatenating won’t already be in the same structure, so even though f-strings look more cumbersome than the others here, they win out on simplicity in other scenarios. I honestly use f-strings more than anything else, but .format does have advantages we will look at later. Anyways, back to why using plus signs with strings is bad.

Errors lurking in the shadows

Consider the following code, which has four different perfectly working examples of string concatenation.

wait_time = "0.1"
time_amount = "seconds"

print("We are going to wait {} {}".format(wait_time, time_amount))

print(f"We are going to wait {wait_time} {time_amount}")

print("We are going to wait %s %s" % (wait_time, time_amount))

print("We are going to wait " + wait_time + " " + time_amount)

# We are going to wait 0.1 seconds
# We are going to wait 0.1 seconds
# We are going to wait 0.1 seconds
# We are going to wait 0.1 seconds

Everything works as expected, but wait, if we are going to put a time.sleep in there, it takes the wait time as a float. Let’s update that and add the sleep.

Concatenation TypeErrors

import time

wait_time = 0.1 # Changed from string to float
time_amount = "seconds"

print("We are going to wait {} {}".format(wait_time, time_amount))

print(f"We are going to wait {wait_time} {time_amount}")

print("We are going to wait %s %s" % (wait_time, time_amount))

print("We are going to wait " + wait_time + " " + time_amount)

time.sleep(wait_time)

print("All done!")

# We are going to wait 0.1 seconds
# We are going to wait 0.1 seconds
# We are going to wait 0.1 seconds
# Traceback (most recent call last):
#    print("We are going to wait " + wait_time + " " + time_amount)
# TypeError: can only concatenate str (not "float") to str

That’s right, the only method of string concatenation that broke our code was the one using + plus signs. Here it was very obvious it was going to happen. But what about going back to your own code weeks or months later? Or even worse, what if someone else’s code you are using as a library does this? It becomes an entirely avoidable headache.
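If you are stuck maintaining code that uses plus signs, the only fix is to wrap every non-string operand in an explicit str() call (a quick sketch of the workaround, not an endorsement):

```python
wait_time = 0.1  # a float, as time.sleep expects
time_amount = "seconds"

# Each non-string piece must be converted by hand, every single time
print("We are going to wait " + str(wait_time) + " " + time_amount)
# We are going to wait 0.1 seconds
```

Every other method above does this conversion for you automatically.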

Formatting issues

Another common issue you will run into frequently when using plus signs is unclear formatting. It’s very easy to forget to add whitespace around variables when you aren’t using a single template string with replacement fields like every other method. Two lines that look very similar will yield different results:

print(f"{wait_time} {time_amount}")
print(wait_time + time_amount)

# 0.1 seconds
# 0.1seconds

Did you even notice we had that issue in the very first paragraph’s code? print("hello" + "world!")


Messy

This is the most subjective of my reasons to avoid it, but I personally think plus-sign concatenation becomes very unreadable compared to any of the other methods, as shown in the following example.

mixed_type_vars = {
    "a": "My",
    "b": 2056,
    "c": "bodyguards",
    "d": {"have": "feelings"}
}

def plus_string(variables):
    return variables["a"] + " " + str(variables["b"]) + \
           " " + variables["c"] + " " + str(variables["d"])

def format_string(variables):
    return "{a} {b} {c} {d}".format(**variables)

def percent_string(variables):
    return "%s %d %s %s" % (variables["a"], variables["b"], 
                            variables["c"], variables["d"])


String format is very powerful because it is a function, and can take positional or keyword args and replace them as such in the string. In the example above .format(**variables) is equivalent to

.format(a="My", b=2056, c="bodyguards", d={"have": "feelings"})

That way in the string you can reference them by their keywords (in this case single characters a through d).

"This string is {opinion} formatted".format(opinion="very nicely")

Which means with format you have a lot of options to make the string a lot more readable, or you can reuse positional or named variables easily.

print("{0} is not {1} but it is {0} just like "
      "{fruit} is not a {vegetable} but is a {fruit}"
      "".format(1, 2, fruit="apple", vegetable="potato"))

Slower string conversion

Using the functions from the Messy section, we can see that plus signs are also slower when concatenating a mix of types.

import timeit
plus = timeit.timeit('plus_string(mixed_type_vars)',
                     setup='from __main__ import mixed_type_vars, plus_string')

form = timeit.timeit('format_string(mixed_type_vars)',
                     setup='from __main__ import mixed_type_vars, format_string')

percent = timeit.timeit('percent_string(mixed_type_vars)',
                        setup='from __main__ import mixed_type_vars, percent_string')

print("Concatenating a mix of types into a string one million times:")
print(f"{plus:.04f} seconds - plus signs")
print(f"{form:.04f} seconds - string format")
print(f"{percent:.04f} seconds - percent signs")

# Concatenating a mix of types into a string one million times:
# 1.9958 seconds - plus signs
# 1.3123 seconds - string format
# 1.0439 seconds - percent signs

On my machine, percent signs were slightly faster than string format, but both smoked using plus signs and explicit conversion.
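f-strings were left out of that benchmark, but they are easy to add to the same comparison (a sketch; the function name fstring_string is mine, and your timings will differ from machine to machine):

```python
import timeit

mixed_type_vars = {
    "a": "My",
    "b": 2056,
    "c": "bodyguards",
    "d": {"have": "feelings"}
}

def fstring_string(variables):
    # f-strings call str() on each value automatically
    return f"{variables['a']} {variables['b']} {variables['c']} {variables['d']}"

# Passing a callable is equivalent to the setup-string style used above
fstr = timeit.timeit(lambda: fstring_string(mixed_type_vars))
print(f"{fstr:.04f} seconds - f-strings")
```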


This isn’t only something to call out teammates on during code review; it can even hurt you if you’re applying for Python jobs. Using “+” everywhere for strings is a red flag that you are still a novice. I don’t know anyone personally who has been turned away over something so trivial, but it does show that you are unfamiliar with Python’s awesome feature-rich strings and haven’t had a lot of experience in group coding.

If you ever saw Batman or James Bond coding in Python, they wouldn’t be using +s in their string concatenation, and nor should you!


"If" + "👏" + "you" + "👏" +"use" + "👏" + "plus signs" + "👏" + "to" + 
"👏" + "concatenate"  + "👏" + "your"  + "👏" + "strings"  + "👏" + "you" 
 + "👏" + "are"  + "👏" + "more"  + "👏" + "annoying"  + "👏" + "than"  + 
"👏" + "this"  + "👏" + "meme!"

Truffle: going from ganache to testnet (ropsten)

Truffle is an amazing suite of tools created by ConsenSys for developing smart contracts for the Ethereum blockchain network. However, it can be a bit jarring to make the leap from local development to the real test network, Ropsten.

Required Setup

For this walk through, I have installed:

I will be using the default example truffle project, MetaCoin, that you can walk through how to unbox here or follow along using your own project.

First things first: if you do NOT have a package.json file yet, make sure to run npm init. This turns the directory into a node package that we can easily manage with the npm package manager, so we don’t have to install dependencies into the global package scope.

Now we can download all the things we are going to need:

npm install bip39 dotenv --save
  • bip39 – used to generate wallet mnemonic
  • dotenv – simple way to read environment variable files

We now have everything we need development-wise.

Storing Secrets outside the code

We will have to create a private key or mnemonic, and that means we need somewhere relatively secure to store it. For testnet purposes, this can be as simple as making sure it’s not put into version control alongside the code. To that end, we are going to use environment variables and store them in a file called .env (that’s it, basically just an extension; make sure to add it to your .gitignore if you’re using git). To learn more, check out the github page for dotenv. For our purposes, all you need to know is that this file has the format:

ANOTHER_ENV=something else

Accessing testnet

The easiest way to reach out to testnet is by using a provider. I personally like using Infura (free, just requires registration). After you register and have your API key emailed to you, make sure you select the URL for the Ropsten test network and add it to the .env file under a variable named ROPSTEN_URL.


It’s also possible to use your own geth node set to testnet, but that is not required.

Next we are going to create our own wallet, if you already have one set up, like with MetaMask, you can skip this next part.

Creating your testnet wallet

So now you have a place to put your secrets, let’s create some. This is where bip39 comes in: it will generate a random mnemonic, a series of 12 random words, which can be used as the basis for a wallet’s private key.

We could put this generation in a file, but it’s easy enough to just do straight from the command line:

node -e "console.log(require('bip39').generateMnemonic())"

This will output 12 words. DO NOT SHARE THESE ANYWHERE. The ones I am using below are examples, and also should NOT be used. Put them in the .env file as the variable MNEMONIC. Your .env file should now contain:

MNEMONIC=candy maple cake sugar pudding cream honey rich smooth crumble sweet treat

We have our seed, so it’s time to hook it into our code. In your truffle.js or truffle-config.js file, you will need to now import the environment variables and a wallet provider at the top of the file.

require('dotenv').config()
const HDWalletProvider = require('truffle-hdwallet-provider')

After that is added, we will move down to the exports section, where we are going to add a new network named ropsten. We are going to use the HDWalletProvider and supply it with the mnemonic and Infura URL provided via environment variables.

module.exports = {
  networks: {
    ropsten: {
      provider: () => new HDWalletProvider(
        process.env.MNEMONIC, process.env.ROPSTEN_URL),
      network_id: 3
    }
  }
}

Test and make sure everything’s working by opening a truffle console, specifying our new network.

truffle console --network ropsten

We can then get our public account address via the console.

truffle(ropsten)> web3.eth.getAccounts((err, accounts) => console.log(accounts))
[ '0x627306090abab3a6e1400e9345bc60c78a8bef57' ]

If you are seeing this same wallet address, you did it wrong. Go back and make your own mnemonic, don’t copy the candy one from above.

Funding the wallet

In your development environment, the wallet already has ETH in it to pay for gas when deploying the contract. On mainnet, you would have to buy some real ETH. On testnet, you can get some for free by using a faucet.

Make sure to use the address you gathered from the console for the faucet,  and soon you should have test funds to play around with and actually deploy your contract.

Deploying the Contract

Now for where the rubber meets the road: getting your contract out into the real (test) world.

truffle deploy --network ropsten

If everything is successful, you’ll get messages like these:

Using network 'ropsten'.

Running migration: 1_initial_migration.js
  Deploying Migrations...
  ... 0xefe70115c578c92bfa97154f70f9c3fbaa2b8400b1da1ee7cdxxxxxxxxxxxxxx
  Migrations: 0x6eeedefb64bd6ee6618ac54623xxxxxxxxxxxxxx
Saving successful migration to network...
  ... 0xd4294e35c166e2dca771ba9bf5eb3801bc1793e30db6a53d4dxxxxxxxxxxxxxx
Saving artifacts...
Running migration: 2_deploy_contracts.js
  Deploying Capture...
  ... 0x446d5e92d6976bb05c85bb95b243d6f7405af6bb12b3b6fe08xxxxxxxxxxxxxx
  Capture: 0x1d2f60c6ef979ca86f53af1942xxxxxxxxxxxxxx
Saving successful migration to network...
  ... 0x0b6f918ccc8e3b82cdf43038a2c32fe1fef66d0fa9aeb2260bxxxxxxxxxxxxxx
Saving artifacts...

Tada! You now have your custom contracts deployed to testnet!

Or, you got an out of gas error. It is not uncommon to have to adjust the gas price to get onto the network, as truffle does not automatically figure that out for you. A follow-up post will show how to calculate and adjust gas price as needed.




Discover AWS State Machines using Python Lambdas for an ETL process

Step Functions, State Machines, and Lambdas oh my! AWS has really been expanding what you can do without needing to actually stand up any servers. I’m going to walk through a very basic example of how to get going with your own Python code to create an ETL (Extract Transform Load) process using Amazon’s services. And don’t worry, all this goodness is included in the free tier!

The goal of this exercise will be to have an aggregation of news headlines downloaded and transformed into CSV format and uploaded to another service. We are going to achieve this by breaking up each step of the process into its own AWS Lambda.

What are Lambdas?

AWS Lambdas are a “serverless”, stateless way to run snippets of code with no extra initialization or shutdown time.

When to use Lambdas

They are great if you have small highly reusable pieces of code that serve a single purpose. (If you have a few that go together really well, that’s where state machines come in.)  For example if you have some code that does image recognition and you need to use it across multiple projects. Or even just want it to run faster or be more accessible, as Lambdas have several ways they can be initiated, including via an API you can define.

They will NOT fit your purpose if you need something that does a multitude of tasks, will run for a long time, use a lot of memory or update frequently.

Creating a Lambda

Creating your own is a lot easier than many tutorials make it seem. If you haven’t already, sign up for an AWS account. Then open your AWS console and search for Lambda.

You’ll be presented with a welcome screen most likely, after clicking through “Get Started” or whatever they updated it to this month, you’ll have a screen where you can create new functions as well as check on existing ones.

See the big orange button that even Trump would be proud of? Click it.

As this is probably your first Lambda, you will have to create a new role. It’s super simple; you don’t even have to leave the page. Just give it a new name and a policy template. I used the Simple Microservice permissions as it seemed to fit the bill best.

Then you will be greeted with a page with a large amount of info and stuff going on. The part that we are going to be most concerned about is the Function Code area (and will also need Environment Variables to store API keys in).

It may seem like we need to set up triggers or resources for this information to go to, but as we plan to use these inside a state machine, that will handle all that bother for us.

ETL – Extract Transform Load

Now that we know how to make a Lambda, let’s look at some code we could use with it. For the state machine we will create later, I want an entire process where I pull in information from an outside source (extract), modify it to fit my needs (transform), and then put it into my own system (load).


Extract

As stated above, this scenario involves pulling down data from a news source. In this case we are using News API, which allows you to create a free API key to grab top news headlines and links to their stories.

That code is dead simple:

import json
from urllib import request

def retrieve_news(key, source):
    url = f"https://newsapi.org/v2/top-headlines?sources={source}&apiKey={key}"
    with request.urlopen(url) as req:
        return json.load(req)

print(retrieve_news(my_key, 'associated-press'))

If I wasn’t using this in a Lambda, I would be using the wonderful Requests module instead, but Python 3’s urllib is at least a lot better than Python 2’s.

So now we need a way for the Lambda function to call this code and pass along the results in a manner we can use later. On the code page, under Function Code, you’ll see a field that lists the Handler: this is the entry point to your code. lambda_function.lambda_handler is the default, which means it will use the function lambda_handler inside the lambda_function.py file as the entry.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import os
import json
from urllib import request

def retrieve_news(key, source):
    url = f"https://newsapi.org/v2/top-headlines?sources={source}&apiKey={key}"
    with request.urlopen(url) as req:
        return json.load(req)

# What AWS Lambda calls
def lambda_handler(event, context):
    key = os.getenv('NEWSAPI_KEY')
    if not key:
        raise Exception('Not all environment variables set')

    if not event.get('source'):
        raise Exception('Source not set')

    return {'data': retrieve_news(key, event['source']),
            'source': event['source']}

There are two arguments passed into the function. The first is event, which is all the information sent to the Lambda function (if using a standard JSON object, this will be a dictionary, as seen above). The second is context, a class instance that tells you about the current Lambda invocation if necessary; you can learn more about it here, but it will not be used in this example.
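Nothing stops you from exercising the handler pattern locally before uploading; here is a minimal sketch with a stub handler (context is None because this example never touches it):

```python
# A stripped-down handler mirroring the validation in the real one
def lambda_handler(event, context):
    if not event.get('source'):
        raise Exception('Source not set')
    return {'source': event['source']}

# Lambda would normally build these arguments for us
print(lambda_handler({'source': 'associated-press'}, None))
# {'source': 'associated-press'}
```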

Testing the lambda

You may also notice that we are pulling the API key not from the event but from an environment variable, so make sure to set that on the page as well. Last but not least, I would suggest increasing the Lambda’s timeout to 10 seconds from the default 3.

Before we go on and add the other functions, let’s make sure this one works properly. At the top of the page, in the drop-down beside Test and Actions on the right, click Configure test events; we are going to add a new one with the details that will be passed into the event dictionary.

{
  "source": "associated-press"
}

On the pop-up, copy in the above JSON and save it as a new test event.

Hit the test button at the top, and see the results. You should get a big green window that shows you how it ran. If you have a red error window, you will have to figure out what went wrong first.


Transform

This will be our second Lambda, so we get to go through the creation process again (you can use the existing role from the last one) and copy this code into it. No environment variables needed this time!

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import csv
from io import StringIO

# What AWS Lambda calls
def lambda_handler(event, context):

    sio = StringIO()
    writer = csv.writer(sio)
    writer.writerow(["Source", "Title", "Author", "URL"])
    for article in event['data']['articles']:
        writer.writerow([article['source']['name'], article['title'],
                         article.get('author'), article['url']])
    csv_content = sio.getvalue()

    return {'data': csv_content,
            'source': event['source']}

The tricky part here is now you need good test data for it. Luckily you can copy the output of the last Lambda (provided snippet below) to do just that.

{
  "data": {
    "status": "ok",
    "totalResults": 5,
    "articles": [
      {
        "source": {
          "id": "associated-press",
          "name": "Associated Press"
        },
        "author": "FRANCES D'EMILIO",
        "title": "Pope accepts resignation of McCarrick after sex abuse claims",
        "description": "VATICAN CITY (AP) — In a move described as unprecedented, Pope Francis has effectively stripped U.S. prelate Theodore McCarrick of his cardinal's title and rank following allegations of sexual abuse, including one involving an 11-year-old boy. The Vatican ann…",
        "url": "",
        "urlToImage": "",
        "publishedAt": "2018-07-28T16:21:57Z"
      },
      {
        "source": {
          "id": "associated-press",
          "name": "Associated Press"
        },
        "author": "KEVIN FREKING",
        "title": "On trade policy, Trump is turning GOP orthodoxy on its head",
        "description": "WASHINGTON (AP) — President Donald Trump's trade policies are turning long-established Republican orthodoxy on its head, marked by tariff fights and now $12 billion in farm aid that represents the type of government intervention GOP voters railed against a de…",
        "url": "",
        "urlToImage": "",
        "publishedAt": "2018-07-28T16:20:11Z"
      },
      {
        "source": {
          "id": "associated-press",
          "name": "Associated Press"
        },
        "author": "SETH BORENSTEIN and FRANK JORDANS",
        "title": "Science Says: Record heat, fires worsened by climate change",
        "description": "Heat waves are setting all-time temperature records across the globe, again. Europe suffered its deadliest wildfire in more than a century, and one of nearly 90 large fires in the U.S. West burned dozens of homes and forced the evacuation of at least 37,000 p…",
        "url": "",
        "urlToImage": "",
        "publishedAt": "2018-07-28T15:03:01Z"
      },
      {
        "source": {
          "id": "associated-press",
          "name": "Associated Press"
        },
        "title": "No mystery to Supreme Court nominee Kavanaugh's gun views",
        "description": "SILVER SPRING, Md. (AP) — Supreme Court nominee Brett Kavanaugh says he recognizes that gun, drug and gang violence \"has plagued all of us.\" Still, he believes the Constitution limits how far government can go to restrict gun use to prevent crime. As a federa…",
        "url": "",
        "urlToImage": "",
        "publishedAt": "2018-07-28T14:11:06Z"
      },
      {
        "source": {
          "id": "associated-press",
          "name": "Associated Press"
        },
        "title": "AP FACT CHECK: Trump's hyped claims on economy, NKorea, vets",
        "description": "WASHINGTON (AP) — President Donald Trump received positive economic news this past week and twisted it out of proportion. That impulse ran through days of rhetoric as he hailed the success of a veterans program that hasn't started and saw progress with North …",
        "url": "",
        "urlToImage": "",
        "publishedAt": "2018-07-28T12:30:33Z"
      }
    ]
  },
  "source": "associated-press"
}

Configure and run the test like before using the above data.

In this case I also printed the output so you could see that any standard output is captured by the logs.
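You can also exercise the CSV logic locally with a trimmed-down event, without touching AWS at all (the sample article below is made up for illustration):

```python
import csv
from io import StringIO

def lambda_handler(event, context):
    sio = StringIO()
    writer = csv.writer(sio)
    writer.writerow(["Source", "Title", "Author", "URL"])
    for article in event['data']['articles']:
        writer.writerow([article['source']['name'], article['title'],
                         article.get('author'), article['url']])
    return {'data': sio.getvalue(), 'source': event['source']}

# A hypothetical, minimal event shaped like the extract Lambda's output
sample_event = {
    'source': 'associated-press',
    'data': {'articles': [{'source': {'name': 'Associated Press'},
                           'title': 'Example headline',
                           'author': 'A. Reporter',
                           'url': 'http://example.com'}]}
}
print(lambda_handler(sample_event, None)['data'])
```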


Load

Now to actually submit this data to a server, you could set up your own, or use a free file-dropping service, as the code below does. No API key needed!

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

from urllib import request, parse
import json

# What AWS Lambda calls
def lambda_handler(event, context):
    url = ''

    encoded_args = parse.urlencode({'text': event['data']}).encode('utf-8')

    with request.urlopen(url, encoded_args) as req:
        info = json.load(req)

    return {'data': info, 'source': event['source']}
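As an aside, it’s the data argument to urlopen that turns the request into a POST; you can inspect the urlencoded body on its own (a small sketch):

```python
from urllib import parse

# urlencode percent-escapes the CSV so it survives the form POST
body = parse.urlencode({'text': 'Source,Title\nAP,Example'}).encode('utf-8')
print(body)
# b'text=Source%2CTitle%0AAP%2CExample'
```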

Again, as this is reaching out to an external API, I would increase the Lambda’s timeout from the default 3 seconds to 10.

Woo! We now have three Lambdas that can each take the previous one’s output, forming a full ETL process. Now let’s put them together.

State Machines

AWS Step Functions allow for creating a set of actions that run with each other, presented in a pretty auto-generated graph. Back at the console, search for Step Functions.

Then create a new state machine.

This is probably the hardest part: the actual state machine definition. The state language can be confusing; thankfully, for our needs we don’t need to do anything complicated.

You can use this code, and will just have to update the actual Resource links under Extract, Transform and Load. (You can even click on them and should be presented with a drop down of your previously created resources so you don’t have to copy the ARNs manually.)

{
  "StartAt": "Set Source",
  "States": {
    "Set Source": {
      "Type": "Pass",
      "Result": {"source": "associated-press"},
      "ResultPath": "$",
      "Next": "Extract"
    },
    "Extract": {
      "Type": "Task",
      "Resource": "<ARN>:function:google-news-extract",
      "ResultPath": "$",
      "Next": "Transform"
    },
    "Transform": {
      "Type": "Task",
      "Resource": "<ARN>:function:google-news-transform",
      "ResultPath": "$",
      "Next": "Load"
    },
    "Load": {
      "Type": "Task",
      "Resource": "<ARN>:function:google-news-load",
      "ResultPath": "$",
      "End": true
    }
  }
}
Notice the first step is not a task but rather a pass-through state that sets the source. We could have done this during initialization, but I wanted to highlight the ability to add information where needed.

After creation, we will need to start a new execution. It doesn’t need any input, but doesn’t hurt to include a comment if you want.

Then run it!


During an execution, it will show what has run successfully, what is currently in progress, and what erred. At any time, you can click on a specific block to see what its inputs and outputs were.

This state machine can then be run whenever you want the full ETL process to execute!


For a process like this, you will want to run it on a schedule. That means creating a new CloudWatch rule. Search for CloudWatch in the console, then click on Rules on the left-hand side.

Then, click the big blue button.

It’s pretty simple to create a fixed rate schedule, and then just make sure to select the right state machine on the right side!


Uploading large files by chunking – featuring Python Flask and Dropzone.js

It can be a real pain to upload huge files. Many services limit their upload sizes to a few megabytes, and you don’t want a single connection open forever either. The super simple way to get around that is to send the file in lots of small parts, aka chunking.

UPDATE: Check out the new article, which includes adding parallel chunking for speed improvements.

Chunking Food - Artwork by Clara Griffith

Finished code example can be viewed at github.

So there are going to be two parts to making this work: the front-end (website) and the backend (server). Let’s start with what the user will see.

Webpage with Dropzone.js

Beautiful, ain’t it? The best part is, the code powering it is just as succinct.

<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">

    <link rel="stylesheet" href="">

    <link rel="stylesheet" href="">

    <script type="application/javascript" src=""></script>

    <title>File Dropper</title>
</head>
<body>
<form method="POST" action='/upload' class="dropzone dz-clickable"
      id="dropper" enctype="multipart/form-data">
</form>
</body>
</html>


This is using the dropzone.js library, which has no additional dependencies and decent CSS included. All you have to do is add the class “dropzone” to a form and it automatically turns it into one of their special drag and drop fields (you can also click and select).

However, by default, dropzone does not chunk files. Luckily, it is really easy to enable. We are going to add some custom JavaScript and insert it between the form and the end of the body


<script type="application/javascript">
    Dropzone.options.dropper = {
        paramName: 'file',
        chunking: true,
        forceChunking: true,
        url: '/upload',
        maxFilesize: 1025, // megabytes
        chunkSize: 1000000 // bytes
    }
</script>

When enabling chunking, it will break up any files larger than the chunkSize and send them to the server over multiple requests. It accomplishes this by adding form data that has information about the chunk (uuid, current chunk, total chunks, chunk size, total size). By default, anything under that size will not have that information sent as part of the form data, and the server would have to have an additional logic path. Thankfully, there is the forceChunking option, which will always send that information, even for smaller files. Everything else is pretty self-explanatory, but if you want more details about the possible options, just check out their list of configuration options.

Python Flask Server

Onto the backend. I am going to be using Flask, which is currently the most popular Python web framework (by github stars); other good options include Bottle and CherryPy. If you hate yourself or your colleagues, you could also use Django or Pyramid. There are a ton of good example Flask projects and boilerplates to start from; I am going to use one that I created for my own needs, but don’t feel obligated to use it.

This type of upload will work with any real website back-end. You simply need two routes: one that displays the frontend, and one that accepts the file as an upload. First, let’s just view what dropzone is sending us. In this example my project’s name is ‘pydrop’, and if you’re using my FlaskBootstrap code, this is the views/ file.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import logging
import os

from flask import render_template, Blueprint, request, make_response
from werkzeug.utils import secure_filename

from pydrop.config import config

blueprint = Blueprint('templated', __name__, template_folder='templates')

log = logging.getLogger('pydrop')

@blueprint.route('/')
def index():
    # Route to serve the upload form
    return render_template('index.html')

@blueprint.route('/upload', methods=['POST'])
def upload():
    # Route to deal with the uploaded chunks
    log.info(request.form)
    log.info(request.files)
    return make_response(('ok', 200))

Run the flask server and upload a small file (under the size of the chunk limit). It should log a single instance of a POST to /upload:

[INFO] werkzeug: "POST /upload HTTP/1.1" 200 -

[INFO] pydrop: ImmutableMultiDict([
     ('dzuuid', '807f99b7-7f58-4d9b-ac05-2a20f5e53782'), 
     ('dzchunkindex', '0'), 
     ('dztotalfilesize', '1742'), 
     ('dzchunksize', '1000000'), 
     ('dztotalchunkcount', '1'), 
     ('dzchunkbyteoffset', '0')])

[INFO] pydrop: ImmutableMultiDict([
     ('file', <FileStorage: '' ('application/octet-stream')>)])

Let’s break down what information we are getting:

dzuuid – Unique identifier of the file being uploaded

dzchunkindex – Which block number we are currently on

dztotalfilesize – The entire file’s size

dzchunksize – The max chunk size set on the frontend (note this may be larger than the actual chunk’s size)

dztotalchunkcount – The number of chunks to expect

dzchunkbyteoffset – The byte offset at which this chunk’s data belongs in the file being uploaded
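These fields make server-side sanity checks easy. For example, the chunk count dropzone reports should match what you can derive from the sizes yourself (a sketch; the helper name expected_chunks is mine, and the numbers come from the example uploads in this post):

```python
import math

def expected_chunks(total_size, chunk_size):
    # forceChunking means even tiny files arrive as a single chunk
    return max(1, math.ceil(total_size / chunk_size))

print(expected_chunks(1742, 1000000))     # 1, matching dztotalchunkcount above
print(expected_chunks(1191708, 1000000))  # 2, for the larger upload
```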

Next, let’s upload something just a bit larger that will require it to be chunked into multiple parts:

[INFO] werkzeug: "POST /upload HTTP/1.1" 200 -

[INFO] pydrop: ImmutableMultiDict([
    ('dzuuid', 'b4b2409a-99f0-4300-8602-8becbef24c91'), 
    ('dzchunkindex', '0'), 
    ('dztotalfilesize', '1191708'), 
    ('dzchunksize', '1000000'), 
    ('dztotalchunkcount', '2'), 
    ('dzchunkbyteoffset', '0')])

[INFO] pydrop: ImmutableMultiDict([
    ('file', <FileStorage: '04vfpknzx8z01.png' ('application/octet-stream')>)])

[INFO] werkzeug: "POST /upload HTTP/1.1" 200 -

[INFO] pydrop: ImmutableMultiDict([
    ('dzuuid', 'b4b2409a-99f0-4300-8602-8becbef24c91'), 
    ('dzchunkindex', '1'),
    ('dztotalfilesize', '1191708'),  
    ('dzchunksize', '1000000'), 
    ('dztotalchunkcount', '2'), 
    ('dzchunkbyteoffset', '1000000')])

[INFO] pydrop: ImmutableMultiDict([
    ('file', <FileStorage: '04vfpknzx8z01.png' ('application/octet-stream')>)])

Notice how /upload has been called twice, and that dzchunkindex and dzchunkbyteoffset have been updated accordingly. That means our upload function has to be smart enough to handle both brand-new uploads and the later chunks of existing multipart uploads: for a continuing upload we should open the existing file and write after the data already in it, whereas for a new upload we create the file and start at the beginning. Luckily, both can be accomplished with the same code: open the file in append mode, then ‘seek’ to the end of the current data (in this case we rely on the offset provided by dropzone).

@blueprint.route('/upload', methods=['POST'])
def upload():
    # Remember the paramName was set to 'file', we can use that here to grab it
    file = request.files['file']

    # secure_filename makes sure the filename isn't unsafe to save
    save_path = os.path.join(config.data_dir, secure_filename(file.filename))

    # We need to append to the file, and write as bytes
    with open(save_path, 'ab') as f:
        # Goto the offset, aka after the chunks we already wrote
        f.seek(int(request.form['dzchunkbyteoffset']))
        f.write(file.stream.read())

    # Giving it a 200 means it knows everything is ok
    return make_response(('Uploaded Chunk', 200))

At this point you should have a working upload script, tada!

But let’s beef this up a little bit. The following improvements make it so we don’t overwrite existing files that have already been uploaded, check that the file size matches what we expect when we’re done, and give a little more output along the way.

@blueprint.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']

    save_path = os.path.join(config.data_dir, secure_filename(file.filename))
    current_chunk = int(request.form['dzchunkindex'])

    # If the file already exists it's ok if we are appending to it,
    # but not if it's new file that would overwrite the existing one
    if os.path.exists(save_path) and current_chunk == 0:
        # 400 and 500s will tell dropzone that an error occurred and show an error
        return make_response(('File already exists', 400))

    try:
        with open(save_path, 'ab') as f:
            f.seek(int(request.form['dzchunkbyteoffset']))
            f.write(file.stream.read())
    except OSError:
        # log.exception will include the traceback so we can see what's wrong
        log.exception('Could not write to file')
        return make_response(("Not sure why,"
                              " but we couldn't write the file to disk", 500))

    total_chunks = int(request.form['dztotalchunkcount'])

    if current_chunk + 1 == total_chunks:
        # This was the last chunk, the file should be complete and the size we expect
        if os.path.getsize(save_path) != int(request.form['dztotalfilesize']):
            log.error(f"File {file.filename} was completed, "
                      f"but has a size mismatch. "
                      f"Was {os.path.getsize(save_path)} but we"
                      f" expected {request.form['dztotalfilesize']}")
            return make_response(('Size mismatch', 500))
        else:
            log.info(f'File {file.filename} has been uploaded successfully')
    else:
        log.debug(f'Chunk {current_chunk + 1} of {total_chunks} '
                  f'for file {file.filename} complete')

    return make_response(("Chunk upload successful", 200))

Now let’s give this a try:

[DEBUG] pydrop: Chunk 1 of 6 for file DSC_0051-1.jpg complete
[DEBUG] pydrop: Chunk 2 of 6 for file DSC_0051-1.jpg complete
[DEBUG] pydrop: Chunk 3 of 6 for file DSC_0051-1.jpg complete
[DEBUG] pydrop: Chunk 4 of 6 for file DSC_0051-1.jpg complete
[DEBUG] pydrop: Chunk 5 of 6 for file DSC_0051-1.jpg complete
[INFO] pydrop: File DSC_0051-1.jpg has been uploaded successfully

Sweet! But wait, what if we remove the directories where the files are stored? Or try to upload the same file again?

(Dropzone’s text out of the box is a little hard to read, but it says “File already exists” on the left and “Not sure why, but we couldn’t write the file to disk” on the right. Exactly what we’d expect.)

2018-05-28 14:29:19,311 [ERROR] pydrop: Could not write to file
Traceback (most recent call last):
FileNotFoundError: [Errno 2] No such file or directory:

We get error messages on the webpage and in the logs, perfect.

I hope you found this information useful and if you have any suggestions on how to improve it, please let me know!

Thinking further down the road

In the long-term I would have a database or some other permanent storage option to keep track of file uploads. That way you could see if one fails or stops halfway and be able to remove incomplete ones. I would also save files first into a temp directory based off their UUID and then, when complete, move them to a location based off their file hash. It would also be nice to have a page to see everything uploaded and manage directories or other options, or even password protected uploads.
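A rough sketch of that hash-based placement idea (the directory layout, function name, and choice of sha256 are all assumptions of mine, not part of the original tutorial):

```python
import hashlib
import os
import shutil

def finalize_upload(temp_path, completed_dir='completed'):
    """Hypothetical: move a finished upload to a path derived from its content hash"""
    sha = hashlib.sha256()
    with open(temp_path, 'rb') as f:
        # Hash in 64KB blocks so large uploads don't need to fit in memory
        for block in iter(lambda: f.read(1 << 16), b''):
            sha.update(block)
    digest = sha.hexdigest()
    # Fan out into subdirectories so one folder doesn't hold every file
    final_path = os.path.join(completed_dir, digest[:2], digest)
    os.makedirs(os.path.dirname(final_path), exist_ok=True)
    shutil.move(temp_path, final_path)
    return final_path
```

A nice side effect of content addressing is that re-uploading the same file lands on the same path, so duplicates are detected for free.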

Interprocess Communications

Inter-Office-Process Communications, Artwork by Clara Griffith

Dealing with communications across programs, processes or even threads can be a real pain in the patella. Each situation usually calls for something slightly different and has to work with a limited set of options. On top of that, a lot of tutorials I see are people simply starting one program from inside the other (for example with subprocess) but that just won’t do for my taste.

Just show me the table of which IPC method I need 

I’m going to go through options that let two independent programs, started on their own, talk with each other. And when I say talk, I mean able to at least execute tasks in the other’s process and, in most cases, transfer data as well.


Pipes

Probably one of the most old-school ways of transferring data is to just pipe it back and forth as needed. This is how terminals and shells interact with programs, by using the standard pipes of stdin, stdout, and stderr (standard in, standard out, standard error).

Every time you print you are writing to stdout, and it is very common to use this in Python when running another program from within it (e.g. launching it with subprocess and talking to it via its pipes). These are anonymous pipes that exist only while the program is running. To communicate between different programs you should use a Named Pipe, which creates a file descriptor that you connect to as an entry point.
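As a quick aside, here is a minimal sketch of the anonymous-pipe case: launching a child process and talking to it over stdin/stdout (the child here is just another Python one-liner I made up for the demo):

```python
import subprocess
import sys

# Launch another program and communicate through its stdin/stdout pipes
result = subprocess.run(
    [sys.executable, '-c', 'print(input().upper())'],
    input='quiet words',
    capture_output=True,
    text=True,
)
print(result.stdout)  # 'QUIET WORDS' plus a trailing newline
```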

Here are some short examples showing their use on Linux. It is also possible on Windows with the pywin32 module or do it yourself with ctypes. But I find it easier to just use other methods on Windows.

Our example will be two programs, the first of which is simply a text converter. The message sent will be in three parts: a starting identifier, X; four digits to denote the message length; and the message itself. (This is by no means a common standard or practical, just something I made up for a quick example.)

So to send ‘Howdy’, it would look like X0005HOWDY, X being the identifier of a new message, 0005 denoting the length of the message to come, and HOWDY being the message body.
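As a sanity check on that made-up framing, a tiny hypothetical helper (not part of the programs below) that builds such a message:

```python
def frame_message(message):
    """Frame a message as: 'X' + 4-digit length + body (our made-up protocol)"""
    if len(message) > 9999:
        raise ValueError('Message too long for a 4-digit length field')
    return b'X' + f'{len(message):04d}'.encode('utf-8') + message.encode('utf-8')

print(frame_message('HOWDY'))  # b'X0005HOWDY'
```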

import time
import os

# The named pipe both programs will open;
# in this scenario both are run from the same directory
pipename = 'fifo'
if not os.path.exists(pipename):
    os.mkfifo(pipename)

# For non-blocking pipes, you have to be reading from it
# before you can have something else writing to it
pipein = os.open(pipename, os.O_NONBLOCK | os.O_RDONLY)
p = os.fdopen(pipein, 'rb', buffering=0)

# This program will simply make the output of the other program more readable
def converter(message):
    return message.decode('utf-8').replace("_", " ").replace("-", " ").lower()

while True:
    # Wait until we have a message identifier
    new_data = p.read(1)
    if new_data == b'X':
        # Figure out the length of the message
        raw_length = p.read(4)
        # Read and convert the message
        message = p.read(int(raw_length))
        print(converter(message))
    elif new_data:
        # If we read a single byte that isn't an identifier, something went wrong
        raise Exception('Out of sync!')
    else:
        # Nothing to read yet
        time.sleep(0.1)

That’s all our conversion server is. It creates a pipe that something else can connect to, and will convert the incoming messages to lower case and replace dashes and underscores with spaces.

So let’s have Bob talk to Alice, but as Bob is a computer program, he sometimes spits out gobbledygook messages that need some help.

import os

# Connect to the pipe created by Alice
pipename = 'fifo'
pipeout = os.open(pipename, os.O_NONBLOCK | os.O_WRONLY)
p = os.fdopen(pipeout, 'wb', buffering=0, closefd=False)

def write_message(message):
    """Convert a string into our message format and send it"""
    length = '{0:04d}'.format(len(message))
    p.write(b'X' + length.encode('utf-8') + message.encode('utf-8'))

write_message('TERRIBLE_LOOKING-MACHINE_OUTPUT')

Start up Alice first, then Bob.

Alice will print out a pretty message of: terrible looking machine output

While pipes are super handy for terminal usage and running programs inside each other, it has become uncommon to use named pipes as actual comms between two independent programs, mainly because of the required setup procedure and non-uniformity across operating systems. This has led to more modern and cross-compatible methods being preferred.


Pros

  • Fast and Efficient
  • No external services

Cons

  • Not cross-platform compatible code
  • Difficult to code well


Files

Another ye olde (yet perfectly valid) way to communicate between programs is to simply create files that each program can interpret. Sometimes it’s as simple as having a lockfile. If the lockfile exists, it can serve as a message to other programs to let it finish before they do something, or even to stop new instances of itself from running. For example, running system updates in most Linux environments will create a lock file to make sure that two different update processes aren’t started at the same time.
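A minimal sketch of the lockfile idea itself, using only the standard library (the names are made up; the trick is that O_EXCL makes create-if-absent atomic, so two processes can't both win the race):

```python
import os

def acquire_lock(path='myapp.lock'):
    """Try to take the lock; returns False if another process already holds it"""
    try:
        # O_EXCL means this fails if the file already exists, atomically
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True

def release_lock(path='myapp.lock'):
    os.remove(path)
```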

It’s possible to take that idea further and share a file or two to actually transfer information. In these examples, the two programs will both work out of the same file, with a lock file for safety. (You can write your own code for file lock control, but I will be using the py-filelock package for brevity.)

There are a lot of possible ways to format the shared file. This example will keep it very basic, giving each command a unique id (used to know whether the command has been run before or not), then the command and its arguments. The same dictionary will also leave room for a result or an error message.

The shared JSON file will have the following format:

{
  "commands": {
    "<random uuid>": {
      "command": "add",
      "args": [2, 5],
      "result": 7  # "result" field only exists after server has responded
      # if "error" key exists instead of "result", something bad happened
    }
  }
}
The server, Alice, will then keep looping, waiting for the shared file to change. Once it does, Alice will obtain the lock for safety (so there isn’t any corrupt JSON from writing being interrupted), read the file, run the desired command, and save the result. In a real world scenario the lock would only be obtained during the individual reading and writing phases, to keep the lock held as briefly as possible by a single program. But that complicates the code (as then you would have to do a second read before writing and only update the sections you ran, in case there were other ones added) and makes it a bit much for an off the shelf example.

import json
import time
import os

from filelock import FileLock

# Keep track of commands that have been run 
completed = []
# Track file size to know when new commands have been added
last_size = 0

lock_file = "shared.json.lock"
lock = FileLock(lock_file)
shared_file = "shared.json"

# Not totally necessary, but if you ever need to raise exceptions
# it's better to create your own than use the built-ins
class AliceBroke(Exception):
    """Custom exception for catching our own errors"""

# Functions that can be executed by
def adding(a, b):
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise AliceBroke('Yeah, we are only going to add numbers')
    return a + b

def multiply(a, b):
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise AliceBroke('Yeah, we are only going to multiply numbers')
    return a * b

# Right now just have a way to map the incoming strings from Bob to functions
# This could be expanded to include what arguments it expects for pre-validation
translate = {
    "add": adding,
    "multiply": multiply
}

def main():
    global last_size
    while True:
        # Poor man's file watcher
        file_size = os.path.getsize(shared_file)
        if file_size != last_size:
            print(f'File has changed from {last_size} to {file_size} bytes,'
                  f' lets see what they want to do!')
            run_commands()
            last_size = os.path.getsize(shared_file)
        time.sleep(0.1)

def run_commands():
    # Grab the lock, once it is acquired open the file
    with lock, open(shared_file, 'r+') as f:
        data = json.load(f)
        # Iterate over the command keys, if we haven't run it yet, do so now
        for name in data['commands']:
            if name not in completed:
                command = data['commands'][name]['command']
                args = data['commands'][name]['args']
                print(f'running command {command} with args {args}')
                try:
                    data['commands'][name]['result'] = translate[command](*args)
                except AliceBroke as err:
                    # Arguments weren't the type we were expecting
                    data['commands'][name]['error'] = str(err)
                except TypeError:
                    data['commands'][name]['error'] = "Incorrect number of arguments"
                completed.append(name)
        # As we are writing the data back to the same file that is still
        # open, we need to go back to the beginning of it before writing
        f.seek(0)
        json.dump(data, f, indent=2)

if __name__ == '__main__':
    # Create / blank out the shared file
    with open(shared_file, 'w') as f:
        json.dump({"commands": {}}, f)
    last_size = os.path.getsize(shared_file)
    try:
        main()
    finally:
        # Be nice and clean up after ourselves
        os.unlink(shared_file)
        os.unlink(lock_file)

Our client, Bob, will ask Alice to run some simple commands she supports and wait until he gets the answer back.

import json
import time
import uuid

from filelock import FileLock


lock = FileLock("shared.json.lock")
shared_file = "shared.json"

def ask_alice(command, *args, wait_for_answer=True):
    # Create a new random ID for the command
    # could be as simple as incremented numbers 
    cmd_uuid = str(uuid.uuid4())
    with lock, open(shared_file, "r+") as f:
        data = json.load(f)
        data['commands'][cmd_uuid] = {'command': command, 'args': args}
        f.seek(0)
        json.dump(data, f)
    if wait_for_answer:
        return get_answer(cmd_uuid)
    return cmd_uuid

def get_answer(cmd_uuid):
    # Wait until we get an answer back for a command
    # Ideally this would be more of an asynchronous callback, but there 
    # are plenty of cases where serialized processes like this must happen
    while True:
        with lock, open(shared_file) as f:
            data = json.load(f)
            command = data['commands'][cmd_uuid]
            if 'result' in command:
                return command['result']
            elif 'error' in command:
                raise Exception(command['error'])
        time.sleep(0.1)

print(f"Lets add 2 and 5: {ask_alice('add', 2, 5, wait_for_answer=True)}")

print(f"Lets multiply 8 and 5: {ask_alice('multiply', 8, 5, wait_for_answer=True)}")

print("Lets break it and cause an exception!")
ask_alice('add', 'bad', 'data', wait_for_answer=True)

Start up Alice first, then Bob.

Alice will return:

File has changed from 16 to 90 bytes, lets see what they want to do!
running command add with args [2, 5]
File has changed from 103 to 184 bytes, lets see what they want to do!
running command multiply with args [2, 5]
File has changed from 198 to 283 bytes, lets see what they want to do!
running command add with args ['bad', 'data']

Bob will return:


Lets add 2 and 5: 7
Lets multiply 8 and 5: 40
Lets break it and cause an exception!
Traceback (most recent call last):
Exception: Yeah, we are only going to add numbers



Pros

  • Cross-platform compatible
  • Simple to implement and understand

Cons

  • Have to worry about File System security if anything sensitive is being shared
  • Programs now responsible to clean up after themselves


Message Queue

Welcome to the 21st Century, where message queues serve as quick and efficient ways to transfer commands and information. I like to think of them as always-running middlemen that can survive outside your process.

Here you are spoiled for choice with options: ActiveMQ, ZeroMQ, Redis, RabbitMQ, Sparrow, Starling, Kestrel, Amazon SQS, Beanstalk, Kafka, IronMQ, and POSIX IPC message queues are the ones I know. You can even use NoSQL databases like mongoDB or couchDB in a similar manner, though for simple IPC I suggest against using those.

I suggest looking into RabbitMQ’s tutorials to get a good in-depth look at how you can write code for it (and across multiple different languages). RabbitMQ is cross-platform and even provides Windows binaries directly, unlike some others.

For my own examples we will use Redis, as they have the best summary I can quote from their website, yet a surprising lack of Python tutorials.

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, [etc…]

That’s it: it stores data somewhere that is easily accessed from multiple programs, just what you need for IPC. You can use it as a keystore (for data transfer and storage) or, like I am about to show, in a publisher / subscriber model, better for command and control. Our subscriber is our service; it simply subscribes to a channel, waits for any updates, and prints them to the screen.

import time

import redis

r = redis.StrictRedis()
p = r.pubsub()

def handler(message):
    print(message['data'].decode('utf-8'))

# Run the handler for anything published to this channel
p.subscribe(**{'my_channel': handler})
thread = p.run_in_thread(sleep_time=0.001)

try:
    # Runs in the background while your program otherwise operates
    while True:
        time.sleep(1)
finally:
    # Shut down cleanly when all done
    thread.stop()

Our publisher is another program that simply pushes a message to that channel:

import redis

r = redis.StrictRedis()

r.publish('my_channel', 'example data'.encode('utf-8'))

Start up the subscriber first, then run the publisher. The subscriber will print example data, and can be stopped by pressing CTRL+C.


Pros

  • Built-in publisher / subscriber methodology, don’t have to do manual checks
  • Built-in background thread running
  • (Can be) Cross-platform compatible
  • State can be saved if a program exits unexpectedly (depending on setup)

Cons

  • Requirement for external service
  • More overhead
  • Have to worry about the external service’s security and how you connect to it

Shared Memory

If written correctly, this can be one of the fastest ways to transfer data or execute tasks between programs, but it is also the most low-level, meaning a lot more coding and error handling yourself.

Think of using memory in the same way as the single file with using a lock. While one program writes to memory, the other has to wait until it finishes, then can read it and write its own thing back. (It’s possible to also use multiple sections of memory, just it gets to be a lot of example code really fast so I am holding off.)
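(As an aside: if you can require Python 3.8+, the standard library’s multiprocessing.shared_memory handles the mapping for you. A minimal single-process sketch, where a second program would attach by the block’s name:)

```python
from multiprocessing import shared_memory

# Create a named block of shared memory; other processes can attach to it by name
shm = shared_memory.SharedMemory(create=True, size=64)
shm.buf[:5] = b'hello'

# A second program would run: shared_memory.SharedMemory(name=<the same name>)
other = shared_memory.SharedMemory(name=shm.name)
data = bytes(other.buf[:5])
print(data)  # b'hello'

# Close both handles, then unlink to destroy the block
other.close()
shm.close()
shm.unlink()
```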

So now you have to create the Semaphore (basically a memory lockfile) and mapped memory that each program can access.

On Linux (and Windows with Cygwin) you can use posix_ipc, and check out their example here.

A lot simpler is to just use the built-in mmap directly and share a file in memory. This example won’t even go as far as creating the lockfile, just showing off the basics. The first program is going to create and write stuff to the memory mapped file:

import time
import mmap
import os

# Create the file and fill it with line ends
fd = os.open('mmaptest', os.O_CREAT | os.O_TRUNC | os.O_RDWR)
os.write(fd, b'\n' * mmap.PAGESIZE)

# Map it to memory and write some data to it, then change it 10 seconds later
buf = mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_WRITE)
buf.write(b'now we are in memory\n')
time.sleep(10)
buf.seek(0)
buf.write(b'again\n')

The second program will simply read the content of the memory mapped file, showing that it does know when the contents change.

import mmap
import os
import time

# Open the file for reading only
fd = os.open('mmaptest', os.O_RDONLY)
buf = mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_READ)

# Print when the content changes
last = b''
while True:
    buf.seek(0)
    msg = buf.readline()
    if msg != last:
        last = msg
        print(msg.decode('utf-8').strip())
    time.sleep(0.1)
This example isn’t the most friendly to run, as you have to start up the writer and then the reader within ten seconds after that, but it shows the basics of how memory mapping is very similar to just using a file.


Pros

  • Fast
  • Cross-platform compatible code

Cons

  • Can be difficult to write
  • Limited size to accessible memory
  • Using mmap without posix_ipc will also create a physical file with the same content



Signals

On Linux? Don’t want to send information, just need to toggle state on something? Send a signal!

Imagine you have a service, Alice, that anything local can connect to, but you want to be able to tell them if the service goes down or comes back up.

In your clients code, they should register their process identification number with Alice when they first connect to her, and have a method to capture a custom signal to know Alice‘s current state.

import signal, os

service_running = True

my_pid = os.getpid()

# 'Touch' a file as the name of the program's PID 
# in Alice service's special directory
# Make sure to delete this file when your program exits!
open(f"/etc/my_service/pids/{my_pid}", "w").close()

def service_off(signum, frame):
    global service_running 
    service_running = False

def service_on(signum, frame):
    global service_running 
    service_running = True

signal.signal(signal.SIGUSR1, service_off)
signal.signal(signal.SIGUSR2, service_on)

SIGUSR1 and SIGUSR2 are signals reserved for custom use, so they are safe to use in this manner without fear of secondary actions happening. (For example, if you send something like SIGINT, aka interrupt, it will just kill your program if not caught properly.) Alice will then simply go through the directory of PID files when she starts up or shuts down, and send each one the appropriate signal to let them know her current state.

import os
import signal

def let_them_know(startup=True):
    # SIGUSR2 maps to service_on in the clients, SIGUSR1 to service_off
    signal_to_send = signal.SIGUSR2 if startup else signal.SIGUSR1
    for pid_file in os.listdir("/etc/my_service/pids/"):
        # Put in try catch block for production
        os.kill(int(pid_file), signal_to_send)
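The client comment above says to delete the PID file when the program exits; one hedged way to do that is atexit (the directory is my assumption, using a path under /tmp here so the sketch can run unprivileged, unlike the article’s /etc location):

```python
import atexit
import os

pid_dir = '/tmp/my_service/pids'  # assumed location for this sketch
os.makedirs(pid_dir, exist_ok=True)

# 'Touch' a file named after our PID, just like the client above
pid_file = os.path.join(pid_dir, str(os.getpid()))
open(pid_file, 'w').close()

@atexit.register
def remove_pid_file():
    # Runs on normal interpreter shutdown (not on SIGKILL)
    if os.path.exists(pid_file):
        os.remove(pid_file)
```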


Pros

  • Super simple

Cons

  • Not cross-platform compatible
  • Cannot transfer data


Sockets

The traditional IPC. Every time I look for different IPC methods, sockets always come up. It’s easy to understand why: they are cross-platform and natively supported by most languages.

However, dealing directly with raw sockets is very low-level and requires a lot more coding and configuration. The Python standard library has a great example of an echo server to get you started.
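To see what “low-level” means here, a single-process sketch of a raw-socket echo round trip (letting the OS pick a free port; in reality the server and client would be separate programs):

```python
import socket

# Server side: bind to any free port and listen
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('localhost', 0))  # port 0 lets the OS pick
server.listen()
port = server.getsockname()[1]

# Client side: connect and send a message
client = socket.create_connection(('localhost', port))
conn, _ = server.accept()

client.sendall(b'ping')
conn.sendall(conn.recv(1024))  # echo whatever arrived straight back
echoed = client.recv(1024)
print(echoed)  # b'ping'

for sock in (client, conn, server):
    sock.close()
```

Notice how much bookkeeping (framing, partial reads, reconnects) is still left to you; that is what the higher-level options below take care of.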

But this is batteries-included, everyone-else-has-already-done-the-work-for-you Python. Sure you could set up everything yourself, or you can use the higher level Listeners and Clients from the multiprocessing library. Alice is hosting a server party, and executing incoming requests:

from multiprocessing.connection import Listener

def function_to_execute(*args):
    """Our handler function that will run
       when an incoming request is a list of arguments"""
    return args[0] * args[1]

with Listener(('localhost', 6000), authkey=b'Do not let eve find us!') as listener:
    # Looping here so that the clients / party goers can
    # always come back for more than a single request
    while True:
        print('waiting for someone to ask for something')
        with listener.accept() as conn:
            args = conn.recv()
            if args == b'stop server':
                break
            elif isinstance(args, list):
                # Very basic check, must be more secure in production
                print('Someone wants me to do something')
                result = function_to_execute(*args)
                conn.send(result)
            else:
                conn.send(b'I have no idea what you want me to do')

Bob is going to go for just a quick function and then call it a night.

from multiprocessing.connection import Client

with Client(('localhost', 6000), authkey=b'Do not let eve find us!') as conn:
    conn.send([8, 8])
    print(f"What is 8 * 8? Why Alice told me it's {conn.recv()}")

# We have to connect again because Alice can only handle one request at a time
with Client(('localhost', 6000), authkey=b'Do not let eve find us!') as conn:
    # Bob's a party pooper and going to end it for everyone
    conn.send(b'stop server')

Start up Alice first, then run Bob. Bob will quickly exit with the message What is 8 * 8? Why Alice told me it's 64

Alice will have three total messages:

waiting for someone to ask for something
Someone wants me to do something
waiting for someone to ask for something


Pros

  • Cross-platform compatible
  • Can be extended to RPC

Cons

  • Slower
  • More overhead



Remote Procedure Calls (RPC)

Believe it or not, you can use a lot of the methods from remote procedure calls locally. Even sockets and message queues can already be set up to work for either IPC or RPC.

Now, barring using IR receivers and transmitters or laser communications, you are probably going to connect remote programs via the internet. That means most RPC standards are going to be networking based, aka on sockets. So it’s less about deciding which protocol to use, and more about choosing which data transmission standard to use. Two that I have used are JSONRPC, for normal humans, and XMLRPC, for XML fans who like sniffing glue and running with scissors 1. There is also SOAP, for XML fans who need their acronyms to also spell something, and Apache Thrift, which I found while doing research for this article and have not touched. Those standards transfer data as text, which makes it easier to read, but inefficient. Newer options, like gRPC, use protocol buffers to serialize the data and reduce overhead.
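For instance, the standard library ships XMLRPC out of the box; a minimal round trip, sketched in a single process for brevity (the port number is arbitrary):

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Serve one function in a background thread
server = SimpleXMLRPCServer(('localhost', 8311), logRequests=False)
server.register_function(lambda a, b: a * b, 'multiply')
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any XMLRPC client (in any language) can now call it over HTTP
client = ServerProxy('http://localhost:8311')
result = client.multiply(8, 8)
print(result)  # 64

server.shutdown()
```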

It’s also very common and easy to just write up a simple HTTP REST interface for your programs, using a lightweight framework like bottle or flask, and communicate that way.

In the future (if they don’t already exist) I expect to see even more choices with WebSocket or WebRTC based communications.



Summary

Let’s wrap this all up with a comparison table:

Method            Cross Platform Compatible   Requires File or File Descriptor   Requires External Service   Easy to write / read code 5
Message Queues    Yes 4                       No                                 Yes                         Yes
Shared Memory     Yes 3                       Yes 2                              No                          No

Hopefully that at least clears up why sockets are the go-to IPC method, as they have the most favorable traits. For me though, I usually want something a little more robust, like message queues or REST APIs, so that it can be used locally or remotely. Which, to be fair, are built on top of sockets.