Opinion

A subject that is in debated more than standardized

Let it Djan-go!

https://www.claragriffith.com/
Artwork by Clara Griffith

Do you know someone who hails from the strange land of Django development? Where there are code based settings files, is customary to use a ORM for all database calls, and classes reign supreme. It’s a dangerous place that sits perilously close to the Javalands. Most who originally traveled there were told there would be riches, but they lost themselves along the way and simply need our help to come home again.

Classless classes

The first bad habit that must be broken is the horrendous overuse of classes. While classes are considered the building blocks of most object oriented languages, Python has the power of first-class functions and a clean namespacing system which negates many the usual reasons for classes.

When using Django’s framework, where a lot of its powerful operations are performed by subclassing, classes actually makes sense. When creating a small class for a one use instance in a library or script it’s most likely simply over complicating and slowing down your code.

Let’s use an example I have seen recently, where someone wanted to run a command, but needed to know if a custom shell wrapper existed before running the command.

import subprocess
import os

class DoIt:

    def __init__(self, command):
        self.command = command
        self.shell = "/bin/secure_shell" if os.path.exists("/bin/secure_shell") else "/bin/bash"

    def execute(self):
        return subprocess.run([self.shell, "-c", self.command], stdout=subprocess.PIPE)

runner = DoIt('echo "order 66" ')
runner.execute() 

Most of this class is setup and unnecessary. Especially if the state of the system won’t change while the script is running (i.e. if “/bin/secure_shell” is added or removed during execution). Let’s clean this up a little.

import subprocess
import os

shell = "/bin/secure_shell" if os.path.exists("/bin/secure_shell") else "/bin/bash"


def do_it(command):
    return subprocess.run([shell, "-c", command], stdout=subprocess.PIPE)

do_it('echo "order 66" ')

Shorter, cleaner, faster, easier to read. It doesn’t get much better than that. Now if the system state may change, you would need to throw the shell check inside the command itself, but is also possible.

To learn more view the great PyCon video entitled “stop writing classes.” The video goes over more details of why classes can stink up your code.

Help me ORM-Kenobi!

Now onto probably the hardest change that will have the most significant impact. If you or someone you know has mesothelioma learned how to interact databases by using an ORM (Django’s Models) and don’t know how to hand write SQL queries, it’s time to make an effort to learn.

An ORM is always going to be slower than its raw SQL equivalent, and in a many cases it will be much much slower. ORMs are brilliant, but still can’t always optimize queries. Even if they do, they then load the retrieved data into Python objects which takes longer and uses more memory. Their abstraction also comes at the price of not always being clear on what is actually happening under the hood.

I am not suggesting never using an ORM. They have a lot of benefits for simple datasets such as ease of development, cross database compatibility, custom field transforms and so on. However if you don’t know SQL to begin with, your ORM will most likely be extremely inefficient, under-powered and the SQL tables will be non-normalized.

For example, one of the worst sins in SQL is to use SELECT * in production. Which is the default behavior for ORM queries. In fact, the Django main query page doesn’t even go over how to limit columns returned!

Now, I am not going to tell when to use an ORM or not, or link to a dozen other articles that fight over this point further. All I want is for everyone that thinks they should use an ORM to at least sit down and learn a moderate amount of SQL before making that decision.

Text Config is King

When was the last time you had to convince an IT admin to update a source code file? Yeah, good luck with that.

By separate settings from code you gain a lot of flexibility and safety. It’s the difference between “Mr. CEO, you need the default from email changed? So for our Django docker deploy we’ll need to make the change in code, commit it to the feature branch, PR it to develop, pray we don’t have other changes and require a hotfix for this instead, then PR again for release to master, wait for the CI/CD to finish and then the auto-deployment should be good by end of the day.” vs. “One sec, let me update the config file and restart it. Okay, good to go.”

“Code based setting files are more powerful!” – Old Sith Proverb

That’s the path to the dark side. If you think you need programmatic help in a configuration file, your logic is in the wrong place. A config file shouldn’t validate itself, not have if statements, nor have to find the absolute path to the relative directory structured the user entered.

The best configuration handling I have seen are text files that are read, sanity checked, loaded into memory and have paths extrapolated as needed, run any necessary safety checks, then any additional logic run. This allows almost anyone to update a setting safely and quickly.

The scary Django way

Lets take a chunk from the “one django settings.py file to rule them all”

import os 
import json 

DEBUG = os.environ.get("APP_DEBUG") == "1"

ALLOWED_HOSTS = os.environ \
  .get("APP_ALLOWED_HOSTS", "localhost") \
  .split(",")

DEFAULT_DATABASES = {...}
DATABASES = (
  json.loads(os.environ["APP_DATABASES"])
  if "APP_DATABASES" in os.environ
  else DEFAULT_DATABASES
)

The basic design is to allow for overrides on runtime based on environment variables, which is very common and good practice. However, the way this example has been implemented honestly scares me. If the json loads fails, you are going to have an exception raised, in your freaking settings file. Which may even happen before logging is setup, project dependent.

Second, say you have “APP_ALLOWED_HOSTS” (or worse “APP_DATABASES”) set to an empty string accidentally. Then the “ALLOWED_HOSTS” would be set to [""] !

Finally spending all that work to grab environment variables makes it very unclear where to actually update the settings. For example, to update the ALLOWED_HOSTS you would have to change the default in the os.environ.get which wouldn’t make much sense to someone who didn’t know Python.

The safe way

Why not a much more easily readable file, in this case a YAML file paired with python-box (lots of other options as well, this was simply fast to write and easy to digest.)

# config.yaml
---
debug: false
allowed_hosts:
  - 'localhost'
databases:
  - 'sqlite3'

Wonderful, now we have a super easy to digest config file that would be easy to update. Let’s implement the parser for it.

import os
from box import Box

config = Box.from_yaml(filename="config.yaml")

config.debug |= os.getenv("DEBUG") == "1"

# Doing it this way will make sure it exists and is not empty
if os.getenv("ALLOWED_HOSTS"):
    # This is overwritten like in settings.py above instead of extended 
    config.allowed_hosts = os.getenv("ALLOWED_HOSTS").split(",")

if os.getenv("APP_DATABASES"):
    try:
        databases = Box.from_json(filename=os.environ["APP_DATABASES"])
    except Exception as err:
        # Can add proper logging or print message as well / instead of exception
        raise Exception(f"Could not load database set from {os.environ['APP_DATABASES']}: {err}")
    else:
        config.databases.extend(databases)

Tada! Now this is a lot easier to understand the logic that is happening, much easier to update the config file, as well as a crazy amount safer.

Now the only thing to argue about with your coworkers is what type of config file to use. cfg is old-school, json doesn’t support comments, yaml is slow, toml is fake ini, hocon is overcomplicated, and you should probably just stop coding if you’re even considering xml. But no matter which you go with, they are all better (except xml) than using a language based settings file.

Wrap Up

So you’ve heard me prattle on and on of the dangers Djangoers might bring with them to your next project, but also keep in mind the good things they will bring. Django helps people learn how to document well, keep a rigid project structure and has a very warm and supportive community that we should all strive to match.

Thanks for reading, and I hope you enjoyed and found this informative or at least amusing!

How NOT to call Robinhood’s secret API with Python

This article is meant to go over the basics of calling an API via Python, as well as a critique of a top google result I recently ran across. We will start by using Robinhood’s undocumented API, as I saw this article that had python code so bad I had to try and fix it.

Also, because I can get us each a free stock if you sign up with this Robinhood referral link (seems to be usually around a $10 stock, but they like to advertise you might get Facebook or Visa.)

Murder – Artwork by Clara Griffith

So at the base of the bad example (non-working call), he was using curl calls like:

curl -v https://api.robinhood.com/quotes/XIV/ -H "Accept: application/json"

Which is a great use of curl. However, then the author simply decided to wrap it up in a subprocess call and manually parsed the output. He then states “You could also use PYCurl but I didn’t feel like learning it.” Well, thankfully there is no need to learn it.

How to interact with a REST API, the basic GET and POST

The most common text based web APIs are JSON REST-like nowadays. I add the “like” because there are usually cases a company does not fully stick to the standard, and to be honest is a good decision a lot of the time. Thankfully that doesn’t usually change the behavior of the most simple calls of GET and POST.

A properly implemented GET call on the server’s side won’t change anything on their server. So they are best to practice with. It is possible to do this all with pure python, however the requests library is so well known and has exactly what we want already, so we will stick to that library for the examples. To install it, simply pip install requests.

Example GET call

import requests

response = requests.get('https://api.robinhood.com')
# Check to make sure we got 'good' response, aka in HTTP code 2XX range
if response.ok: 
    # Display the result of the parsed json
    print(response.json())

This should simply print out {}.

Next I thought it would be a good idea to try and log in using that article’s example login. If you don’t have an account (not necessary), you can sign up here which uses a referral link, gives you and me both a free random stock.

Example (Non-working) POST call

import requests

response = requests.post('https://api.robinhood.com/api-token-auth/', 
                         json={'username': YOUR_USERNAME, 'password': YOUR_PASSWORD})
if response.ok:
    print(response.json())
else:
    print(response.status_code)
    print(response.text)

Now wait a minute, we are getting an error.

404
<h1>Not Found</h1><p>The requested resource was not found on this server.</p>

Oh gosh darn it. Not only did the author have bad python code, they have outdated examples! How dare they! (*quickly scans my most recent articles for anything obviously outdated*)

Now we get a quick lesson on the dangers of using undocumented APIs. They have absolutely no guarantees. Anything can change at anytime or even remove access to them. In this case, some people reversed engineered how to login to robinhood using a different endpoint, and have an unofficial API wrapped around it.

So for now, lets use Postman’s excellent echo API that will return what you send it. This time, lets put it in a more reusable function.

Example (good) POST call

import requests

api_root = 'https://postman-echo.com'

def post(path='/post', payload=None, timeout=10, **request_args):
    response = requests.post(f'{api_root}/{path}', 
                   json=payload, timeout=timeout, **request_args)
    if response.ok:
            return response.json()
    raise Exception(f'error while calling "{path}", returned status "{response.status_code}" with message: "{response.text}"')

print(post(payload={'username': 'chris'}))

The endpoint will return a JSON object, and requests will automatically translate that into a python dictionary when called with .json() on the response.

{'args': {}, 'data': {'username': 'chris'}, 'files': {}, 'form': {},  'json': {'username': 'chris'}, 'url': 'https://postman-echo.com/post'}

Lot of fluff, but it is nice that Postman’s API spells out exactly what that endpoint receives. For example, you can test sending form data by changing the post parameter call from json to data.

Now this is doing a better job of checking response and using much more pythonic methods. But let’s take it one step further and do better error catching. Because if this is something you are putting into a bigger program, you probably have custom exceptions and want to be able to catch them in a certain way.

This is going to be a little exception heavy, but is showcasing all the ways you might need to deal with certain errors. I would probably only have one wrapper around the call and the parsing JSON personally, but it is really program dependent.

import requests

api_root = 'https://postman-echo.com'

class MyError(Exception):
    """Exception for anything in my program"""

class MyConnectionIssue(MyError):
    """child exception for issues with connections"""


def post(path='/post', payload=None, timeout=10, **request_args):
    try:
        response = requests.post(f'{api_root}/{path}', json=payload, timeout=timeout, **request_args)
    except requests.Timeout:
        raise MyConnectionIssue(f'Timeout of {timeout} seconds reached while calling "{path}"') from None
    except requests.RequestException as err:
        raise MyConnectionIssue(f'Error while calling {path}: {err}')
    else:
        if response.ok:
            try:
                return response.json()
            except ValueError:
                raise MyError(f'Expected output from {path} was not in JSON format. Output: {response.text}')
        raise MyError(f'error while calling "{path}", returned status "{response.status_code}" with message: "{response.text}"')

print(post(payload={'username': 'chris'}))

If you’re a little rusty on exceptions, feel free to brush up on them here.

How NOT to parse JSON

In the terrible code below, we see the other article’s code manually parsing JSON output. Please for the love of all that is holy do not copy!

import subprocess

class data():
    def __init__(self, stock):
            parameter_list = ['open', 'high', 'low', 'volume', 'average_volume', 'last_trade_price', 'previous_close']

            # replacing CURL call
            out = '{"open": 54 "high": 55 "low": 52 "volume": 6000 "average_volume": 6000 "last_trade_price": 52 "previous_close": 54 "}'
            # back to copied code

            string = out  
            for p in parameter_list:  
                parameter = p  
                if parameter in string:  
                    x = 0  
                    output = ''  
                    iteration = 0  
                    for i in string:  
                        if i != parameter[x]:  
                            x = 0  
                        if i == parameter[x]:  
                            x = x+1  
                        if x == len(parameter):  
                            eowPosition = iteration  
                            break  
                        iteration = iteration + 1

                    target_position = eowPosition + 4
                    for i in string[target_position:]:  
                        if i == '"':  
                            break  
                        elif i == 'u':
                            output = 'NULL'  
                            break  
                        else:  
                            output = output+i

                    if output != 'NULL':  
                        output = float(output)  
                    if p == 'open':  
                        self.open = output  
                    if p == 'high':  
                        self.high = output  
                    if p == 'low':  
                        self.low = output  
                    if p == 'volume':  
                        self.volume = output  
                    if p == 'average_volume':  
                        self.average_volume = output  
                    if p == 'last_trade_price':  
                        self.current = output  
                    if p == 'previous_close':  
                        self.close = output  
XIV = data('XIV')

print('Current price: ', XIV.current)

So let’s do a critique starting from the top. class data():
First, classes should be CamelCase, second, this is a single function, shouldn’t even be a class. However does make a tiny bit of sense considering they are basically using it as a namespace, but there are better ways.

Next is them using subprocess to do a curl command instead of the built in urllib or requests. (Omitted so this code runs.)

Then onto string = out which is just…why? No reason to copy it to a new variable. Just name it what you want in the first place.

Finally the for loop that actually parses the JSON. Which is just painful to look at. If you ever need to parse JSON, without using requests method, just use the built-in json library!

import json

data = json.loads('{"open": 54}')
print(data["open"]) 
# 54

Bam, 42 lines into 3.

Finishing on a high note

Now for as much grief as I am giving this author, I also respect him a lot. He was able to accomplish what he wanted to achieve using Python, a language he had little experience with. To his credit he was clear up front that “I’m not an expert in stock trading nor coding so I can’t say the code is the cleanest”.

He also was willing to write up an article on how to do it for others, which most people don’t dream of doing, so in all earnestness, good job Spencer!

Last mention, free stock (legit actual money) from robinhood, just use this referral!

Top 10ish Python standard library modules

When interviewing Python programming candidates, my wife always likes to ask the simple question, “can you name ten Python standard library modules?” This is harder than most think, as many people will completely blank out and others will be dead wrong. “Requests?” one poor soul answered. It’s a good interview question, as it gives insight onto what people are familiar with and may use regularly. So I sat down and though of of which ones I use and enjoy the most. So here are my top ten(ish) useful, favorite and unordered standard modules.

pathlib

Back in the dark days, you would have to store your path as a string, and call obscure functions under os.path to figure anything out about it. Pathlib removes the headache.

from pathlib import Path

my_path = Path('text_file.txt')
if not my_path.exists():
    my_path.write_text('File Content')
assert my_path.exists()
assert my_path.is_file()

Read more at the pathlib python docs.

tempfile

There are a boatload of uses for a temporary file or directory. Hence why it’s in the standard library. I find myself using them together, inside context managers more often than not.

from pathlib import Path
from tempfile import TemporaryDirectory, TemporaryFile


with TemporaryDirectory(prefix='Code_', suffix='_Calamity') as temp_dir:
    with TemporaryFile(dir=temp_dir) as temp_file:
        temp_file.write(b'Test')

        temp_file_path = Path(temp_file.name)
        assert temp_file_path.exists()

# Make sure file only exists within the context
assert not temp_file_path.exists()

I usually end up using this when a tool or library wants to work with a file rather than standard input, so short lived files in a context manager make life a lot easier. Tempfile python docs.

subprocess

Python is pretty amazing, but sometimes you do need to call other programs. Subprocess makes it easy to execute and interact with other executable across operating systems. Check out my other post on it!

from subprocess import run, PIPE
 
response = run("echo 'Join the Dark Side!'", shell=True, stdout=PIPE)

print(response.stdout.decode('utf-8'))
# 'Join the Dark Side!'

logging

This is probably the most useful built in library for debugging there is, and I see it either unused or misused more than anything else.

import logging
import sys
 
logger = logging.getLogger(__name__)
my_stream = logging.StreamHandler(stream=sys.stdout)
my_stream.setLevel(logging.DEBUG)
my_stream.setFormatter(
    logging.Formatter("%(asctime)s - %(name)-12s  "
                      "%(levelname)-8s %(message)s"))
logger.addHandler(my_stream)
logger.setLevel(logging.DEBUG)
 
logger.info("We the people")

If you haven’t already, go on your first date with python logging! It’s also possible to put all the configuration details into a separate ini or json file, learn more from the logging python docs.

threading and multithreading

Two very different things for widely different uses, but they have very similar interfaces and easy to talk about at the same time. Quick and dirty difference: Use threading for IO heavy tasks (writing to files, reading websites, etc) and multithreading for CPU heavy tasks.

from multiprocessing.pool import ThreadPool, Pool
 
def square_it(x):
    return x*x
 
# On Windows, make sure that multiprocessing doesn't start
# until after "if __name__ == '__main__'" 
 
# Pool and ThreadPool are interchangable in this example 
with Pool(processes=5) as pool:
   results = pool.map(square_it, [5, 4, 3, 2 ,1])
 
print(results) 
# [25, 16, 9, 4, 1]

I did a post on ThreadPools and Multithreading Pools, as I find them the easiest way to work with (multi)threading in Python.

os and sys

The Python world would not exist if we didn’t have all the power and functionality these built-ins bring. You really haven’t coded in Python if you haven’t used these yet, so I won’t even bother elaborating them here.

random and uuid

Maybe you’re making a game…

import random
random.choice(['Sneak Attack', 'High Kick', 'Low Kick'])

Or debugging a webserver…

from uuid import uuid4

# Bad example, but not writing out a whole webserver to prove a point
def get(*args):
request = uuid4()
logger.info(f'Request {request} called with args {args}')

Or turning your webserver into your own type of game…

if user_name == 'My Boss':
    time.sleep(random.randint(1, 5))

No matter which you are doing, it’s always handy to have randomly generated or unique numbers.

socket

The internet runs because of sockets. Modern technology exists because of sockets. Sockets are life, sockets are….annoying low level at times but its good to know the basics so you can appreciate everything written on top of them.

Thankfully the Python docs have good examples of them in use.

hashlib

Need to check a file’s integrity? Hashlib is there with md5 and sha hashes! I created a reusable function to easily reference when I need to do it.

Need to securly store people’s password hashes for a website? Hashlib now has scrypt support! Heck, here is my own function I always use to generate the scrypt hashes.

from collections import namedtuple
import hashlib
import os


Hashed = namedtuple('Hashed', ['hash', 'salt', 'n', 'r', 'p', 'key_length'])


def secure_hash(value: bytes, salt: bytes = None, key_length: int = 128, n: int = 2 ** 16, r: int = 8, p: int = 1):
    maxmem = n * r * 2 * key_length
    salt = salt or os.urandom(16)
    hashed = hashlib.scrypt(value, salt=salt, n=n, r=r, p=p, maxmem=maxmem, dklen=key_length)
    return Hashed(hash=hashed.hex(), salt=salt.hex(), n=n, r=r, p=p, key_length=key_length)

venv

You probably only think of it as a command when you run python -m venv python_virtual_env to create your environments, but it’s run that way because it’s a standard library. Every new project you start or Python program you install should be using this library, so it is used a lot!

Summary

There ya go, 10 or so can’t live without standard libraries! Isn’t it so nice that Python comes “batteries included”?

Is the world ready for Python 3?

The trek from Python 2 to Python 3 has been drawn-out, arduous and fraught with perils. How close are our dear Knights developers to all reaching the long sought glory of Python 3?

Quest for the Python 3 – Artwork by Clara Griffith (Link may contain NSFW art)

PIP Downloads

Let’s first jump into what is being used the most currently. This data examines fifteen different libraries downloaded via PIP for a particular Python version. We are only including 2.7 and 3.4+, the Python Versions that are currently supported.

The libraries analyzed are ones that have over 10K stars on github and have been downloaded via PIP. The contenders are: celery, django, flask, ipython, keras, mitmproxy, numpy, pandas, python-box, requests, scrapy, selenium, tensorflow, and tornado. (To be fair, numpy and python-box didn’t have 10K stars, but I used them in the script to make these graphics, so gave them some spotlight too.)

As of January 2019, Python 3 downloads are eclipsing Python 2 by over 20% with Python 3.6 bringing over 39% of it, almost directly matching Python 2.7’s total.

That is good, but not great news. Thankfully Python 2 won’t just stop working at the end of this year, but those are rookie Python 3 numbers, we got to pump them up!

Of course, we have to remember this is a small subset of all downloads. Subsequently, pip downloads themselves don’t tell the whole tale, but this does give us an idea of how things of are going.

This is accomplished by using the PyPI BigQuery data and some SQL (adapted from Artem Golubin’s post about this from last year), then throwing it into matplotlib.

SELECT
  SUBSTR(details.python, 0, 3) as python_version,
  COUNT(*) as download_count
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -30, "DAY"),
    CURRENT_TIMESTAMP()
  )
WHERE
 details.installer.name='pip' and
 file.project = 'requests' -- change project name here
GROUP BY
  python_version
ORDER BY
  download_count DESC
LIMIT 100

Library Brawl: Who’s the Python 3 champs?

In this head to head, we are going to compare two similar libraries, and see how they are doing on the switch to Python 3.

Web Frameworks

The first two up are very popular web frameworks to develop in, Flask and Django.

It’s a dead heat! Both libraries are doing well at attracting developers with a fresh mindset.

Machine Learning

The most popular github package by far was tensorflow with over a hundred thousand stars. Here it’s paired against it’s younger brother keras, which actually depends on it (or other AI tools) to operate.

Machine learning needs to teach it’s developers how to update! It’s a sad day for AI.

Hacker vs Web Scraper

Okay, not really directly comparable tools with a man-in-the-middle proxy and a web scraper, but it’s still an interesting match up.

With this duo I was surprised they didn’t have a higher correlation. I was honestly expecting the mitm tool to have less Python 3 love, as a lot of “hacker” tools depend on the broken way Python 2 handles strings vs unicode, thus are hard to update.

Good job hackers, always keep your tool belt fresh! Scrapers….scrape it together.

Data Science

The last head to head is for the data scientists out there, and you got science in your name and numbers in your veins, you should be at the bleeding edge of tech!

Ouch, yinz need to get with the times.

Python Version Developers Use More Often

This is some hard to gather data as an individual, so I’m going to have to cheat and just base this information off JetBrain’s yearly state of the ecosystem reports from 2017 and 2018.

In 2017, 53% of devs reported using Python 3 as their main language, which went up 22% in 2018 to 75%. Based on those two points of data, we can come to a crystal clear, no doubt conclusion to how many developers will be using Python 3 as their main language in 2019.

That’s right, based on the past two year trend, 97% of developers should be using Python 3 in 2019.

Okay, well, maybe not. But I personal expect that number to be over 90% by the time Python 2 is EOL, which is excellent news.

Operating System Default Language

OSes have a fun time of being in the cross hairs of everyone from desktop to server users, trying to figure out the right combo of what’s best for their users and for their own technology stack going forward. Every major Linux distribution agrees Python 3 is the way to the future and they will need to change over. The hard part is deciding when it will impact the users least and best for their own release cycle. This has caused lots of headaches over the years. So where do we stand now?

OSPython Version
Windows 10None
OSX 10.82.7
Debian 92.7
RedHat 8*3.6
Fedora 293.7
Ubuntu 19.04*3.7

(* denotes upcoming releases this year)

Windows has the easy stance of just saying “do it yourself” and Mac is, as usual, not bothering to innovate and just hum along until it breaks. Thankfully most Linux distros, which power the internet, are either already updated or updating this year. I haven’t seen for sure that Debian 10 will be released with Python 3 or that it w ill be out before year’s end, but I would be surprised if either were not true. Then there’s Arch linux. Arch has had Python 3 as the standard for almost as long as it existed, good boy!

Are we ready?

In all honesty, we are. We are far more prepared for this than the financial sector was ready for Y2K, and we all survived that. Moreover, there are always going to be code bases that can’t update to the latest version easily, but that’s true across the entire software development world. That and the fact the Python Software Foundation has given an extended eleven years which has allowed for even the slowest of companies to have ample time to migrate to Python 3.

Python 3 everywhere? Bring it on!


Stop using plus signs to concatenate strings!

In Python, using plus signs to concatenate strings together is one of the first things you learn, i.e. print("hello" + "world!"), and it should be one of the first things you stop using. Using plus signs to “add” strings together is inherently more error prone, messier and unprofessional. Instead you should be using .format() or f-strings.

Hunter – Artwork by Clara Griffith

Before diving into what’s really wrong with + plus sign concatenation, we are going to take a quick step back and look at the possible different ways to merge strings together in Python, so we can get a better understanding of where to use what.

Concatenating strings

When to useWhen to avoid
+NeverAlways
%Legacy code, logging modulePython 3+
formatEverywhere
f-stringPython 3.6+When you need to escape characters inside the {}s
joinOn an iterable (list, tuple, etc) of strings

Here is a quick demo of each of those methods in action using the same tuple of strings. For an already existing iterate of strings, join makes the most sense if you want them to have the same character(s) between all of them. However, in most other cases join won’t be applicable so we are going to ignore it for the rest of this post.

variables = ("these", "are", "strings")

print(" ".join(variables))
print("%s %s %s" % variables)
print("{} {} {}".format(*variables))
print(f"{variables[0]} {variables[1]} {variables[2]}")
print(variables[0] + " " + variables[1] + " " + variables[2])

# They all print "these are strings"

In many cases you will have other words or strings not in the same structure you will be concatenating together, so even though something like f-strings here looks more cumbersome than the others, it wins out in simplicity in other scenarios. I honestly use f-strings more than anything else, but .format does have advantages we will look at later. Anyways, back to why using plus signs with strings is bad.

Errors lurking in the shadows

Consider the following code, which has four different perfectly working examples of string concatenation.

wait_time = "0.1"
time_amount = "seconds"

print("We are going to wait {} {}".format(wait_time, time_amount))

print(f"We are going to wait {wait_time} {time_amount}")

print("We are going to wait %s %s" % (wait_time, time_amount))

print("We are going to wait " + wait_time + " " + time_amount)

# We are going to wait 0.1 seconds
# We are going to wait 0.1 seconds
# We are going to wait 0.1 seconds
# We are going to wait 0.1 seconds

Everything works as expected, but wait, if we are going to put a time.sleep in there, it takes the wait time as a float. Let’s update that and add the sleep.

Concatenation TypeErrors

import time

wait_time = 0.1 # Changed from string to float
time_amount = "seconds"

print("We are going to wait {} {}".format(wait_time, time_amount))

print(f"We are going to wait {wait_time} {time_amount}")

print("We are going to wait %s %s" % (wait_time, time_amount))

print("We are going to wait " + wait_time + " " + time_amount)

time.sleep(wait_time)

print("All done!")


# We are going to wait 0.1 seconds
# We are going to wait 0.1 seconds
# We are going to wait 0.1 seconds
# Traceback (most recent call last):
#    print("We are going to wait " + wait_time + " " + time_amount)
# TypeError: can only concatenate str (not "float") to str

That’s right, the only method of string concatenation to break our code was using + plus signs. Now here it was very obvious it was going to happen. But what about going back to your code a few weeks or months later? Or even worse, if you are using someone else’s code as a library and they do this. It can become quite an avoidable headache.

Formatting issues

Another common issue that you will run into frequently using plus signs is unclear formatting. It’s very easy to forget to add white space around variables when you aren’t using a single string with replace characters like every other method. What can look very similar will yield two different results:

print(f"{wait_time} {time_amount}")
print(wait_time + time_amount)

# 0.1 seconds
# 0.1seconds

Did you even notice we had that issue in the very first paragraph’s code? print("hello" + "world!")

Messy

This is the most subjective of my reasons to avoid it, but I personally think it becomes very unreadable compared to any other methods, as shown with the following example.

mixed_type_vars = {
    "a": "My",
    "b": 2056,
    "c": "bodyguards",
    "d": {"have": "feelings"}
}


def plus_string(variables):
    return variables["a"] + " " + str(variables["b"]) + \
           " " + variables["c"] + " " + str(variables["d"])


def format_string(variables):
    return "{a} {b} {c} {d}".format(**variables)


def percent_string(variables):
    return "%s %d %s %s" % (variables["a"], variables["b"], 
                            variables["c"], variables["d"])

print(plus_string(mixed_type_vars))
print(format_string(mixed_type_vars))
print(percent_string(mixed_type_vars))

String format is very powerful because it is a function, and can take positional or keyword args and replace them as such in the string. In the example above .format(**variables) is equivalent to

.format(a="My", b=2056, c="bodyguards", d={"have": "feelings"})

That way in the string you can reference them by their keywords (in this case single characters a through d).

"Thing string is {opinion} formatted".format(opinion="very nicely")

Which means with format you have a lot of options to make the string a lot more readable, or you can reuse positional or named variables easily.

print("{0} is not {1} but it is {0} just like "
      "{fruit} is not a {vegetable} but is a {fruit}"
      "".format(1, 2, fruit="apple", vegetable="potato"))

Slower string conversion

Using the functions from the Messy section we can see that it is also slower when concatenation a mix of types.

import timeit
plus = timeit.timeit('plus_string(mixed_type_vars)',
                     number=1000000,
                     setup='from __main__ import mixed_type_vars, plus_string')

form = timeit.timeit('format_string(mixed_type_vars)',
                     number=1000000,
                     setup='from __main__ import mixed_type_vars, format_string')

percent = timeit.timeit('percent_string(mixed_type_vars)',
                     number=1000000,
                     setup='from __main__ import mixed_type_vars, percent_string')

print("Concatenating a mix of types into a string one million times:")
print(f"{plus:.04f} seconds - plus signs")
print(f"{form:.04f} seconds - string format")
print(f"{percent:.04f} seconds - percent signs")

# Concatenating a mix of types into a string one million times:
# 1.9958 seconds - plus signs
# 1.3123 seconds - string format
# 1.0439 seconds - percent signs

On my machine, percent signs were slightly faster than string format, but both smoked using plus signs and explicit conversion.

Unprofessional

This isn’t only something to call out teammates on during code review, but can even negatively impact you if you’re applying for Python jobs. Using “+” everywhere for strings is a red flag that you are still a novice. I don’t know anyone personally that has been turned away because of something so trivial, but it does show that you unfamiliar with Python’s awesome feature rich strings and haven’t had a lot of experience in group coding.

If you ever saw Batman or James Bond coding in Python, they wouldn’t be using +s in their string concatenation, and nor should you!

Summary

"If" + "👏" + "you" + "👏" +"use" + "👏" + "plus signs" + "👏" + "to" + 
"👏" + "concatenate"  + "👏" + "your"  + "👏" + "strings"  + "👏" + "you" 
 + "👏" + "are"  + "👏" + "more"  + "👏" + "annoying"  + "👏" + "than"  + 
"👏" + "this"  + "👏" + "meme!"