orm

Let it Djan-go!

https://www.claragriffith.com/
Artwork by Clara Griffith

Do you know someone who hails from the strange land of Django development? Where there are code based settings files, is customary to use a ORM for all database calls, and classes reign supreme. It’s a dangerous place that sits perilously close to the Javalands. Most who originally traveled there were told there would be riches, but they lost themselves along the way and simply need our help to come home again.

Classless classes

The first bad habit that must be broken is the horrendous overuse of classes. While classes are considered the building blocks of most object oriented languages, Python has the power of first-class functions and a clean namespacing system which negates many the usual reasons for classes.

When using Django’s framework, where a lot of its powerful operations are performed by subclassing, classes actually makes sense. When creating a small class for a one use instance in a library or script it’s most likely simply over complicating and slowing down your code.

Let’s use an example I have seen recently, where someone wanted to run a command, but needed to know if a custom shell wrapper existed before running the command.

import subprocess
import os

class DoIt:

    def __init__(self, command):
        self.command = command
        self.shell = "/bin/secure_shell" if os.path.exists("/bin/secure_shell") else "/bin/bash"

    def execute(self):
        return subprocess.run([self.shell, "-c", self.command], stdout=subprocess.PIPE)

runner = DoIt('echo "order 66" ')
runner.execute() 

Most of this class is setup and unnecessary. Especially if the state of the system won’t change while the script is running (i.e. if “/bin/secure_shell” is added or removed during execution). Let’s clean this up a little.

import subprocess
import os

shell = "/bin/secure_shell" if os.path.exists("/bin/secure_shell") else "/bin/bash"


def do_it(command):
    return subprocess.run([shell, "-c", command], stdout=subprocess.PIPE)

do_it('echo "order 66" ')

Shorter, cleaner, faster, easier to read. It doesn’t get much better than that. Now if the system state may change, you would need to throw the shell check inside the command itself, but is also possible.

To learn more view the great PyCon video entitled “stop writing classes.” The video goes over more details of why classes can stink up your code.

Help me ORM-Kenobi!

Now onto probably the hardest change that will have the most significant impact. If you or someone you know has mesothelioma learned how to interact databases by using an ORM (Django’s Models) and don’t know how to hand write SQL queries, it’s time to make an effort to learn.

An ORM is always going to be slower than its raw SQL equivalent, and in a many cases it will be much much slower. ORMs are brilliant, but still can’t always optimize queries. Even if they do, they then load the retrieved data into Python objects which takes longer and uses more memory. Their abstraction also comes at the price of not always being clear on what is actually happening under the hood.

I am not suggesting never using an ORM. They have a lot of benefits for simple datasets such as ease of development, cross database compatibility, custom field transforms and so on. However if you don’t know SQL to begin with, your ORM will most likely be extremely inefficient, under-powered and the SQL tables will be non-normalized.

For example, one of the worst sins in SQL is to use SELECT * in production. Which is the default behavior for ORM queries. In fact, the Django main query page doesn’t even go over how to limit columns returned!

Now, I am not going to tell when to use an ORM or not, or link to a dozen other articles that fight over this point further. All I want is for everyone that thinks they should use an ORM to at least sit down and learn a moderate amount of SQL before making that decision.

Text Config is King

When was the last time you had to convince an IT admin to update a source code file? Yeah, good luck with that.

By separate settings from code you gain a lot of flexibility and safety. It’s the difference between “Mr. CEO, you need the default from email changed? So for our Django docker deploy we’ll need to make the change in code, commit it to the feature branch, PR it to develop, pray we don’t have other changes and require a hotfix for this instead, then PR again for release to master, wait for the CI/CD to finish and then the auto-deployment should be good by end of the day.” vs. “One sec, let me update the config file and restart it. Okay, good to go.”

“Code based setting files are more powerful!” – Old Sith Proverb

That’s the path to the dark side. If you think you need programmatic help in a configuration file, your logic is in the wrong place. A config file shouldn’t validate itself, not have if statements, nor have to find the absolute path to the relative directory structured the user entered.

The best configuration handling I have seen are text files that are read, sanity checked, loaded into memory and have paths extrapolated as needed, run any necessary safety checks, then any additional logic run. This allows almost anyone to update a setting safely and quickly.

The scary Django way

Lets take a chunk from the “one django settings.py file to rule them all”

import os 
import json 

DEBUG = os.environ.get("APP_DEBUG") == "1"

ALLOWED_HOSTS = os.environ \
  .get("APP_ALLOWED_HOSTS", "localhost") \
  .split(",")

DEFAULT_DATABASES = {...}
DATABASES = (
  json.loads(os.environ["APP_DATABASES"])
  if "APP_DATABASES" in os.environ
  else DEFAULT_DATABASES
)

The basic design is to allow for overrides on runtime based on environment variables, which is very common and good practice. However, the way this example has been implemented honestly scares me. If the json loads fails, you are going to have an exception raised, in your freaking settings file. Which may even happen before logging is setup, project dependent.

Second, say you have “APP_ALLOWED_HOSTS” (or worse “APP_DATABASES”) set to an empty string accidentally. Then the “ALLOWED_HOSTS” would be set to [""] !

Finally spending all that work to grab environment variables makes it very unclear where to actually update the settings. For example, to update the ALLOWED_HOSTS you would have to change the default in the os.environ.get which wouldn’t make much sense to someone who didn’t know Python.

The safe way

Why not a much more easily readable file, in this case a YAML file paired with python-box (lots of other options as well, this was simply fast to write and easy to digest.)

# config.yaml
---
debug: false
allowed_hosts:
  - 'localhost'
databases:
  - 'sqlite3'

Wonderful, now we have a super easy to digest config file that would be easy to update. Let’s implement the parser for it.

import os
from box import Box

config = Box.from_yaml(filename="config.yaml")

config.debug |= os.getenv("DEBUG") == "1"

# Doing it this way will make sure it exists and is not empty
if os.getenv("ALLOWED_HOSTS"):
    # This is overwritten like in settings.py above instead of extended 
    config.allowed_hosts = os.getenv("ALLOWED_HOSTS").split(",")

if os.getenv("APP_DATABASES"):
    try:
        databases = Box.from_json(filename=os.environ["APP_DATABASES"])
    except Exception as err:
        # Can add proper logging or print message as well / instead of exception
        raise Exception(f"Could not load database set from {os.environ['APP_DATABASES']}: {err}")
    else:
        config.databases.extend(databases)

Tada! Now this is a lot easier to understand the logic that is happening, much easier to update the config file, as well as a crazy amount safer.

Now the only thing to argue about with your coworkers is what type of config file to use. cfg is old-school, json doesn’t support comments, yaml is slow, toml is fake ini, hocon is overcomplicated, and you should probably just stop coding if you’re even considering xml. But no matter which you go with, they are all better (except xml) than using a language based settings file.

Wrap Up

So you’ve heard me prattle on and on of the dangers Djangoers might bring with them to your next project, but also keep in mind the good things they will bring. Django helps people learn how to document well, keep a rigid project structure and has a very warm and supportive community that we should all strive to match.

Thanks for reading, and I hope you enjoyed and found this informative or at least amusing!