python 3

Let it Djan-go!

https://www.claragriffith.com/
Artwork by Clara Griffith

Do you know someone who hails from the strange land of Django development? Where there are code based settings files, is customary to use a ORM for all database calls, and classes reign supreme. It’s a dangerous place that sits perilously close to the Javalands. Most who originally traveled there were told there would be riches, but they lost themselves along the way and simply need our help to come home again.

Classless classes

The first bad habit that must be broken is the horrendous overuse of classes. While classes are considered the building blocks of most object oriented languages, Python has the power of first-class functions and a clean namespacing system which negates many the usual reasons for classes.

When using Django’s framework, where a lot of its powerful operations are performed by subclassing, classes actually makes sense. When creating a small class for a one use instance in a library or script it’s most likely simply over complicating and slowing down your code.

Let’s use an example I have seen recently, where someone wanted to run a command, but needed to know if a custom shell wrapper existed before running the command.

import subprocess
import os

class DoIt:

    def __init__(self, command):
        self.command = command
        self.shell = "/bin/secure_shell" if os.path.exists("/bin/secure_shell") else "/bin/bash"

    def execute(self):
        return subprocess.run([self.shell, "-c", self.command], stdout=subprocess.PIPE)

runner = DoIt('echo "order 66" ')
runner.execute() 

Most of this class is setup and unnecessary. Especially if the state of the system won’t change while the script is running (i.e. if “/bin/secure_shell” is added or removed during execution). Let’s clean this up a little.

import subprocess
import os

shell = "/bin/secure_shell" if os.path.exists("/bin/secure_shell") else "/bin/bash"


def do_it(command):
    return subprocess.run([shell, "-c", command], stdout=subprocess.PIPE)

do_it('echo "order 66" ')

Shorter, cleaner, faster, easier to read. It doesn’t get much better than that. Now if the system state may change, you would need to throw the shell check inside the command itself, but is also possible.

To learn more view the great PyCon video entitled “stop writing classes.” The video goes over more details of why classes can stink up your code.

Help me ORM-Kenobi!

Now onto probably the hardest change that will have the most significant impact. If you or someone you know has mesothelioma learned how to interact databases by using an ORM (Django’s Models) and don’t know how to hand write SQL queries, it’s time to make an effort to learn.

An ORM is always going to be slower than its raw SQL equivalent, and in a many cases it will be much much slower. ORMs are brilliant, but still can’t always optimize queries. Even if they do, they then load the retrieved data into Python objects which takes longer and uses more memory. Their abstraction also comes at the price of not always being clear on what is actually happening under the hood.

I am not suggesting never using an ORM. They have a lot of benefits for simple datasets such as ease of development, cross database compatibility, custom field transforms and so on. However if you don’t know SQL to begin with, your ORM will most likely be extremely inefficient, under-powered and the SQL tables will be non-normalized.

For example, one of the worst sins in SQL is to use SELECT * in production. Which is the default behavior for ORM queries. In fact, the Django main query page doesn’t even go over how to limit columns returned!

Now, I am not going to tell when to use an ORM or not, or link to a dozen other articles that fight over this point further. All I want is for everyone that thinks they should use an ORM to at least sit down and learn a moderate amount of SQL before making that decision.

Text Config is King

When was the last time you had to convince an IT admin to update a source code file? Yeah, good luck with that.

By separate settings from code you gain a lot of flexibility and safety. It’s the difference between “Mr. CEO, you need the default from email changed? So for our Django docker deploy we’ll need to make the change in code, commit it to the feature branch, PR it to develop, pray we don’t have other changes and require a hotfix for this instead, then PR again for release to master, wait for the CI/CD to finish and then the auto-deployment should be good by end of the day.” vs. “One sec, let me update the config file and restart it. Okay, good to go.”

“Code based setting files are more powerful!” – Old Sith Proverb

That’s the path to the dark side. If you think you need programmatic help in a configuration file, your logic is in the wrong place. A config file shouldn’t validate itself, not have if statements, nor have to find the absolute path to the relative directory structured the user entered.

The best configuration handling I have seen are text files that are read, sanity checked, loaded into memory and have paths extrapolated as needed, run any necessary safety checks, then any additional logic run. This allows almost anyone to update a setting safely and quickly.

The scary Django way

Lets take a chunk from the “one django settings.py file to rule them all”

import os 
import json 

DEBUG = os.environ.get("APP_DEBUG") == "1"

ALLOWED_HOSTS = os.environ \
  .get("APP_ALLOWED_HOSTS", "localhost") \
  .split(",")

DEFAULT_DATABASES = {...}
DATABASES = (
  json.loads(os.environ["APP_DATABASES"])
  if "APP_DATABASES" in os.environ
  else DEFAULT_DATABASES
)

The basic design is to allow for overrides on runtime based on environment variables, which is very common and good practice. However, the way this example has been implemented honestly scares me. If the json loads fails, you are going to have an exception raised, in your freaking settings file. Which may even happen before logging is setup, project dependent.

Second, say you have “APP_ALLOWED_HOSTS” (or worse “APP_DATABASES”) set to an empty string accidentally. Then the “ALLOWED_HOSTS” would be set to [""] !

Finally spending all that work to grab environment variables makes it very unclear where to actually update the settings. For example, to update the ALLOWED_HOSTS you would have to change the default in the os.environ.get which wouldn’t make much sense to someone who didn’t know Python.

The safe way

Why not a much more easily readable file, in this case a YAML file paired with python-box (lots of other options as well, this was simply fast to write and easy to digest.)

# config.yaml
---
debug: false
allowed_hosts:
  - 'localhost'
databases:
  - 'sqlite3'

Wonderful, now we have a super easy to digest config file that would be easy to update. Let’s implement the parser for it.

import os
from box import Box

config = Box.from_yaml(filename="config.yaml")

config.debug |= os.getenv("DEBUG") == "1"

# Doing it this way will make sure it exists and is not empty
if os.getenv("ALLOWED_HOSTS"):
    # This is overwritten like in settings.py above instead of extended 
    config.allowed_hosts = os.getenv("ALLOWED_HOSTS").split(",")

if os.getenv("APP_DATABASES"):
    try:
        databases = Box.from_json(filename=os.environ["APP_DATABASES"])
    except Exception as err:
        # Can add proper logging or print message as well / instead of exception
        raise Exception(f"Could not load database set from {os.environ['APP_DATABASES']}: {err}")
    else:
        config.databases.extend(databases)

Tada! Now this is a lot easier to understand the logic that is happening, much easier to update the config file, as well as a crazy amount safer.

Now the only thing to argue about with your coworkers is what type of config file to use. cfg is old-school, json doesn’t support comments, yaml is slow, toml is fake ini, hocon is overcomplicated, and you should probably just stop coding if you’re even considering xml. But no matter which you go with, they are all better (except xml) than using a language based settings file.

Wrap Up

So you’ve heard me prattle on and on of the dangers Djangoers might bring with them to your next project, but also keep in mind the good things they will bring. Django helps people learn how to document well, keep a rigid project structure and has a very warm and supportive community that we should all strive to match.

Thanks for reading, and I hope you enjoyed and found this informative or at least amusing!

Paint, Paper, Panoramas, and Python

I’m an artist and a python developer, two things that rarely occupy the same worlds, let alone the same sentence. However, I have recently found a way to combine these two passions: Panoramas.

My current smartphone takes excellent pictures. It does a great job at figuring out colors, lighting, and focus, even in low lighting. As an artist, this is important to me because I often use my phone to snap quick pictures of a scene as a reference to take back to my studio. It’s a huge improvement in the technology I had in my hands even five years ago. There is one thing about my old phone that I miss though – its ‘panorama’ photo mode, but not because it was better.

I miss how amazingly awful it used to be, and more importantly, the freedom to make awful pictures it allowed. I’d point the lens out the window of the car as it sped along (as a passenger of course) to make jagged and confusing images of tiny bits of the landscape that the phone struggled to hodgepodge together. I’d tilt and move the phone in random directions to make weird swirls of the horizon. Even when being used ‘as directed’, it would usually struggle with focus and lighting coming up with spontaneously and wonderfully terrible photos with abstract light glare or menacing dark patches. It’s hard to explain, but sometimes as an artist, a terrible photo can be just as inspiring as those picture perfect reference pics I take with me back to the studio.

My current phone is too smart for that though, and it snatches away any joy of bad photography by making consistently beautiful and seamless panoramas. Not only that, but it accomplishes this mostly by yelling at you (“You’re going too fast!”) or by using angry arrows to make sure you can only move the phone in one direction, and then abruptly ending the photograph when you don’t cooperate. So, I did what anyone does when they get nostalgic for awful photography – I made a python script to make my own terrible panoramas.

My plan was simple. First, I would shoot short videos where my phone wouldn’t yell at me for moving, tilting, and spinning the image as much as I wanted. Next, use Python to convert each frame of the video clip to an image, crop the image into a tiny sliver out of the center of the image and then glue them all together. The results are imperfect. And gloriously so.

Side note: Although I used my smartphone to shoot some video, this script could be applied to any video. Think of the wild panoramas you could create from some Russian dash cam footage, or a GoPro strapped to a fish, or a tiny clip from the Lord of the Rings. However, this script works best on videos that are less than 10 seconds long or else it produces mile long panoramic images. Currently, I don’t bother limiting the image size at all, but theoretically I could by using one out of every five frames for instance, or by cutting down the image slice size based on video length.

The Python

I used ffmpeg for turning each video frame into an image. It was simple to install, just download and unzip. Here’s a handy installation guide -> https://github.com/adaptlearning/adapt_authoring/wiki/Installing-FFmpeg

The Python Image Library is the only other requirement, installed with pip.

pip3 install pillow

The script works by pulling all videos out of a source directory based on file suffix and creating a panorama for each. This could easily be modified to convert just one video at a time by removing the loops and passing the path to the desired video directly to ffmpeg.


directory = Path('my\\videos\\dir')

vids = []
for vid in directory.iterdir():
    if vid.suffix.lower() in ('.mp4', '.mkv'):
        vids.append(vid)

Every frame pulled out by ffmpeg is stored in a file. I delete the directory and recreate it before ffmpeg runs to delete the old frames from the last run.


for vid in vids:
    shutil.rmtree("pics", ignore_errors=True)
    os.makedirs("pics", exist_ok=True)

    print(f'Creating panoramic {vid.stem}')
    result = run(
        f'ffmpeg -i {vid.absolute()} '
        f'-y pics\\thumb%04d.jpg -hide_banner', 
        shell=True, stderr=PIPE)
    result.check_returncode()
    print(result.stderr.decode('utf-8'))

After it finishes pulling out all the frames, I start the panorama by creating an empty image. I need to have the dimensions of the finished image to create it. To get the final width, I multiply the number of frames ffmpeg pulled out by the width of my image slice (40 pixels). For the height, I open up one of the frames and use it size as a reference. I also use the sample image’s dimensions to figure out the center of the image for cropping everything down later.

Then, I loop through all the frame images in reverse order (because … long story short, it usually looks better that way) and then work on slicing each image down to 40 pixels wide to glue into the panorama.

    
    sample = Image.open("pics/{}".format(os.listdir("pics")[0]))
    width, height = sample.size
    center = width / 2

    panoramic = Image.new('RGB', (len(os.listdir("pics")*40), height))
    
    # This offset is so PIL knows where to start adding 
    # each image slice to the panorama
    x_offset = 0

    for i in reversed(os.listdir("pics")):
        img = Image.open("pics/{}".format(i))
        area = (center - 20, 0, center + 20, height)
        cropped_img = img.crop(area)
        panoramic.paste(cropped_img, (x_offset, 0))
        x_offset += 40

    panoramic.save(f'{vid.stem}.jpeg')
    panoramic.close()

The Painting

So far, I am quite happy with the results of this adorable little script.
It has definitely given me the creative inspiration I was missing. In the past two weeks, I have done three series of paintings based on panoramas I have created using it, with plans for many more. Here’s an example of how I used it to create some artwork!

I took this video:

turned it into this panorama:

played with some paint and smudged around some charcoal and pastels:

and came up with this:

Final Thoughts

It feels really amazing to apply Python to unusual problems, even if that challenge is finding a unique way of creating original art. Plus, if the inspiration ever dries up, I have some ideas for making this script even more fun:

  • grab each slice from a random spot rather than dead center of the image for something much more jumbled and abstract
  • options to not use every frame for longer videos
  • PIL ‘effects’, like a black and white mode, over saturation, or extra blurry images
  • an ‘up and down’ mode for tall panoramas

I hope you enjoyed! Feel free to check out my website or my instagram for more artwork if you are interested.

Is the world ready for Python 3?

The trek from Python 2 to Python 3 has been drawn-out, arduous and fraught with perils. How close are our dear Knights developers to all reaching the long sought glory of Python 3?

Quest for the Python 3 – Artwork by Clara Griffith (Link may contain NSFW art)

PIP Downloads

Let’s first jump into what is being used the most currently. This data examines fifteen different libraries downloaded via PIP for a particular Python version. We are only including 2.7 and 3.4+, the Python Versions that are currently supported.

The libraries analyzed are ones that have over 10K stars on github and have been downloaded via PIP. The contenders are: celery, django, flask, ipython, keras, mitmproxy, numpy, pandas, python-box, requests, scrapy, selenium, tensorflow, and tornado. (To be fair, numpy and python-box didn’t have 10K stars, but I used them in the script to make these graphics, so gave them some spotlight too.)

As of January 2019, Python 3 downloads are eclipsing Python 2 by over 20% with Python 3.6 bringing over 39% of it, almost directly matching Python 2.7’s total.

That is good, but not great news. Thankfully Python 2 won’t just stop working at the end of this year, but those are rookie Python 3 numbers, we got to pump them up!

Of course, we have to remember this is a small subset of all downloads. Subsequently, pip downloads themselves don’t tell the whole tale, but this does give us an idea of how things of are going.

This is accomplished by using the PyPI BigQuery data and some SQL (adapted from Artem Golubin’s post about this from last year), then throwing it into matplotlib.

SELECT
  SUBSTR(details.python, 0, 3) as python_version,
  COUNT(*) as download_count
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -30, "DAY"),
    CURRENT_TIMESTAMP()
  )
WHERE
 details.installer.name='pip' and
 file.project = 'requests' -- change project name here
GROUP BY
  python_version
ORDER BY
  download_count DESC
LIMIT 100

Library Brawl: Who’s the Python 3 champs?

In this head to head, we are going to compare two similar libraries, and see how they are doing on the switch to Python 3.

Web Frameworks

The first two up are very popular web frameworks to develop in, Flask and Django.

It’s a dead heat! Both libraries are doing well at attracting developers with a fresh mindset.

Machine Learning

The most popular github package by far was tensorflow with over a hundred thousand stars. Here it’s paired against it’s younger brother keras, which actually depends on it (or other AI tools) to operate.

Machine learning needs to teach it’s developers how to update! It’s a sad day for AI.

Hacker vs Web Scraper

Okay, not really directly comparable tools with a man-in-the-middle proxy and a web scraper, but it’s still an interesting match up.

With this duo I was surprised they didn’t have a higher correlation. I was honestly expecting the mitm tool to have less Python 3 love, as a lot of “hacker” tools depend on the broken way Python 2 handles strings vs unicode, thus are hard to update.

Good job hackers, always keep your tool belt fresh! Scrapers….scrape it together.

Data Science

The last head to head is for the data scientists out there, and you got science in your name and numbers in your veins, you should be at the bleeding edge of tech!

Ouch, yinz need to get with the times.

Python Version Developers Use More Often

This is some hard to gather data as an individual, so I’m going to have to cheat and just base this information off JetBrain’s yearly state of the ecosystem reports from 2017 and 2018.

In 2017, 53% of devs reported using Python 3 as their main language, which went up 22% in 2018 to 75%. Based on those two points of data, we can come to a crystal clear, no doubt conclusion to how many developers will be using Python 3 as their main language in 2019.

That’s right, based on the past two year trend, 97% of developers should be using Python 3 in 2019.

Okay, well, maybe not. But I personal expect that number to be over 90% by the time Python 2 is EOL, which is excellent news.

Operating System Default Language

OSes have a fun time of being in the cross hairs of everyone from desktop to server users, trying to figure out the right combo of what’s best for their users and for their own technology stack going forward. Every major Linux distribution agrees Python 3 is the way to the future and they will need to change over. The hard part is deciding when it will impact the users least and best for their own release cycle. This has caused lots of headaches over the years. So where do we stand now?

OSPython Version
Windows 10None
OSX 10.82.7
Debian 92.7
RedHat 8*3.6
Fedora 293.7
Ubuntu 19.04*3.7

(* denotes upcoming releases this year)

Windows has the easy stance of just saying “do it yourself” and Mac is, as usual, not bothering to innovate and just hum along until it breaks. Thankfully most Linux distros, which power the internet, are either already updated or updating this year. I haven’t seen for sure that Debian 10 will be released with Python 3 or that it w ill be out before year’s end, but I would be surprised if either were not true. Then there’s Arch linux. Arch has had Python 3 as the standard for almost as long as it existed, good boy!

Are we ready?

In all honesty, we are. We are far more prepared for this than the financial sector was ready for Y2K, and we all survived that. Moreover, there are always going to be code bases that can’t update to the latest version easily, but that’s true across the entire software development world. That and the fact the Python Software Foundation has given an extended eleven years which has allowed for even the slowest of companies to have ample time to migrate to Python 3.

Python 3 everywhere? Bring it on!