Overview

A broad introduction to, and examples of, a module, library, or program

Cartoon or Photo? – Image detection with Python

This all started with me just wanting a fast way to sort through image folders and remove cartoon images. That led me down a spiraling rabbit hole of possibilities, from using OpenCV for different types of detection to training a Machine Learning model from scratch with Keras. This article will go through the multiple options available and see how accurate they end up being.

What makes an image a cartoon?

Model is Jessica Marie Frye

If we are going to detect the differences between photos and cartoons, first we need to figure out how they are different. Importantly, how they are different in a quantifiable way. These are the measurable differences I could think up:

  • Cartoons have smoother gradients
  • Real images use a larger color palette
  • Cartoons usually have drawn edge outlines

Below we will try each of these options and see how well they fare.

Results First

You’re probably more interested in what will work the best for you and less about how I spent days toiling away at this, so to cut to the chase, here is how everything performed.

The best OpenCV contender turned out to be counting colors, with a combined 75% accuracy.

Overall, Machine Learning Image Classification using the Xception model wiped the floor with a combined 96% accuracy. It could probably do even better with further fine tuning, but it was more than good enough for my own needs.

These results were also with a hard threshold set to force every image into either the real or cartoon bucket. For my own use I modify them to use a narrower range for the absolute calls, and then put the “unsure” ones in another folder, further reducing any errant picks.
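
Here is a rough sketch of what I mean; the helper and the cutoff values are made up, tune them to your own data:

def sort_image(score: float) -> str:
    # score is the 0-1 "cartoon-ness" value from whichever detector you use
    if score > 0.99:
        return "cartoon"
    if score < 0.95:
        return "photo"
    return "unsure"  # move these to a separate folder and review by hand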

Machine Learning with Keras is the obvious pick if you have a good set of data to train with and a computer beefy enough to process it. However, it’s not as portable and takes much longer than simply trying out one of the OpenCV methods.

OpenCV Gradient Differences

The first method we will use is pretty straightforward. We will blur the image a little, compare it to its unaltered form, and quantify the difference. First things first, we will need OpenCV installed for Python. Go into your venv for this and run:

pip install opencv-python

Now open up your IDE and create a new .py file to get started. The first thing we are going to do is read the image into OpenCV, which is the cv2 module. We will use a JPEG image as it will be in the expected BGR color format.

import cv2

img = cv2.imread("/path/to/my/image.jpg")

Next we will blur the image a little using a bilateral filter, to even out the colors. We will also resize the image to a standard size so the blur across every image is the same.

img = cv2.resize(img, (1024, 1024))
color_blurred = cv2.bilateralFilter(img, 6, 250, 250)

You can check out the result to see how strong the effect is by previewing the image. Press any key to close the window.

# Optional Preview
cv2.imshow("blurred", color_blurred)
cv2.waitKey(0)
cv2.destroyAllWindows()

Then we need to compare this new color_blurred image to the original image. I accomplished this by comparing the histograms. We will have to do that for each color channel individually.

diffs = []
for k, color in enumerate(('b', 'g', 'r')):
    print(f"Comparing histogram for color {color}")
    real_histogram = cv2.calcHist([img], [k], None, [256], [0, 256])
    color_histogram = cv2.calcHist([color_blurred], [k], None, [256], [0, 256])
    diffs.append(cv2.compareHist(real_histogram, color_histogram, cv2.HISTCMP_CORREL))

result = sum(diffs) / 3

compareHist will give us a result between 0 and 1 (one being the most similar). We will need to set a threshold for how similar cartoons will be. I have mine set at 0.98 (aka 98% similar).

if result > 0.98:
    print("It's a cartoon!")
else: 
    print("It's a photo!")

And that’s it! Now you can test it out and see how it works for your images. I found this iteration to be very fast and to have about a 70% proper detection rate.

Let’s put it all together into a usable script!

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from pathlib import Path
from typing import Union
import argparse

import cv2


def is_cartoon(
    image: Union[str, Path],
    threshold: float = 0.98,
    preview: bool = False,
) -> bool:
    # read and resize image
    img = cv2.imread(str(image))
    img = cv2.resize(img, (1024, 1024))

    # blur the image to "even out" the colors
    color_blurred = cv2.bilateralFilter(img, 6, 250, 250)

    if preview:
        cv2.imshow("blurred", color_blurred)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    # compare the colors from the original image to blurred one.
    diffs = []
    for k, color in enumerate(("b", "g", "r")):
        # print(f"Comparing histogram for color {color}")
        real_histogram = cv2.calcHist([img], [k], None, [256], [0, 256])
        color_histogram = cv2.calcHist([color_blurred], [k], None, [256], [0, 256])
        diffs.append(
            cv2.compareHist(real_histogram, color_histogram, cv2.HISTCMP_CORREL)
        )

    return sum(diffs) / 3 > threshold


def command_line_options():
    args = argparse.ArgumentParser(
        "blur_compare",
        description="Determine if a image is likely a cartoon or photo.",
    )
    args.add_argument(
        "-p",
        "--preview",
        action="store_true",
        help="Show the blurred image",
    )
    args.add_argument(
        "-t",
        "--threshold",
        type=float,
        help="Cutoff threshold",
        default=0.98,
    )
    args.add_argument(
        "image",
        type=Path,
        help="Path to image file",
    )
    return vars(args.parse_args())


if __name__ == "__main__":
    options = command_line_options()
    if not options["image"].exists():
        raise FileNotFoundError(f"No image exists at {options['image'].absolute()}")
    if is_cartoon(**options):
        print(f"{options['image'].name} is a cartoon!")
    else:
        print(f"{options['image'].name} is a photo!")

OpenCV Color Counting

Using a subset of 512 colors in a 1024×1024 image, determine how much of the image can be reproduced

The next possible way to approach the problem was to figure out how many colors make up the majority of the image. We will use the same code to load and resize the image from above, just this time we will add a loop to count all the colors (slow way, faster option below):

    # Find count of each color
    a = {}
    for item in img.reshape(-1, 3):  # iterate over every BGR pixel
        value = tuple(item)
        if value not in a:
            a[value] = 1
        else:
            a[value] += 1

Next we will sort the dictionary by the most used colors, and add the counts of the top 512 colors together.

The actual calculation is a lot of functionality in a small block of code. First let’s get a visual preview of what is happening by re-creating the image with only the selected colors with:

import numpy

mask = numpy.zeros(img.shape[:2], dtype=bool)

for color, _ in sorted(a.items(), key=lambda pair: pair[1], reverse=True)[:512]:
    mask |= (img == color).all(-1)

img[~mask] = (255, 255, 255)

cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Here’s the code that basically calculates what you are seeing. We divide the sum of all the top colors by the size of the image to get what percent could be recreated with just those colors.

    # Identify the percent of the image that uses the top 512 colors
    most_common_colors = sum([x[1] for x in sorted(a.items(), key=lambda pair: pair[1], reverse=True)[:512]])
    return (most_common_colors / (1024 * 1024)) > 0.3 # new threshold

This script is pretty similar to the last one, and has a slightly higher success rate on my data set at 76%! It is 20% better at detecting what is a cartoon, but 10% worse at photos. It is also much, much slower, some 20~50 times slower (it will take around a second per image instead of 0.05 seconds). However, we can speed that up with some numpy trickery.

    # Replace everything after "Find count of each color" with this faster version, but it doesn't work with the preview.
    flattened = numpy.reshape(img, ((1024 * 1024), 3))
    # Multipliers at least 256 apart so every BGR color maps to a unique number
    multiplied = numpy.multiply(flattened, [65_536, 256, 1])
    sums = multiplied.sum(axis=1)
    unique, counts = numpy.unique(sums, return_counts=True)

    # Identify the percent of the image that uses the top 512 colors
    most_common_colors = sum(sorted(counts, reverse=True)[:512])
    return (most_common_colors / (1024 * 1024)) > threshold

Here is the entire script with the slower version that works with the preview image. However, after finding a good threshold I would suggest replacing the code with the faster version above.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from pathlib import Path
from typing import Union
import argparse

import cv2
import numpy


def is_cartoon(
    image: Union[str, Path],
    threshold: float = 0.3,
    preview: bool = False,
) -> bool:
    # read and resize image
    img = cv2.imread(str(image))
    img = cv2.resize(img, (1024, 1024))

    # Find count of each color
    a = {}
    for row in img:
        for item in row:
            value = tuple(item)
            if value not in a:
                a[value] = 1
            else:
                a[value] += 1

    if preview:
        mask = numpy.zeros(img.shape[:2], dtype=bool)

        for color, _ in sorted(a.items(), key=lambda pair: pair[1], reverse=True)[:512]:
            mask |= (img == color).all(-1)

        img[~mask] = (255, 255, 255)

        cv2.imshow("img", img)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    # Identify the percent of the image that uses the top 512 colors
    most_common_colors = sum(
        [x[1] for x in sorted(a.items(), key=lambda pair: pair[1], reverse=True)[:512]]
    )
    return (most_common_colors / (1024 * 1024)) > threshold


def command_line_options():
    args = argparse.ArgumentParser(
        "blur_compare",
        description="Determine if a image is likely a cartoon or photo.",
    )
    args.add_argument(
        "-p",
        "--preview",
        action="store_true",
        help="Show the blurred image",
    )
    args.add_argument(
        "-t",
        "--threshold",
        type=float,
        help="Cutoff threshold",
        default=0.3,
    )
    args.add_argument(
        "image",
        type=Path,
        help="Path to image file",
    )
    return vars(args.parse_args())


if __name__ == "__main__":
    options = command_line_options()
    if not options["image"].exists():
        raise FileNotFoundError(f"No image exists at {options['image'].absolute()}")
    if is_cartoon(**options):
        print(f"{options['image'].name} is a cartoon!")
    else:
        print(f"{options['image'].name} is a photo!")

OpenCV Edge Detection

I haven’t had much luck with this one, only about 55% successful at determining what type of image it is, and a coin toss is about as accurate. I don’t recommend using it as is, but if you have any ideas or improvements, let me hear them!

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from pathlib import Path
from typing import Union
import argparse

import cv2
import numpy


def is_cartoon(
    image: Union[str, Path],
    threshold: float = 4500,
    preview: bool = False,
) -> bool:
    # read and resize image
    img = cv2.imread(str(image))
    img = cv2.resize(img, (1024, 1024))

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred_gray = cv2.medianBlur(gray, 3)

    edges = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 5, 10
    )
    blurred_edges = cv2.adaptiveThreshold(
        blurred_gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 5, 10
    )

    if preview:
        cv2.imshow("edges", edges)
        cv2.imshow("blurred edges", blurred_edges)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    count_1 = numpy.count_nonzero(edges)
    count_2 = numpy.count_nonzero(blurred_edges)

    return abs(count_2 - count_1) < threshold


def command_line_options():
    args = argparse.ArgumentParser(
        "blur_compare",
        description="Determine if a image is likely a cartoon or photo.",
    )
    args.add_argument(
        "-p",
        "--preview",
        action="store_true",
        help="Show the blurred image",
    )
    args.add_argument(
        "-t",
        "--threshold",
        type=float,
        help="Cutoff threshold",
        default=4500,
    )
    args.add_argument(
        "image",
        type=Path,
        help="Path to image file",
    )
    return vars(args.parse_args())


if __name__ == "__main__":
    options = command_line_options()
    if not options["image"].exists():
        raise FileNotFoundError(f"No image exists at {options['image'].absolute()}")
    if is_cartoon(**options):
        print(f"{options['image'].name} is a cartoon!")
    else:
        print(f"{options['image'].name} is a photo!")

Keras Machine Learning Model

Time to take the kid gloves off, let’s use some ML image detection.

First we have to train a minified Xception model with a lot of hand picked good data. I used 556 cartoon images and 2295 real images in the training datasets, then used that model to sort another 2000+ unsorted images.

To get step by step details, please check out the same tutorial I used for this.

import shutil
from pathlib import Path

import tensorflow as tf

root_dir = Path("/training/")
image_size = (180, 180)
batch_size = 32
epochs = 20
model_name = "my_model"


def make_model(input_shape, num_classes):
    inputs = tf.keras.Input(shape=input_shape)
    # Image augmentation block
    data_augmentation = tf.keras.Sequential(
        [
            tf.keras.layers.RandomFlip("horizontal"),
            tf.keras.layers.RandomRotation(0.1),
        ]
    )
    x = data_augmentation(inputs)

    # Entry block
    x = tf.keras.layers.Rescaling(1.0 / 255)(x)
    x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)

    x = tf.keras.layers.Conv2D(64, 3, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)

    previous_block_activation = x  # Set aside residual

    for size in [128, 256, 512, 728]:
        x = tf.keras.layers.Activation("relu")(x)
        x = tf.keras.layers.SeparableConv2D(size, 3, padding="same")(x)
        x = tf.keras.layers.BatchNormalization()(x)

        x = tf.keras.layers.Activation("relu")(x)
        x = tf.keras.layers.SeparableConv2D(size, 3, padding="same")(x)
        x = tf.keras.layers.BatchNormalization()(x)

        x = tf.keras.layers.MaxPooling2D(3, strides=2, padding="same")(x)

        # Project residual
        residual = tf.keras.layers.Conv2D(size, 1, strides=2, padding="same")(
            previous_block_activation
        )
        x = tf.keras.layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    x = tf.keras.layers.SeparableConv2D(1024, 3, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)

    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    if num_classes == 2:
        activation = "sigmoid"
        units = 1
    else:
        activation = "softmax"
        units = num_classes

    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(units, activation=activation)(x)
    return tf.keras.Model(inputs, outputs)


def train_data():
    train_ds = tf.keras.preprocessing.image_dataset_from_directory(
        str(root_dir),
        validation_split=0.2,
        subset="training",
        seed=1337,
        image_size=image_size,
        batch_size=batch_size,
    )
    val_ds = tf.keras.preprocessing.image_dataset_from_directory(
        str(root_dir),
        validation_split=0.2,
        subset="validation",
        seed=1337,
        image_size=image_size,
        batch_size=batch_size,
    )

    train_ds = train_ds.prefetch(buffer_size=32)
    val_ds = val_ds.prefetch(buffer_size=32)

    model = make_model(input_shape=image_size + (3,), num_classes=2)
    tf.keras.utils.plot_model(model, show_shapes=True)

    callbacks = [
        tf.keras.callbacks.ModelCheckpoint(f"{model_name}_{{epoch}}.h5"),
    ]
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    model.fit(
        train_ds, epochs=epochs, callbacks=callbacks, validation_data=val_ds,
    )


def clean_images():
    move_to = root_dir.parent / "bad_data"  # needs to be not in the training directory
    move_to.mkdir(exist_ok=True)
    moved = 0
    for directory in root_dir.glob("*"):
        if directory.name == move_to.name:
            continue
        if directory.is_dir():
            for i, file in enumerate(directory.glob("*")):
                is_jfif = tf.compat.as_bytes("JFIF") in file.open("rb").read(10)
                if not file.name.lower().endswith(("jpg", "jpeg")) or not is_jfif:
                    shutil.move(file, move_to / file.name)
                    moved += 1
        print("moved unclean data", moved, "from", directory)


def move_images():
    model = tf.keras.models.load_model(f"{model_name}_{epochs}.h5")
    cartoon_dir = Path("/cartoon/")
    real_dir = Path("/real/")

    real, cartoon, unknown = 0, 0, 0

    for file in Path("/unsorted/").glob("*"):
        # glob character classes can't express "jpg or jpeg or png", so filter on the suffix instead
        if file.suffix.lower() not in (".jpg", ".jpeg", ".png"):
            continue

        img = tf.keras.preprocessing.image.load_img(
            str(file), target_size=image_size
        )
        img_array = tf.keras.preprocessing.image.img_to_array(img)
        img_array = tf.expand_dims(img_array, 0)  # Create batch axis

        predictions = model.predict(img_array)
        score = predictions[0][0]
        if score > 0.98:
            real += 1
            shutil.move(file, real_dir / file.name)
        elif score < 0.02:
            cartoon += 1
            shutil.move(file, cartoon_dir / file.name)
        else:
            unknown += 1
            print(f"Could not figure out {file} as it was {score * 100}")
    print(f"Moved {real} to real and {cartoon} to cartoon, {unknown} were unmoved")


if __name__ == '__main__':
    clean_images()
    train_data()
    move_images()

I personally think 95% overall accuracy is amazing for no additional tuning, especially as this isn’t exactly the model’s intended use case. Generally we think about classifying what is in the image (cat vs dog), not the type of image (anime cat vs real cat).

Python’s New Structural Pattern Matching!

It’s finally happened: after years of complaining and being told “it will never happen”, I got my darn switch statement! And honestly, calling it only a “switch” statement is really underselling it. It’s much more powerful than most people will ever use! You might have already heard about this, but I wanted to make sure it made it to at least the 3.10 beta phase, as the alphas are no guarantee you’ll actually see the new feature (still possible to remove before the RC phase, but less likely).

Artwork by Clara Griffith

Python 3.10 will introduce “Structural Pattern Matching”, as introduced in PEP 622, which is a crazy advanced switch statement that can recognize patterns. Instead of the keyword switch, Python will introduce match instead (get ready to update your regex variable names!). Let’s do a quick comparison between it and JavaScript’s switch statement. First let’s start off with the JavaScript example, modified from w3schools.

Javascript’s Switch

switch (new Date().getDay()) {
  case 5:
    console.log("It's Friday!");
    break;
  case 0:
  case 6:
    console.log("Woo, Weekend!");
    break;
  default:
    console.log("Work. Work. Work. Work.");
}

Python’s Structural Pattern Matching

We can accomplish the same with Python in fewer lines. Note that in Python the weekday starts at 0 for Monday.

from datetime import datetime

match datetime.today().weekday():  
    case 4:
        print("It's Friday!")
    case 5 | 6:
        print("Woo, Weekend!")
    case _:
        print("Work. Work. Work. Work.")

Similarities and Differences

With Python’s new match style statements, the biggest change from tradition is that there is no “drop through” between cases; aka you don’t have to manually specify break or anything else to exit the match block for every instance. That does mean you can’t stack multiple cases onto the same result; instead you combine checks with | for “or” operations, which I honestly think is a lot cleaner.

What I don’t like is there is no default keyword or similar, and you have to use the _ wildcard. The underscore is already overused in my opinion. However it does add a lot of power for more advanced cases. Let’s dive into some more examples.

Matching and Assigning

The very basic things to understand about the new Structural Pattern Matching are the flow (as seen in the example above) and the possibility of assignment. Not only can you make sure things are equal to another literal, you can detect patterns (as the name suggests) and then work with the variables inside the pattern itself.

Pattern Recognition

Let’s dip our toes in on this whole “pattern matching” thing, what does that mean?

Say you have a tuple with two objects that could be in either order, and you always want to print them the right way around.

data_1 = ("Purchase Number", 574)
data_2 = (574, "Purchase Number")

You could have a simple match statement to straighten it out so it always prints “Purchase Number 574”, as pattern matching supports type checking.

match data_1:
    case str(x), int(y):
        print(x, y) 
    case int(x), str(y):
        print(y, x)

Match on dict values

Just like with regular value comparisons, you can check dictionary values and still do or operations. In this case, either grade b or c will print out "Welcome to being average!"

match {'grade': 'B'}:
    case {'grade': 'a' | 'A'}:
        print("You're a star!")
    case {'grade': 'b' | 'c' | 'B' | 'C'}:
        print("Welcome to being average!")

Unpacking arbitrary data

You can use the standard * and ** unpackers to capture and unpack variable length lists and dicts.

my_dict = {'data': [{'action': 'move'}, {'action': 'stay'}],
           'location': "square_10"}

match my_dict:
    case {'data': [*options], **remainder }:
        print(options)
        print(remainder)
    case _:
        raise ValueError("Data not as expected")

Will match with the first case and print out:

[{'action': 'move'}, {'action': 'stay'}]
{'location': 'square_10'}

That can really come in handy if dealing with variable length data.

Adding the “if”

You can also add in some inline checks.

cords = (30.25100365332043, -97.86221371827051, "Workplace")
cords_2 = (44.40575746530253, 8.947747627912994)

match cords: 
    case lat, lon, *_ if lat > 0:
        print("Northern Hemisphere")
    case lat, lon, *_ if lat < 0: 
        print("Sothern Hemisphere")    

Don’t forget about “as”

Sometimes you might want one of the many literals you are looking for to be usable in the case itself. Let’s put a lot of our knowledge together and check out the added power of the as clause.

We are going to build a really simple command line game to look for random things.

import sys
import random

inventory = []

while True:
    match input().lower().split():
        # Remember `split` makes the input into a list.  
        # These first two cases are functionally the same.
        case ["exit"]:  
            sys.exit(0)
        case "stop", *_: 
            sys.exit(0)
        case "travel", "up" | "down" | "left" | "right" as direction:
            print(f"Going {direction}")
        case "search", "area" | "backpack" as thing, _, *extra:
            item = " ".join(extra)
            if thing == "backpack":
                if item in inventory:
                    print(f"Yes you have {item}")
                else:
                    print(f"No you don't have {item}")
            elif thing == "area":
                if random.randint(1, 10) > 5:
                    inventory.append(item)
                    print(f"You found {item}!")
                else:
                    print(f"No {item} here")
            else:
                print(f"Sorry, you cannot search {thing}")
        case _:
            print("sorry, didn't understand you! type 'exit' or 'stop' to quit")

Let’s do a quick playthrough:

> search area for diamonds
You found diamonds!

> search backpack for diamonds
Yes you have diamonds

> take a short rest and try to eat the diamonds
sorry, didn't understand you! type 'exit' or 'stop' to quit

> Travel Up
Going up

> search area for candy
No candy here

> search backpack for candy
No you don't have candy

> stop
# Process finished with exit code 0

Notice our special case for searching around.

case "search", "area" | "backpack" as thing, _, *extra:

We are looking for the first keyword “search”, then either “area” or “backpack”, saving whichever matched to the variable thing. The _ will ignore the next word, as we expect them to type “for” or something equally useless there. Finally we grab everything else and treat it as a single item.

Classes and More

Using dataclasses with match is super easy.

from dataclasses import dataclass

@dataclass
class Car:
    model: str
    top_speed: int

car_1 = Car("Jaguar F-Type R", 186)
car_2 = Car("Tesla Model S", 163)
car_3 = Car("BMW M4", 155)

match car_1:
    case Car(x, speed) if speed > 180:
        print(f"{x} is a super fast car!")
    case Car(x, speed) if speed > 160:
        print(f"{x} is a pretty fast car")
    case _:
        print("Regular Car")

Using a regular class, you have to be a bit more explicit about the variables that are being used.

class Car:

    def __init__(self, model, top_speed):
        self.model = model
        self.top_speed = top_speed


car_1 = Car("F-Type R", 186)
car_2 = Car("Model S", 163)
car_3 = Car("BMW M4", 155)

match car_1:
    case Car(model=x, top_speed=speed) if speed > 180:
        print(f"{x} is a super fast car!")
    case Car(model=x, top_speed=speed) if speed > 160:
        print(f"{x} is a pretty fast car")
    case _:
        print("Regular Car")

However, there is a way around that! You can add a __match_args__ tuple to the class itself to define which attributes you want to use for positional pattern matching.

class Car:

    __match_args__ = ("model", "top_speed")
    def __init__(self, model, top_speed):
        self.model = model
        self.top_speed = top_speed


car_1 = Car("F-Type R", 186)
car_2 = Car("Model S", 163)
car_3 = Car("BMW M4", 155)

match car_1:
    case Car(x, speed) if speed > 180:
        print(f"{x} is a super fast car!")
    case Car(x, speed) if speed > 160:
        print(f"{x} is a pretty fast car")
    case _:
        print("Regular Car")

Summary

Structural Pattern Matching is Python’s sexy new “switch” statement that will bring a lot of power and remove a lot of old if/elif blocks. The only real downside is that it’s brand new, and it will take a few years before most people can use it with their default interpreter.

HDR, HDR10, HDR10+, HLG and Dolby Vision

I have read several articles on the differences between what all these different types of High Dynamic Range (HDR) video formats are, but few address it from a technical standpoint.

So let’s do a quick high level (not so technical) overview of each:

  • WCG: Wide Color Gamut – more colors
  • HDR: Overloaded term: generally the combo of WCG + larger luminosity range
  • HDR10: single sheet of color and light information to set upper and lower bounds
  • HDR10+: a list of sheets for each scene to change those bounds dynamically
  • Dolby Vision: a proprietary list of bigger sheets
  • HLG: Regular color space with magic sprinkles on top

Now into a bit more detail. Keep in mind I will be talking about consumer consumption of these concepts, not the prosumer or creator original formats.[1]

HDR

HDR, aka High Dynamic Range, has been overloaded to mean many things nowadays. Companies like to slap it on, just like they do with “HD”, if any of their stuff meets the minimum requirements (and sometimes even when it doesn’t).

What HDR actually allows for is simply brighter brights, and darker darks.

Gamma Curve

The human eye is very sensitive to small changes in darkness, and the traditional range of brightness in 8-bit video is basically a staircase, with each level of light the same distance away from the one before and after it.

Whereas to capture more of the details in the darks, we would want something that has smaller steps of difference at lower light levels, but can be larger steps the brighter it gets.

That means we need two things: more room to store information (aka 10-bit video), and some kind of math that can condense the usual steps into a gamma curve. The standard curve generator for UHD movies is SMPTE ST 2084, but there is also HLG, discussed below.
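
For the curious, the ST 2084 “PQ” curve is just a fixed formula. Here is a small illustrative sketch (my own, not code from the spec or any encoder) of the inverse EOTF that maps linear luminance in cd/m² onto a 0-1 signal value:

# SMPTE ST 2084 (PQ) constants, as published in the specification
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_encode(nits: float) -> float:
    """Map a linear luminance value (cd/m²) to a 0-1 PQ signal value."""
    y = max(nits, 0.0) / 10_000  # normalize to the 10,000 nit PQ ceiling
    y_m1 = y ** M1
    return ((C1 + C2 * y_m1) / (1 + C3 * y_m1)) ** M2


# Note how much of the signal range is spent on the darker end
print(round(pq_encode(0.1), 3), round(pq_encode(100), 3), round(pq_encode(1000), 3))
# 0.062 0.508 0.752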

Wide Color Gamut

Well, now that we have 10-bit videos, we also have room for more colors, right? The UHD Alliance thought so too, so they set the standard requirements of UHD videos to be:

  • 3840×2160 resolution
  • 10-bit minimum
  • SMPTE 2084
  • BT.2020 color gamut

So UHD videos use a 10-bit or 12-bit pixel format and a larger color space (BT.2020) instead of the traditional 8-bit (BT.709), encompassing a much larger array of color and light depth.

Images from Wikimedia Commons[2][3]

This new standard is not directly backwards compatible, so sometimes you may come across devices that display what should be a beautifully colorful scene as painfully muted and off color. For example, here is a 10-bit BT.2020 video decoded properly (left) and a direct conversion to 8-bit (right).

HDR properly decoded vs direct conversion to an 8-bit video
Stills taken from Dolby’s Glass Blowing Demo[4]

This is not much of an issue for movies, as HDR is currently only on 4K Blu-rays, so the players have to support BT.2020. However, it is an issue for broadcasters who need to serve both older and newer hardware; that’s where HLG comes into play. But first, let’s go deeper into the “true” HDR options.

HDR10

HDR10 is built on top of HDR as a small set of static information given to the video to tell the player the “Mastering Display Color Volume” (the color primaries, white point, and min/max luminance of the mastering display), along with the Content Light Level values: Maximum Content Light Level (MaxCLL) and Maximum Frame-Average Light Level (MaxFALL). If you extract these metadata elements, stored in the video’s SEI messages, they will look something like this:

"side_data_list": [
                {
                    "side_data_type": "Mastering display metadata",
                    "red_x": "35400/50000",
                    "red_y": "14600/50000",
                    "green_x": "8500/50000",
                    "green_y": "39850/50000",
                    "blue_x": "6550/50000",
                    "blue_y": "2300/50000",
                    "white_point_x": "15635/50000",
                    "white_point_y": "16450/50000",
                    "min_luminance": "50/10000",
                    "max_luminance": "40000000/10000"
                },
                {
                    "side_data_type": "Content light level metadata",
                    "max_content": 0,
                    "max_average": 0
                } 
]

Encoders such as x265 (for HEVC / H.265) and rav1e (AV1) support mastering display metadata, and will take the arguments in a simplified format. For example, they take the mastering display as “G(x,y)B(x,y)R(x,y)WP(x,y)L(max,min)”.

G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(40000000,50)

HDR10+

Expanding upon HDR10, HDR10+ (also written as HDR10 plus) supports that information for each frame or on a scene by scene basis. Videos that have HDR10+ support will generally include HDR10 static metadata as well. HDR10+ is specifically for 10-bit video and supports up to 8K resolution.

The movie will then take a large array of information, for example here is a single frame’s extra metadata.

    {
      "BezierCurveData": {
        "Anchors": [124, 255, 380, 501, 617, 728, 845, 904, 935],
        "KneePointX": 194,
        "KneePointY": 242
      },
      "LuminanceParameters": {
        "AverageRGB": 354,
        "LuminanceDistributions": {
          "DistributionIndex": [1, 5, 10, 25, 50, 75, 90, 95, 99],
          "DistributionValues": [16, 5312, 98, 123, 315, 551, 707, 805, 4986]
        },
        "MaxScl": [7056, 5801, 4168]
      },
      "NumberOfWindows": 1,
      "TargetedSystemDisplayMaximumLuminance": 400
    }

The HEVC encoder x265 is capable of taking such a metadata JSON file and applying HDR10+ to a video. Right now I know of no AV1 or H.266/VVC encoders that can do so.

Dolby Vision

Dolby’s take on dynamic HDR is their proprietary “Dolby Vision” that unlike all other options, requires a royalty cost. However, it does support 12-bit video for a larger range of color and brightness than HDR10+ is capable of supporting currently.

Right now most 4K Blu-rays and TVs support either Dolby Vision or HDR10+, however it is possible to support both in the same video.

Also unlike all previous options, it is not possible to re-encode a video with Dolby Vision without the original metadata file, as there is no current way to extract the proprietary information. (The x265 encoder does support encoding with Dolby Vision for content creators.)

HLG

Over in the ever dying world of direct-to-house TV broadcasting, they need a way to support both 30 year old TVs and receivers, as well as the brand new hotness of HDR. So they applied some black magic (math) and figured out how to send signals using both gamma and logarithmic curves, hence the name “Hybrid Log-Gamma.” Older SDR TVs read the gamma curve and produce a proper picture while receivers and TVs supporting HLG will show the HDR video. That way they don’t run into the issue shown in the glass blowing images above.

Learn More

Obviously this was a “getting your feet wet” post, and I wanted to include links to a lot of great resources to learn more on the various aspects of HDR and video formats.


References

  • [1] I will be referring to these formats as they are used in consumer releases, such as 4K Blu-Rays and online streaming services. As such, they use a more compressed video format “yuv42010le” and are HEVC encoded using standard RECs. Just be aware that these ideas may exist and be represented differently in creator and prosumer product that have full color ranges.
  • [2] https://commons.wikimedia.org/wiki/File:CIExy1931_Rec_2020.svg
  • [3] https://commons.wikimedia.org/wiki/File:CIExy1931_Rec_709.svg
  • [4] https://developer.dolby.com/tools-media/sample-media/video-streams/dolby-vision-streams/

Encoding UHD 4K HDR10 and HDR10+ Videos

tl;dr: If you don’t want to do the work by hand on the command line, use FastFlix. If you have any issues, reach out on discord or open a github issue.

I talked about this before with my encoding settings for Handbrake post, but there was a fundamental flaw using Handbrake for HDR 10-bit video… it only had an 8-bit internal pipeline! It and most other GUIs also don’t yet support dynamic metadata, such as HDR10+ or Dolby Vision.

2021-02 Update: Handbrake’s latest code has HDR10 static metadata support.

Thankfully, you can avoid some hassle and save HDR10 or HDR10+ by using FastFlix instead, or directly use FFmpeg. (To learn about extracting and converting with HDR10+ or saving Dolby Vision by remuxing, skip ahead.) If you want to do the work yourself, here are the two most basic commands you need to save your juicy HDR10 data. This will use the Dolby Vision (profile 8.1) Glass Blowing demo.

Extract the Mastering Display metadata

First, we need to use FFprobe to extract the Mastering Display and Content Light Level metadata. We are going to tell it to only read the first frame’s metadata, -read_intervals "%+#1", for the file GlassBlowingUHD.mp4.

ffprobe -hide_banner -loglevel warning -select_streams v -print_format json -show_frames -read_intervals "%+#1" -show_entries "frame=color_space,color_primaries,color_transfer,side_data_list,pix_fmt" -i GlassBlowingUHD.mp4

A quick breakdown of what we are sending ffprobe:

  • -hide_banner -loglevel warning Don’t display what we don’t need
  • -select_streams v We only want the details for the video (v) stream
  • -print_format json Make it easier to parse
  • -read_intervals "%+#1" Only grab data from the first frame
  • -show_entries ... Pick only the relevant data we want
  • -i GlassBlowingUHD.mp4 input (-i) is our Dolby Vision demo file

That will output something like this:

{ "frames": [
        {
            "pix_fmt": "yuv420p10le",
            "color_space": "bt2020nc",
            "color_primaries": "bt2020",
            "color_transfer": "smpte2084",
            "side_data_list": [
                {
                    "side_data_type": "Mastering display metadata",
                    "red_x": "35400/50000",
                    "red_y": "14600/50000",
                    "green_x": "8500/50000",
                    "green_y": "39850/50000",
                    "blue_x": "6550/50000",
                    "blue_y": "2300/50000",
                    "white_point_x": "15635/50000",
                    "white_point_y": "16450/50000",
                    "min_luminance": "50/10000",
                    "max_luminance": "40000000/10000"
                },
                {
                    "side_data_type": "Content light level metadata",
                    "max_content": 0,
                    "max_average": 0
} ] } ] }

I chose to output it with json via the -print_format json option to make it more machine parsable, but you can omit that if you just want the text.

We are now going to take all that data and break it down into groups of <color abbreviation>(<x>, <y>), leaving off the denominators in most cases*. So, for example, we combine red_x "35400/50000" and red_y "14600/50000" into R(35400,14600).

G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(40000000, 50)

*If your data for the colors is not divided by /50000, or the luminance not divided by /10000, and has been simplified, you will have to expand it back out to the full ratio. For example, if yours lists 'red_x': '17/25', 'red_y': '8/25', you will have to divide 50000 by the current denominator (25) to get the multiplier (2000) and multiply that by each numerator (17 and 8) to get the proper R(34000,16000).

This data, as well as the Content light level <max_content>,<max_average> of 0,0 will be fed into the encoder command options.
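
If you would rather script that math than do it by hand, here is a rough sketch of how it could look; it assumes you saved the ffprobe JSON from above to a file named metadata.json (a name I made up) and that both side data entries are present:

import json
from fractions import Fraction


def scale(value: str, denominator: int) -> int:
    # Expand an ffprobe fraction like "17/25" back out over the denominator
    # x265 expects (50000 for color points, 10000 for luminance).
    return int(Fraction(value) * denominator)


with open("metadata.json") as metadata_file:
    side_data = json.load(metadata_file)["frames"][0]["side_data_list"]

display = next(d for d in side_data if d["side_data_type"] == "Mastering display metadata")
light = next(d for d in side_data if d["side_data_type"] == "Content light level metadata")

master_display = (
    f"G({scale(display['green_x'], 50000)},{scale(display['green_y'], 50000)})"
    f"B({scale(display['blue_x'], 50000)},{scale(display['blue_y'], 50000)})"
    f"R({scale(display['red_x'], 50000)},{scale(display['red_y'], 50000)})"
    f"WP({scale(display['white_point_x'], 50000)},{scale(display['white_point_y'], 50000)})"
    f"L({scale(display['max_luminance'], 10000)},{scale(display['min_luminance'], 10000)})"
)
max_cll = f"{light['max_content']},{light['max_average']}"

print(master_display)  # G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(40000000,50)
print(max_cll)         # 0,0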

Convert the video

This command converts only the video, keeping the HDR10 intact. We will have to pass these arguments not to ffmpeg, but to the x265 encoder directly via the -x265-params option. (If you’re not familiar with FFmpeg, don’t fret. FastFlix, which I talk about later, will do the work for you!)

ffmpeg  -i GlassBlowingUHD.mp4 -map 0 -c:v libx265 -x265-params hdr-opt=1:repeat-headers=1:colorprim=bt2020:transfer=smpte2084:colormatrix=bt2020nc:master-display=G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(40000000,50):max-cll=0,0 -crf 20 -preset veryfast -pix_fmt yuv420p10le GlassBlowingConverted.mkv

Let’s break down what we are throwing into the x265-params:

  • hdr-opt=1 we are telling it yes, we will be using HDR
  • repeat-headers=1 we want these headers on every frame as required
  • colorprim, transfer and colormatrix are the same values ffprobe listed
  • master-display this is where we add our color string from above
  • max-cll Content light level data, in our case 0,0

During a conversion like this, when a Dolby Vision layer exists, you will see a lot of messages like [hevc @ 000001f93ece2e00] Skipping NAL unit 62 because there is an entire layer that ffmpeg does not yet know how to decode.

For the quality of the conversion, I was setting it to -crf 20 with a -preset veryfast to convert it quickly without a lot of quality loss. I dig deeper into how FFmpeg handles crf vs preset with regards to quality below.

All the sound and data and everything else will be copied over thanks to the -map 0 option, which is a blanket statement of “copy all streams from the first (0 index) input”.

That is really all you need to know for the basics of how to encode your video and save the HDR10 data!

FFmpeg conversion settings

I covered this a bit before in the other post, but I wanted to go through the full gauntlet of presets and crf values one might feel inclined to use. I compared each encoding with the original using VMAF and SSIM calculations over an 11 second clip. Then, I created over 100 conversions for this single chart, so it is a little cramped:

First takeaways are that there is no real difference between veryslow and slower, nor between veryfast and faster, as their lines are drawn on top of each other. The same is true for both VMAF and SSIM scores.

Second, no one in their right mind would ever keep a recording stored by using ultrafast. That is purely for real time streaming use.

Now for VMAF scores, 5~6 points away from the source is visually distinguishable when watching; in other words, it will have very noticeable artifacts. Personally I can tell on my screen with just a single digit difference, and some people are even more sensitive, so this is by no means an exact tell-all. At minimum, let’s zoom in a bit and get rid of anything that will produce video with very noticeable artifacts.

From these charts, it seems clear that there is no reason whatsoever to ever use anything other than slow, which I personally do for anything I am encoding. However, slow lives up to its namesake.
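
If you want to run the same kind of comparison against your own clips, the VMAF score can be calculated with FFmpeg’s libvmaf filter. Here is a rough sketch of how I would script it; it assumes an FFmpeg build with libvmaf enabled and uses made-up file names:

from subprocess import run, PIPE

# libvmaf treats the first input as the distorted file and the second as the reference
result = run(
    ["ffmpeg", "-hide_banner", "-i", "encoded.mkv", "-i", "original.mkv",
     "-lavfi", "libvmaf", "-f", "null", "-"],
    stderr=PIPE,
)

# The score shows up on stderr as a line such as "VMAF score: 95.4"
for line in result.stderr.decode("utf-8").splitlines():
    if "VMAF" in line:
        print(line)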

Encoding Speed and Bitrate

I had to trim off veryslow and slower from the main chart to be able to even see the rest, and slow is still almost three times slower than medium. All the charts contain the same data, just with some of the longer running presets removed from each to better see details of the faster presets.

Please note, the first three crf datapoints are a little dirty, as the system was in use for the first three tests. However, there is enough clean data to see how they compare down the line.

To see a clearer picture of how long it takes for each of the presets, I will exclude those first three times and average the remaining data. The data is then compared against the medium (default) preset and the original clip length of eleven seconds.

Preset      Time (s)    vs “medium”    vs clip length (11s)
ultrafast     11.204         3.370x                  0.982x
superfast     12.175         3.101x                  0.903x
veryfast      19.139         1.973x                  0.575x
faster        19.169         1.970x                  0.574x
fast          22.792         1.657x                  0.482x
medium        37.764         1.000x                  0.291x
slow          97.755         0.386x                  0.112x
slower       315.900         0.120x                  0.035x
veryslow     574.580         0.066x                  0.019x

What is a little scary here is that even with the “ultrafast” preset we are not able to get realtime conversion, and these tests were run on a fairly high powered system wielding an i9-9900K! While it might be clear from the crf graph that slow is the clear winner, unless you have a beefy computer it may be a non-option.

Use the slowest preset that you have patience for

FFmpeg encoding guide

Also, unlike VBR encoding, the average bitrate and filesize when using crf will differ wildly based upon the source material. This next chart just shows off the basic curve effect you will see; it cannot be compared directly to what you may expect to see with your own files.

The two big jumps are between slow and medium as well as veryfast and superfast. That is interesting because while slow and medium are quite far apart on the VMAF comparison, veryfast and superfast are not. I expected a much larger dip from superfast to ultrafast but was wrong.

FastFlix, doing the heavy lifting for you!

I have written a GUI program, FastFlix, around FFmpeg and other tools to convert videos easily to HEVC, AV1 and other formats. While I won’t promise it will provide everything you are looking for, it will do the work of extracting the HDR10 details of a video and passing them into an FFmpeg command for you. FastFlix can even handle HDR10+ metadata! It also has a panel that shows you exactly the command(s) it is about to run, so you can copy it and modify it to your heart’s content!

If you have any problems with it please help by raising an issue!

Extracting and encoding with HDR10+ metadata

First off, this is not for the faint of heart. Thankfully the newest FFmpeg builds for Windows now support HDR10+ metadata files by default, so this process has become a lot easier. Here is a quick overview of how to do it; also a huge shoutout to “Frank” in the comments below for pointing this tool out to me!

You will have to download a copy of hdr10plus_parser from quietvoid’s repo.

Check to make sure your video has HDR10+ information it can read.

ffmpeg -loglevel panic -i input.mkv -c:v copy -vbsf hevc_mp4toannexb -f hevc - | hdr10plus_parser --verify -

It should produce a nice message stating there is HDR10+ metadata.

Parsing HEVC file for dynamic metadata…
Dynamic HDR10+ metadata detected.

Once you’ve confirmed it exists, extract it to a json file:

ffmpeg -i input.mkv -c:v copy -vbsf hevc_mp4toannexb -f hevc - | hdr10plus_parser -o metadata.json -

Option 1: Using the newest FFmpeg

You just need to pass the metadata file via the dhdr10-info option in the x265-params. And don’t forget to add your audio and subtitle settings!

ffmpeg.exe -i input.mkv -c:v libx265 -pix_fmt yuv420p10le -x265-params "colorprim=bt2020:transfer=smpte2084:colormatrix=bt2020nc:master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1):max-cll=1016,115:hdr10=1:dhdr10-info=metadata.json" -crf 20 -preset medium "output.mkv"

Option 2: (original article) Using custom compiled x265

Now for the painful part you’ll have to work on a bit yourself. To use the metadata file, you will need a custom compiled x265 with the cmake option “HDR10_PLUS”. Otherwise, when you try to convert with it, you’ll see a message like “x265 [warning]: --dhdr10-info disabled. Enable HDR10_PLUS in cmake.” in the output, but it will still encode, just without HDR10+ support.

Once that is compiled, you will have to use x265 as part of your conversion pipeline. Use x265 to convert your video with the HDR10+ metadata and other details you discovered earlier.

ffmpeg -loglevel panic -y -i input.mkv -to 30.0 -f yuv4mpegpipe -strict -1 - | x265 - --y4m --crf=20 --repeat-headers --hdr10 --colorprim=bt2020 --transfer=smpte2084 --colormatrix=bt2020nc --master-display="G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)" --max-cll="1016,115" -D 10 --dhdr10-info=metadata.json output.hevc

Then you will have to use that output.hevc file with ffmpeg to combine it with your audio conversions and subtitles and so on to repackage it.

Saving Dolby Vision

Thanks to Jesse Robinson in the comments, it seems it may now be possible to extract and convert a video with Dolby Vision without the original RPU file. Like the original HDR10+ section, you will need the x265 executable directly, as it does not support that option through FFmpeg at all.

Note I say “saving” and not “converting”: unless you have an RPU file with the Dolby Vision metadata to pass to x265, you cannot convert the video, which is where the extraction step below comes in. The x265 encoder is able to take an RPU file and create a Dolby Vision ready movie.

Just like the manual HDR10+ process, first you have to use an extraction tool, in this case quietvoid’s dovi_tool, to extract the RPU.

ffmpeg -i GlassBlowing.mp4 -c:v copy -vbsf hevc_mp4toannexb -f hevc - | dovi_tool extract-rpu --rpu-out glass.rpu -

There are a few extra caveats when converting a Dolby Vision file with x265. First, you need to use the appropriate 10 or 12 bit version. Second, change pix_fmt, input-depth and output-depth as required. Finally, you MUST provide a dolby-vision-profile level for it to read the RPU, as well as enable VBV by providing vbv-bufsize and vbv-maxrate (read the x265 command line options to see what settings are best for you).

ffmpeg -i GlassBlowing.mp4 -f yuv4mpegpipe -strict -1 -pix_fmt yuv420p10le - | x265-10b - --input-depth 10 --output-depth 10 --y4m --preset veryfast --crf 22 --master-display "G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(40000000,50)" --max-cll "0,0" --colormatrix bt2020nc --colorprim bt2020 --transfer smpte2084 --dolby-vision-rpu glass.rpu --dolby-vision-profile 8.1 --vbv-bufsize 20000 --vbv-maxrate 20000 glass_dv.hevc

Once the glass_dv.hevc file is created, you will need to use tsMuxeR (this version is recommended by some, and the newest nightly is linked below) to put it back into a container as well as add the audio back.

Originally, trying to add the glass_dv.hevc back into a container with FFmpeg or MP4Box would break the Dolby Vision, hence the need for the special muxer. However, FFmpeg 5.0+ now supports remuxing with Dolby Vision. You can do this with the “Copy” function in FastFlix, or by hand with ffmpeg on the command line and -c:v copy for the video track. Make sure you’re using at minimum 5.0 or it won’t work! This will result in video that has BL+RPU Dolby Vision.

It is also possible to only convert the audio and change around the streams with remuxers. For example, tsMuxeR (nightly build, not the default download) is popular for taking mkv files that most TVs won’t recognize HDR in, and remuxing them into ts files so they do. If you also have TrueHD sound tracks, you may need to use eac3to first to break them into the TrueHD and AC3 core tracks before muxing.

Easily Viewing HDR / Video information

Another helpful program to quickly view what type of HDR a video has is MediaInfo. For example here is the original Dolby Vision Glass Blowing video info (some trimmed):

Video
ID                                       : 1
Format                                   : HEVC
Format/Info                              : High Efficiency Video Coding
Format profile                           : Main 10@L5.1@Main
HDR format                               : Dolby Vision, Version 1.0, dvhe.08.09, BL+RPU, HDR10 compatible / SMPTE ST 2086, HDR10 compatible
Codec ID                                 : hev1
Color space                              : YUV
Chroma subsampling                       : 4:2:0 (Type 2)
Bit depth                                : 10 bits
Color range                              : Limited
Color primaries                          : BT.2020
Transfer characteristics                 : PQ
Matrix coefficients                      : BT.2020 non-constant
Mastering display color primaries        : BT.2020
Mastering display luminance              : min: 0.0050 cd/m2, max: 4000 cd/m2
Codec configuration box                  : hvcC+dvvC

And here it is after conversion:

Video
ID                                       : 1
Format                                   : HEVC
Format/Info                              : High Efficiency Video Coding
Format profile                           : Main 10@L5.1@Main
HDR format                               : SMPTE ST 2086, HDR10 compatible
Codec ID                                 : V_MPEGH/ISO/HEVC
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 10 bits
Color range                              : Limited
Color primaries                          : BT.2020
Transfer characteristics                 : PQ
Matrix coefficients                      : BT.2020 non-constant
Mastering display color primaries        : BT.2020
Mastering display luminance              : min: 0.0050 cd/m2, max: 4000 cd/m2

Notice we have lost the Dolby Vision, BL+RPU information but at least we retained the HDR10 data, which Handbrake can’t do!

That’s a wrap!

Hope you found this information useful, and please feel free to leave a comment for feedback, suggestions or questions!

Until next time, stay safe and love each other!

Top 10ish Python standard library modules

When interviewing Python programming candidates, my wife always likes to ask the simple question, “can you name ten Python standard library modules?” This is harder than most think, as many people will completely blank out and others will be dead wrong. “Requests?” one poor soul answered. It’s a good interview question, as it gives insight into what people are familiar with and may use regularly. So I sat down and thought about which ones I use and enjoy the most. Here are my top ten(ish) useful, favorite, and unordered standard modules.

pathlib

Back in the dark days, you would have to store your path as a string, and call obscure functions under os.path to figure anything out about it. Pathlib removes the headache.

from pathlib import Path

my_path = Path('text_file.txt')
if not my_path.exists():
    my_path.write_text('File Content')
assert my_path.exists()
assert my_path.is_file()

Read more at the pathlib python docs.

tempfile

There are a boatload of uses for a temporary file or directory. Hence why it’s in the standard library. I find myself using them together, inside context managers more often than not.

from pathlib import Path
from tempfile import TemporaryDirectory, TemporaryFile


with TemporaryDirectory(prefix='Code_', suffix='_Calamity') as temp_dir:
    with TemporaryFile(dir=temp_dir) as temp_file:
        temp_file.write(b'Test')

        temp_file_path = Path(temp_file.name)
        assert temp_file_path.exists()

# Make sure file only exists within the context
assert not temp_file_path.exists()

I usually end up using this when a tool or library wants to work with a file rather than standard input, so short lived files in a context manager make life a lot easier. Tempfile python docs.

subprocess

Python is pretty amazing, but sometimes you do need to call other programs. Subprocess makes it easy to execute and interact with other executables across operating systems. Check out my other post on it!

from subprocess import run, PIPE
 
response = run("echo 'Join the Dark Side!'", shell=True, stdout=PIPE)

print(response.stdout.decode('utf-8'))
# 'Join the Dark Side!'

logging

This is probably the most useful built in library for debugging there is, and I see it either unused or misused more than anything else.

import logging
import sys
 
logger = logging.getLogger(__name__)
my_stream = logging.StreamHandler(stream=sys.stdout)
my_stream.setLevel(logging.DEBUG)
my_stream.setFormatter(
    logging.Formatter("%(asctime)s - %(name)-12s  "
                      "%(levelname)-8s %(message)s"))
logger.addHandler(my_stream)
logger.setLevel(logging.DEBUG)
 
logger.info("We the people")

If you haven’t already, go on your first date with Python logging! It’s also possible to put all the configuration details into a separate ini or json file; learn more from the logging python docs.
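
For example, the same handler setup as above could live in a dictionary, which you could just as easily load from a json file first; a minimal sketch:

import logging
import logging.config

logging.config.dictConfig({
    "version": 1,
    "formatters": {
        "standard": {"format": "%(asctime)s - %(name)-12s  %(levelname)-8s %(message)s"},
    },
    "handlers": {
        "stdout": {
            "class": "logging.StreamHandler",
            "formatter": "standard",
            "stream": "ext://sys.stdout",
            "level": "DEBUG",
        },
    },
    "root": {"level": "DEBUG", "handlers": ["stdout"]},
})

logging.getLogger(__name__).info("We the people")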

threading and multiprocessing

Two very different things for widely different uses, but they have very similar interfaces and are easy to talk about at the same time. Quick and dirty difference: use threading for IO heavy tasks (writing to files, reading websites, etc.) and multiprocessing for CPU heavy tasks.

from multiprocessing.pool import ThreadPool, Pool
 
def square_it(x):
    return x*x
 
# On Windows, make sure that multiprocessing doesn't start
# until after "if __name__ == '__main__'" 
 
# Pool and ThreadPool are interchangeable in this example
with Pool(processes=5) as pool:
    results = pool.map(square_it, [5, 4, 3, 2, 1])
 
print(results) 
# [25, 16, 9, 4, 1]

I did a post on ThreadPools and Multithreading Pools, as I find them the easiest way to work with (multi)threading in Python.

os and sys

The Python world would not exist if we didn’t have all the power and functionality these built-ins bring. You really haven’t coded in Python if you haven’t used these yet, so I won’t even bother elaborating them here.

random and uuid

Maybe you’re making a game…

import random
random.choice(['Sneak Attack', 'High Kick', 'Low Kick'])

Or debugging a webserver…

from uuid import uuid4

# Bad example, but not writing out a whole webserver to prove a point
def get(*args):
    request = uuid4()
    logger.info(f'Request {request} called with args {args}')

Or turning your webserver into your own type of game…

if user_name == 'My Boss':
    time.sleep(random.randint(1, 5))

No matter which you are doing, it’s always handy to have randomly generated or unique numbers.

socket

The internet runs because of sockets. Modern technology exists because of sockets. Sockets are life, sockets are… annoyingly low level at times, but it’s good to know the basics so you can appreciate everything written on top of them.

Thankfully the Python docs have good examples of them in use.
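
If you want a taste before diving into the docs, here is about the smallest useful sketch I can think of: a bare TCP connection speaking just enough HTTP to get a response back (no error handling, and example.com is just a convenient test host):

import socket

# Open a TCP connection and send a minimal HTTP request by hand
with socket.create_connection(("example.com", 80), timeout=5) as connection:
    connection.sendall(b"HEAD / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
    print(connection.recv(4096).decode())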

hashlib

Need to check a file’s integrity? Hashlib is there with md5 and sha hashes! I created a reusable function to easily reference when I need to do it.
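
That linked function isn’t reproduced here, but the idea is simple enough: read the file in chunks so you never need to hold the whole thing in memory. A quick sketch:

import hashlib
from pathlib import Path


def file_hash(path, algorithm="sha256", block_size=64 * 1024):
    # Hash the file a block at a time so huge files don't eat all your RAM
    hasher = hashlib.new(algorithm)
    with Path(path).open("rb") as source_file:
        while chunk := source_file.read(block_size):
            hasher.update(chunk)
    return hasher.hexdigest()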

Need to securely store people’s password hashes for a website? Hashlib now has scrypt support! Heck, here is my own function I always use to generate scrypt hashes.

from collections import namedtuple
import hashlib
import os


Hashed = namedtuple('Hashed', ['hash', 'salt', 'n', 'r', 'p', 'key_length'])


def secure_hash(value: bytes, salt: bytes = None, key_length: int = 128, n: int = 2 ** 16, r: int = 8, p: int = 1):
    maxmem = n * r * 2 * key_length
    salt = salt or os.urandom(16)
    hashed = hashlib.scrypt(value, salt=salt, n=n, r=r, p=p, maxmem=maxmem, dklen=key_length)
    return Hashed(hash=hashed.hex(), salt=salt.hex(), n=n, r=r, p=p, key_length=key_length)

venv

You probably only think of it as a command when you run python -m venv python_virtual_env to create your environments, but it’s run that way because it’s a standard library. Every new project you start or Python program you install should be using this library, so it is used a lot!
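
It is a real module you can call from code as well; for example, this small sketch does the same thing as the command above and installs pip into the new environment:

import venv

# Equivalent to "python -m venv python_virtual_env" with pip included
venv.create("python_virtual_env", with_pip=True)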

Summary

There ya go, 10 or so can’t live without standard libraries! Isn’t it so nice that Python comes “batteries included”?