Make 2022 a more stable year, use ECC Memory

Are you annoyed at those random game crashes, Blue Screens and weird behavior from your applications? Do you create documents or media you don’t want to become corrupted? Then your next upgrade should include ECC Memory.

What is ECC memory?

Error correction code (ECC) memory detects and corrects bit flips (data corruption) in the data stored in, or transported to and from, the computer’s memory. It keeps a small amount of extra data about each larger section of data, and can recover the bad piece as long as only one part is wrong.

The best analogy I can think of is a simple Lego structure. If you have eight different-sized Lego blocks in a set and know the weight of the entire set, then when you weigh it again and it is lighter by exactly one block’s weight, you know which block is missing. Traditional memory never weighs the set at all, so it has no idea anything is wrong with said Lego structure.

There are actually several different types of ECC. DDR5 will come with “on-chip” ECC, but when talking casually or buying memory with ECC support, we mean “side-band” ECC. And both “side-band” and “on-chip” ECC can work together to make an even more robust system.

Does it really matter?

Yes. Your bits are being flipped and you don’t even realize it. If you are hammering 16GB of memory, you can expect around three bit flips per hour. Keep in mind that means full-on stressing that RAM, not just letting it sit idle. I did some testing myself on a system by running stress-ng against 160GB of ECC RAM for an hour, and the memory corrected 24 bit flips in that time. In other words, one flip per 6.7GB of RAM per hour.
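For reference, a stress-ng run along these lines will hammer that much memory; tune the worker count and size to your own machine (this invocation is an approximation, not the exact one I ran):

stress-ng --vm 4 --vm-bytes 160G --vm-method all --timeout 1h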

Now those numbers seem really scary, but let’s take it down to a more normal use level. Checking on my NAS server, which uses about 15GB of its RAM overall, it averaged one reported ECC bit flip fix every 41.4 hours this past year. Though even that may be a large underestimate, because background ECC correction (memory scrubbing) is not reported to the OS.
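If you want to check your own counts on Linux, the EDAC subsystem exposes corrected (ce) and uncorrected (ue) error counters through sysfs, assuming the EDAC driver for your memory controller is loaded:

# per memory controller counters of corrected and uncorrected errors
grep . /sys/devices/system/edac/mc/mc*/ce_count /sys/devices/system/edac/mc/mc*/ue_count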

Obviously a single bit flip doesn’t make a huge difference most of the time, otherwise everyone’s systems would be constantly crashing. However, I guarantee someone reading this has had a crash due to an unsuspected bit flip, and never truly knew the real culprit.

What systems support it?

Sadly, not all of them. Intel has recently gone the route of not including it in consumer chips at all, and AMD lets motherboard manufacturers decide whether to support it.

| Chip Series      | Audience | Platform  | ECC Support                                             |
|------------------|----------|-----------|---------------------------------------------------------|
| AMD Zen          | Consumer | AM4       | Varies by motherboard (most ASRock boards support ECC)  |
| AMD Threadripper | Prosumer | TR4/sTRX4 | Varies by motherboard                                   |
| AMD EPYC         | Server   | SP3       | Yes                                                     |
| Intel Core       | Consumer | LGA 1700  | No                                                      |
| Intel Xeon       | Server   | FCBGA1787 | Yes                                                     |

If you’re building anything for a NAS or home server you need to be extra careful to select NAS boxes that support ECC ram or already come with it. As the TrueNAS community guide (PDF) states about ECC: “If you’re going to do it, do it right.”

Cartoon or Photo? – Image detection with Python

This all started with me just wanting a fast way to sort through image folders and remove cartoon images. That led me down a spiraling rabbit hole of possibilities, from using OpenCV to do different types of detection, to training a machine learning model from scratch with Keras. This article will go through the multiple options available and see how accurate each ends up being.

What makes an image a cartoon?

Model is Jessica Marie Frye

If we are going to detect the differences between photos and cartoons, first we need to figure out how they are different. Importantly, how they are different in a quantifiable way. These are the measurable differences I could think up:

  • Cartoons have smoother gradients
  • Real images use a larger color palette
  • Cartoons usually have drawn edge outlines

Below we will try each of these options and see how well they fare.

Results First

You’re probably more interested in what will work the best for you and less about how I spent days toiling away at this, so to cut to the chase, here is how everything performed.

The best OpenCV contender turned out to be counting colors, with a combined 75% accuracy.

Overall, Machine Learning Image Classification using the Xception model wiped the floor with the competition, at a combined 96% accuracy. It could probably do even better with further fine-tuning, but it was more than good enough for my own needs.

These results were also with a hard threshold set to force everything into one of two buckets: real or cartoon. For my own use I narrow the range for the absolute picks and put the “unsure” ones in a separate folder, further reducing any errant picks, as sketched below.
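Here is a minimal sketch of that dual-threshold idea (the 0.02/0.98 cutoffs are the same ones used in the Keras script later; tune them for your own data):

def bucket(score: float, low: float = 0.02, high: float = 0.98) -> str:
    """Sort a 0-1 "realness" score into three piles instead of two."""
    if score >= high:
        return "real"
    if score <= low:
        return "cartoon"
    return "unsure"  # review these by hand or with a second method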

Machine Learning with Keras is the obvious pick if you have a good set of data to train with and a computer beefy enough to process it. However, it’s not as portable and takes much longer than simply trying out one of the OpenCV methods.

OpenCV Gradient Differences

The first method we will use is pretty straightforward: blur the image a little, compare it to its unaltered form, and quantify the difference. First things first, we will need OpenCV installed for Python. Go into your venv for this and run:

pip install opencv-python

Now open up your IDE and create a new .py file to get started. First thing we are going to do is read the image into OpenCV via the cv2 module. We will use a JPEG image, which imread will load in OpenCV’s expected BGR color format.

import cv2

img = cv2.imread("/path/to/my/image.jpg")

Next we will blur the image a little using a bilateral filter, to even out the colors. We will also resize the image to a standard size so the blur across every image is the same.

img = cv2.resize(img, (1024, 1024))
color_blurred = cv2.bilateralFilter(img, 6, 250, 250)

You can check out the result to see how strong the effect is by previewing the image. Press any key to close the window.

# Optional Preview
cv2.imshow("blurred", color_blurred)
cv2.waitKey(0)
cv2.destroyAllWindows()

Then we need to compare this new color_blurred image to the original image. I accomplished this by comparing the histograms. We will have to do that for each color channel individually.

diffs = []
for k, color in enumerate(('b', 'g', 'r')):  # OpenCV stores channels in BGR order
    print(f"Comparing histogram for color {color}")
    real_histogram = cv2.calcHist([img], [k], None, [256], [0, 256])
    color_histogram = cv2.calcHist([color_blurred], [k], None, [256], [0, 256])
    diffs.append(cv2.compareHist(real_histogram, color_histogram, cv2.HISTCMP_CORREL))

result = sum(diffs) / 3

compareHist will give us a result between 0 and 1 (one being the most similar.) We will need to set a threshold for how similar cartoons should be. I have mine set at 0.98 (aka 98% similar.)

if result > 0.98:
    print("It's a cartoon!")
else: 
    print("It's a photo!")

And that’s it! Now you can test it out and see how it works for your images. I found this iteration to work very fast, with about a 70% proper detection rate.

Let’s put it all together into a usable script!

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from pathlib import Path
from typing import Union
import argparse

import cv2


def is_cartoon(
    image: Union[str, Path],
    threshold: float = 0.98,
    preview: bool = False,
) -> bool:
    # read and resize image
    img = cv2.imread(str(image))
    img = cv2.resize(img, (1024, 1024))

    # blur the image to "even out" the colors
    color_blurred = cv2.bilateralFilter(img, 6, 250, 250)

    if preview:
        cv2.imshow("blurred", color_blurred)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    # compare the colors from the original image to blurred one.
    diffs = []
    for k, color in enumerate(("b", "g", "r")):  # OpenCV stores channels in BGR order
        # print(f"Comparing histogram for color {color}")
        real_histogram = cv2.calcHist([img], [k], None, [256], [0, 256])
        color_histogram = cv2.calcHist([color_blurred], [k], None, [256], [0, 256])
        diffs.append(
            cv2.compareHist(real_histogram, color_histogram, cv2.HISTCMP_CORREL)
        )

    return sum(diffs) / 3 > threshold


def command_line_options():
    args = argparse.ArgumentParser(
        "blur_compare",
        description="Determine if a image is likely a cartoon or photo.",
    )
    args.add_argument(
        "-p",
        "--preview",
        action="store_true",
        help="Show the blurred image",
    )
    args.add_argument(
        "-t",
        "--threshold",
        type=float,
        help="Cutoff threshold",
        default=0.98,
    )
    args.add_argument(
        "image",
        type=Path,
        help="Path to image file",
    )
    return vars(args.parse_args())


if __name__ == "__main__":
    options = command_line_options()
    if not options["image"].exists():
        raise FileNotFoundError(f"No image exists at {options['image'].absolute()}")
    if is_cartoon(**options):
        print(f"{options['image'].name} is a cartoon!")
    else:
        print(f"{options['image'].name} is a photo!")

OpenCV Color Counting

Using a subset of 512 colors in a 1024×1024 image, determine how much of the image can be reproduced

The next possible way to approach the problem is to figure out how many colors make up the majority of the image. We will use the same code to load and resize the image from above, just this time we will add a loop to count all the colors (the slow way; a faster option is below):

    # Find count of each color
    a = {}
    # reshape to a list of pixels; .flatten() would yield single values, not BGR triples
    for item in img.reshape(-1, 3):
        value = tuple(item)
        if value not in a:
            a[value] = 1
        else:
            a[value] += 1

Next we will sort the dictionary by the most used colors, and add the counts of the top 512 colors together.

The actual calculation packs a lot of functionality into a small block of code. First, let’s get a visual preview of what is happening by re-creating the image with only the selected colors:

import numpy

mask = numpy.zeros(img.shape[:2], dtype=bool)

for color, _ in sorted(a.items(), key=lambda pair: pair[1], reverse=True)[:512]:
    mask |= (img == color).all(-1)

img[~mask] = (255, 255, 255)

cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Here’s the code that calculates what you are seeing. We divide the sum of the top colors’ counts by the size of the image to get what percent of the image could be recreated with just those colors.

    # Identify the percent of the image that uses the top 512 colors
    most_common_colors = sum([x[1] for x in sorted(a.items(), key=lambda pair: pair[1], reverse=True)[:512]])
    return (most_common_colors / (1024 * 1024)) > 0.3 # new threshold

This script is pretty similar to the last one, and has a slightly higher success rate on my data set at 76%! It is 20% better at detecting what is a cartoon, but 10% worse at photos. It is also much, much slower, some 20 to 50 times slower (it will take a second per image instead of around 0.05 seconds). However, we can speed that up with some numpy trickery.

    # Replace everything after "Find count of each color" with this faster version,
    # but note it doesn't work with the preview.
    flattened = numpy.reshape(img, ((1024 * 1024), 3))
    # multipliers must be at least 256 apart, otherwise different colors can collide
    multiplied = numpy.multiply(flattened, [1_000_000, 1_000, 1])
    sums = multiplied.sum(axis=1)
    unique, counts = numpy.unique(sums, return_counts=True)

    # Identify the percent of the image that uses the top 512 colors
    most_common_colors = sum(sorted(counts, reverse=True)[:512])
    return (most_common_colors / (1024 * 1024)) > threshold

Here is the entire script with the slower version that works with the preview image. After settling on a good threshold, I would suggest swapping in the faster version above.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from pathlib import Path
from typing import Union
import argparse

import cv2
import numpy


def is_cartoon(
    image: Union[str, Path],
    threshold: float = 0.3,
    preview: bool = False,
) -> bool:
    # read and resize image
    img = cv2.imread(str(image))
    img = cv2.resize(img, (1024, 1024))

    # Find count of each color
    a = {}
    for row in img:
        for item in row:
            value = tuple(item)
            if value not in a:
                a[value] = 1
            else:
                a[value] += 1

    if preview:
        mask = numpy.zeros(img.shape[:2], dtype=bool)

        for color, _ in sorted(a.items(), key=lambda pair: pair[1], reverse=True)[:512]:
            mask |= (img == color).all(-1)

        img[~mask] = (255, 255, 255)

        cv2.imshow("img", img)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    # Identify the percent of the image that uses the top 512 colors
    most_common_colors = sum(
        [x[1] for x in sorted(a.items(), key=lambda pair: pair[1], reverse=True)[:512]]
    )
    return (most_common_colors / (1024 * 1024)) > threshold


def command_line_options():
    args = argparse.ArgumentParser(
        "blur_compare",
        description="Determine if a image is likely a cartoon or photo.",
    )
    args.add_argument(
        "-p",
        "--preview",
        action="store_true",
        help="Show the blurred image",
    )
    args.add_argument(
        "-t",
        "--threshold",
        type=float,
        help="Cutoff threshold",
        default=0.3,
    )
    args.add_argument(
        "image",
        type=Path,
        help="Path to image file",
    )
    return vars(args.parse_args())


if __name__ == "__main__":
    options = command_line_options()
    if not options["image"].exists():
        raise FileNotFoundError(f"No image exists at {options['image'].absolute()}")
    if is_cartoon(**options):
        print(f"{options['image'].name} is a cartoon!")
    else:
        print(f"{options['image'].name} is a photo!")

OpenCV Edge Detection

I haven’t had much luck with this one, as it is only about 55% successful at determining what type of image it is; a coin toss is about as accurate. I don’t recommend using it as is, but if you have any ideas or improvements, let me hear them!

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from pathlib import Path
from typing import Union
import argparse

import cv2
import numpy


def is_cartoon(
    image: Union[str, Path],
    threshold: float = 4500,
    preview: bool = False,
) -> bool:
    # read and resize image
    img = cv2.imread(str(image))
    img = cv2.resize(img, (1024, 1024))

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred_gray = cv2.medianBlur(gray, 3)

    edges = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 5, 10
    )
    blurred_edges = cv2.adaptiveThreshold(
        blurred_gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 5, 10
    )

    if preview:
        cv2.imshow("edges", edges)
        cv2.imshow("blurred edges", blurred_edges)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    count_1 = numpy.count_nonzero(edges)
    count_2 = numpy.count_nonzero(blurred_edges)

    return abs(count_2 - count_1) < threshold


def command_line_options():
    args = argparse.ArgumentParser(
        "blur_compare",
        description="Determine if a image is likely a cartoon or photo.",
    )
    args.add_argument(
        "-p",
        "--preview",
        action="store_true",
        help="Show the blurred image",
    )
    args.add_argument(
        "-t",
        "--threshold",
        type=float,
        help="Cutoff threshold",
        default=4500,
    )
    args.add_argument(
        "image",
        type=Path,
        help="Path to image file",
    )
    return vars(args.parse_args())


if __name__ == "__main__":
    options = command_line_options()
    if not options["image"].exists():
        raise FileNotFoundError(f"No image exists at {options['image'].absolute()}")
    if is_cartoon(**options):
        print(f"{options['image'].name} is a cartoon!")
    else:
        print(f"{options['image'].name} is a photo!")

Keras Machine Learning Model

Time to take the kid gloves off, let’s use some ML image detection.

First we have to train a minified Xception model with a lot of hand-picked good data. I used 556 cartoon images and 2295 real images in the training datasets, then used that model to sort another 2000+ unsorted images.

To get step by step details, please check out the same tutorial I used for this.
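For reference, image_dataset_from_directory infers the class labels from the folder names, so the training data needs to be laid out with one folder per class (the names here are assumed to match the /cartoon/ and /real/ folders used for sorting later):

/training/
    cartoon/
        image_0001.jpg
        ...
    real/
        image_0001.jpg
        ...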

import shutil
from pathlib import Path

import tensorflow as tf

root_dir = Path("/training/")
image_size = (180, 180)
batch_size = 32
epochs = 20
model_name = "my_model"


def make_model(input_shape, num_classes):
    inputs = tf.keras.Input(shape=input_shape)
    # Image augmentation block
    data_augmentation = tf.keras.Sequential(
        [
            tf.keras.layers.RandomFlip("horizontal"),
            tf.keras.layers.RandomRotation(0.1),
        ]
    )
    x = data_augmentation(inputs)

    # Entry block
    x = tf.keras.layers.Rescaling(1.0 / 255)(x)
    x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)

    x = tf.keras.layers.Conv2D(64, 3, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)

    previous_block_activation = x  # Set aside residual

    for size in [128, 256, 512, 728]:
        x = tf.keras.layers.Activation("relu")(x)
        x = tf.keras.layers.SeparableConv2D(size, 3, padding="same")(x)
        x = tf.keras.layers.BatchNormalization()(x)

        x = tf.keras.layers.Activation("relu")(x)
        x = tf.keras.layers.SeparableConv2D(size, 3, padding="same")(x)
        x = tf.keras.layers.BatchNormalization()(x)

        x = tf.keras.layers.MaxPooling2D(3, strides=2, padding="same")(x)

        # Project residual
        residual = tf.keras.layers.Conv2D(size, 1, strides=2, padding="same")(
            previous_block_activation
        )
        x = tf.keras.layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    x = tf.keras.layers.SeparableConv2D(1024, 3, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)

    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    if num_classes == 2:
        activation = "sigmoid"
        units = 1
    else:
        activation = "softmax"
        units = num_classes

    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(units, activation=activation)(x)
    return tf.keras.Model(inputs, outputs)


def train_data():
    train_ds = tf.keras.preprocessing.image_dataset_from_directory(
        str(root_dir),
        validation_split=0.2,
        subset="training",
        seed=1337,
        image_size=image_size,
        batch_size=batch_size,
    )
    val_ds = tf.keras.preprocessing.image_dataset_from_directory(
        str(root_dir),
        validation_split=0.2,
        subset="validation",
        seed=1337,
        image_size=image_size,
        batch_size=batch_size,
    )

    train_ds = train_ds.prefetch(buffer_size=32)
    val_ds = val_ds.prefetch(buffer_size=32)

    model = make_model(input_shape=image_size + (3,), num_classes=2)
    tf.keras.utils.plot_model(model, show_shapes=True)

    callbacks = [
        tf.keras.callbacks.ModelCheckpoint(f"{model_name}_{{epoch}}.h5"),
    ]
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    model.fit(
        train_ds, epochs=epochs, callbacks=callbacks, validation_data=val_ds,
    )


def clean_images():
    move_to = root_dir.parent / "bad_data"  # needs to be not in the training directory
    move_to.mkdir(exist_ok=True)
    moved = 0
    for directory in root_dir.glob("*"):
        if directory.name == move_to.name:
            continue
        if directory.is_dir():
            for i, file in enumerate(directory.glob("*")):
                is_jfif = tf.compat.as_bytes("JFIF") in file.open("rb").read(10)
                if not file.name.lower().endswith(("jpg", "jpeg")) or not is_jfif:
                    shutil.move(file, move_to / file.name)
                    moved += 1
        print("moved unclean data", moved, "from", directory)


def move_images():
    model = tf.keras.models.load_model(f"{model_name}_{epochs}.h5")
    cartoon_dir = Path("/cartoon/")
    real_dir = Path("/real/")

    real, cartoon, unknown = 0, 0, 0

    for file in Path("/unsorted/").glob("*.[jpg][jpeg][png]"):

        img = tf.keras.preprocessing.image.load_img(
            str(file), target_size=image_size
        )
        img_array = tf.keras.preprocessing.image.img_to_array(img)
        img_array = tf.expand_dims(img_array, 0)  # Create batch axis

        predictions = model.predict(img_array)
        score = predictions[0][0]
        if score > 0.98:
            real += 1
            shutil.move(file, real_dir / file.name)
        elif score < 0.02:
            cartoon += 1
            shutil.move(file, cartoon_dir / file.name)
        else:
            unknown += 1
            print(f"Could not figure out {file} as it was {score * 100}")
    print(f"Moved {real} to real and {cartoon} to cartoon, {unknown} were unmoved")


if __name__ == '__main__':
    clean_images()
    train_data()
    move_images()

I personally think ~95% overall accuracy is amazing for no additional tuning, especially since this isn’t exactly the model’s intended use case. Generally we think about classifying items in the image (cat vs dog), not the type of image (anime cat vs real cat).

Hardware Encoding 4K HDR10 Videos?

We are about to venture into a heated and abnormal world. Hardware encoders, designed for real-time encoding, may be reaching the point where they can also be considered for video archival. The three common consumer options we will look at are:

  • AMD’s VCN encoder (using 6900XT / RDNA2 architecture)
  • Nvidia’s NVENC encoder (using 3060 RTX / 7th generation)
  • Intel QSV encoder (using i7-11800H / Version 8)

These tests all use HEVC UHD HDR10 source material and produce valid UHD HDR10 10-bit output as well. This is not a common use case; these tests were done because these encoders are being added to FastFlix, a free and open source GUI for common video encoding software, specifically designed to help with HDR10 videos.

What’s so odd about using a hardware encoder for encoding videos?

When compressing an existing video file, one of the main goals is to lower the bitrate, which saves disk space and saves bandwidth if the file will be transferred a lot. For example, a single megabyte difference for a popular file on a large site could start costing hundreds of dollars in bandwidth fees.

That means you want to pack as much quality as you can into the smallest package possible. Whereas, historically, hardware encoders were designed with the singular purpose of encoding above real-time speeds. That way they could be used with video conferencing like Zoom, or to transcode videos as needed while you watch them. They always wanted good quality of course, but speed was always more important.

However, as their hardware and software have matured, they are now reaching a point where they can reasonably be considered instead of crazy slow software encoders. I believe a large part of this is due to the introduction of the B-frame.

The almighty B-frame

In HEVC videos there are three types of frames: Intra-coded (I) frames, Predicted (P) frames, and Bidirectionally predicted (B) frames. I-frames are pretty easy to understand: an I-frame is a full picture. It’s what everyone thinks about when they assume a video file is a bunch of pictures in a row, like it was back with real film.

However, with modern video codecs like HEVC, there are in-between frames that don’t hold the full picture; instead they are largely filled with a bunch of math (motion vectors) that says “hey, move that area over this way for this frame.” P-frames do just that: they use the data in the frame before them and store the difference as motion vectors.

P-frames

In ideal scenarios, P-frames are about half the size of I-frames. That means if you have one I-frame followed by two P-frames repeated for the entire movie, you’ve just cut off a third of the bitrate!

Linear-P-Frames Example

B-frames

B-frames are even more efficient; in an ideal world they can be half the size of a P-frame, aka a quarter of an I-frame. Imagine having a single I-frame, then two B-frames, a P-frame, then two more B-frames. You would have knocked off almost 60% of the bitrate! However, the problem is that B-frames are crazy hard to calculate. B is for Bidirectional, which means they look not only at the frame encoded before them, but also at the frame that comes after them. That’s right, you have to first encode the I-frame and the P-frame (or another I-frame) that comes after it before calculating the B-frames between them.
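A quick sanity check on that math, using the idealized relative sizes from above (I = 1.0, P = 0.5, B = 0.25; real frame sizes vary wildly with content):

# idealized relative frame sizes; actual sizes depend entirely on the source
sizes = {"I": 1.0, "P": 0.5, "B": 0.25}
gop = ["I", "B", "B", "P", "B", "B"]  # the pattern described above

total = sum(sizes[frame] for frame in gop)  # 2.5
all_intra = len(gop) * sizes["I"]           # 6.0 if every frame were an I-frame
print(f"bitrate saved: {1 - total / all_intra:.0%}")  # ~58%, aka "almost 60%"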

B-frame non-linear encoding workflow

Until recently, hardware encoders were only thinking of moving in a forward direction. They never stopped to wait and build frames that came behind the current one; who would want that? Well, without the B-frame, you have to compensate by having either more I-frames or larger P-frames. Take the following scenario.

B-Frames example

If the B-frame were replaced with a P-frame (so it was I P P I), the P-frame with the sun in it would require additional image data stored in that frame. Whereas by using a B-frame, it can be stored as a motion vector, saving a large amount of bitrate.

Thankfully, Nvidia and Intel have both decided that it’s time to bring some quality to the hardware world, and do have B-frames in their latest hardware encoders. Sadly, AMD still doesn’t support them for HEVC videos.

Hardware Encoding Head to Head

This is probably why you’re here: to see how the encoders stack up against each other. We are going to compare multiple videos across four different encoders. All encodes produced valid HDR10 videos and used the same settings per encode (except for one hiccup with Intel QSV erroring when using --extbrc on one video.)

Methodology

Tests like this are only as good as the documentation of how they were acquired. To that end, I wrote a script to run all these tests so that there are no quarrels about how they were run (downloadable here). All tests were run on Windows 10 Version 10.0.19042 Build 19042 with the following settings.

The NVENC and QSV encodings were done on a laptop while plugged in on maximum power mode. The AMD VCN and x265 encodes were done on a desktop PC. Because of obvious hardware differences, encoding speed and power will not be considered as part of these tests.

|          | AMD VCE / VCN | Nvidia NVENC | Intel QSV | x265 (Software) |
|----------|---------------|--------------|-----------|-----------------|
| Options (bold are non-default) | --ref 3, --preset slow, --pe, --tier High | --bframes 3, --ref 3, --preset quality, --tier high, --lookahead 16, --aq, --aq-strength 0, --multipass 2pass-full, --mv-precision Q-pel | --quality best, --la-depth 16, --la-quality slow, --extbrc (--extbrc not set on Glass Blowing) | -preset slow, aq-mode=2, strong-intra-smoothing=1, bframes=4, b-adapt=2, frame-threads=0, hdr10=1, hdr10_opt=1, chromaloc=2 |
| Hardware | 6900 XT | 3060 RTX | i7-11800H | i9-9900K |
| Driver   | 21.7.2 | 471.68 Game Ready | 27.20.100.9749 | 10.0.19041.546 (intelppl.sys) |
| Software | VCEEncC (x64) 6.13, AMF Version 1.4.21 | NVEncC (x64) 5.37, NVENC API v11.1, CUDA 10.1 | QSVEncC (x64) 5.06, Hardware API v1.35 | FFmpeg ~4.4, x265 @ commit 82786fc |

The x265 software was also given the benefit of running in dual pass mode. The slow preset was used as it was determined to be an ideal choice in previous tests.

The hardware encoders were set to use the “best” settings for measured quality, not perceived quality. I did not have as much time to test NVENC and QSV as I did VCN, so there may be more to eke out of those two encoders.

Downloadables

I wrote a python script to do all this testing, which can be downloaded: compare.py

If you want to check out the results of the VMAF / PSNR / SSIM yourself, here are the json files.
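If you would rather compute a score yourself, a minimal FFmpeg invocation looks something like this (assuming a build with libvmaf enabled; file names are placeholders, and the distorted file goes first):

ffmpeg -i encoded.mkv -i source.mkv -lavfi libvmaf=log_fmt=json:log_path=vmaf.json -f null -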

Wonderland Two – 4K HDR10 – 24fps – 51.4 Mb/s bitrate

The first encoder comparison is with Samsung’s Wonderland Two 4K HDR10 video, a time-lapse where large chunks of the video have small changes while other parts have rapidly moving blurs. Its original bitrate is around 51,000k. We will be cutting it down to less than 1/20th its original size at 2500K, so there will be obvious quality lost. Let’s see how the encoders handle it!

VMAF scores - Wonderland Two

Both NVENC and QSV put up a great showing, riding the VMAF 93 mark at 7500K, a mere 15% of the original file size. VCN doesn’t even reach that point at 12500K, so would presumably need around twice as much bitrate to achieve the same quality!

Let’s take a deep dive into what these scores really translate to over the course of the movie. These charts are the VMAF scores at every 10 frames for each of the encoders with their 10000k bitrate video.

In this case the video is compressed to a fifth of its original file size! It went on a diet from 821MB to 160MB, so expect to see some big encoding degradation.

VMAF breakdown chart for all encodings - Wonderland Two

It’s really obvious that AMD’s VCN encoder struggles with scene changes (the sudden sharp drops). I imagine this is due to lack of pre-analysis, as talked about in the last post. It also seems that QSV has some trouble with them, but at least NVENC has its head held high in that regard! I also suspect VCN is trailing behind due to its lack of B-frame support.

Dolby’s Glass Blowing Demo – 4K HDR10 – 60fps – 15.1 Mb/s bitrate

Second, Dolby’s Glass Blowing Demo is a three minute long 4K HDR10 video with constant motion and lots of changing details: fire, smoke, and rain. It is a high fps source video, but has a much lower bitrate to start from. That means the scores should be higher, as it’s easier to reach the same quality.

Graph showing VMAF glass blowing demo comparison

AMD’s VCN and Nvidia’s NVENC trade blows the whole way with this video, with x265 taking a clear lead throughout the curve.

Intel QSV starts off rocking both NVENC and VCN, then …something… happens. I honestly thought it was an error with my testing at first, possibly a misaligned video track while calculating VMAF. However, after a quick look at the spread chart, we can see it’s more insidious than that.

VMAF breakdown chart for all encodings - glass blowing

There are two huge drops with QSV. I checked the video file and found two sections that suddenly became blocky and laggy, as if it were skipping or duplicating frames wrongly. I have no idea what caused it, and worse, there was no indication of an error! I can only speculate the encoder was designed to keep working even if there was some disruption in either compute capability or access to the video file. That is good for real-time encoding like streaming, but unacceptable for video archival.

This frankly leaves QSV out of the running for any consideration of use for backing up videos. If anyone knows of fixes or prevention of this, please leave a comment!

Samsung x RedBull: See the Unexpected – 4K HDR10 – 60fps – 51.8 Mb/s bitrate

Finally, I tested with a high framerate and high bitrate video, Samsung x RedBull: See the Unexpected. I excluded QSV entirely, given how badly it failed in the last run.

VMAF Chart - Samsung x RedBull

Nvidia’s NVENC is hot on x265’s tail, with VCN lagging behind.

VMAF breakdown for all encodings - Samsung x RedBull

AMD’s VCN still has clear drops. NVENC struggles more with this video than any of the others before, but still clearly pulls ahead of VCN. x265 sits tall with a clean and very impressive line for a video that is a fifth of the original size: a minimum VMAF of 84.25, versus NVENC’s low of 56.13, with VCN taking rear guard at 48.16 (all had a max of 100).

Conclusions

Has HEVC hardware encoding caught up to the quality of software encoding?

No.

It seems that the three titans of the GPU industry still haven’t figured out how to build encoding hardware pipelines that are both fast and high quality.

Would I use hardware encoders for my own videos?

Yes.

The two Samsung tests were done on the extreme end of compression, down to 1/20th of the original size. The Glass Blowing Demo VMAF deep dive gives a better idea of what to expect from a re-encode going from 15Mb/s to 10Mb/s.

As we have said before, don’t needlessly re-encode videos. In my particular case, I am happy using any of these hardware encoders for quick encodes rather than sitting around all day for a slight, and probably unnoticeable, quality difference with x265.

Is there hope for a true consumer hardware encoding competitor to x265 quality?

No.

Why would there be? Everything available is “good enough” and there is no incentive for these companies to spend the phenomenal effort required on this specific task.

I would absolutely love to be proven wrong on this, but I personally don’t see any improvements being invested on HEVC encoding when AV1 is right around the corner.

Disclaimer

These tests were done on my own hardware, purchased myself. No company asked me to write this, nor to modify, reword, or omit anything. All conclusions are my own thoughts and opinions and in no way represent any company.


Raspberry Pi Hardware Encoding Speed Test

The GPU hardware encoder in the Raspberry Pi can greatly speed up encoding for H.264 videos, and it is perfect for transcoding live streams as well. It can be accessed in FFmpeg with the h264_omx encoder. But is it fast enough to live stream a 1080p webcam?

You might have already seen a lot of people using the built-in Raspberry Pi cameras to stream crisp 1080p video, so why is this even a question? Well, the catch is that the Pi Camera itself supports native H.264 encoding. Some webcams do as well, and they are honestly the best choice, rather than constantly battering the GPU encoder when you don’t need to.

However, you may just happen to have an old cheap webcam that only does MJPEG streams. Those streams are generally too large to pump over the Raspberry Pi’s wifi at full fps. Would using the hardware encoder help you?
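As a sketch of what that would look like, a live MJPEG-webcam-to-H.264 transcode pushed over the network could be done with something like this (device path, resolution, and destination address are placeholders for your own setup):

ffmpeg -f video4linux2 -input_format mjpeg -video_size 1920x1080 -framerate 30 -i /dev/video0 -c:v h264_omx -b:v 5M -an -f mpegts udp://192.168.1.50:5000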

The Results First

This is why you’re here, so let’s cut to the chase and compare the two latest Raspberry Pis available, the Pi 4 B and Pi 3 B+ (we’ll throw in the little Pi Zero Wireless for fun too.) We’ll talk about the two videos used later, but suffice to say, Trackday is easier to encode and closer to what an average webcam would produce, while Artist is more of a torture test.

Boom! The Raspberry Pi 4 B is right in the butter zone. Most webcams that are 30fps would be handled just fine with the Pi 4 (depending on the quality of sensor and what you’re filming). The Pi 3 B+ isn’t terrible, but wouldn’t be able to encode a realtime stream smoothly.

The little Pi Zero? Well, it did its best and we’re proud of it!

Test Media

Trackday

The first video I used was a video captured from a car on a racetrack. It is 1920×1080 at 30fps captured from a dash cam.

10 second preview of 2 minute video – Jaguar F-Type R at Harris Hills Raceway

The original bitrate was 10.5Mb/s and was cut down to 5Mb/s in all our encodes.

The command used is:

ffmpeg -i trackday.mp4 -c:v h264_omx -b:v 5M -an -sn -dn track_omx.mp4

Artist

The second file, artwork in progress by Clara Griffith, is also 1920×1080 at 30fps. However, it uses the BT.709 color space and started out at 35Mb/s!

Artwork of Clara Griffith – https://www.claragriffith.com/

If you see a webcam that advertises as “HDR” it is most likely using the BT.709 color space as well, and may give your Pi a headache.

This one was also compressed down to only 5Mb/s. Why 5Mb/s, you ask? Well, as it turns out, using the standard 2.4GHz wifi band, the Pi 3 and Pi 4 can each sustain about 6.5Mb/s download speed over my wireless. That means I know these videos could be played smoothly over wifi. The Pi Zero W, on the other hand, could only sustain around 3Mb/s wifi transfer speed.
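(Those throughput numbers are from my own network. If you want to measure yours, something like iperf3 against another machine on the LAN running iperf3 -s gives a comparable figure.)

iperf3 -c 192.168.1.10 -t 30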

All three systems were set up to use 256MB of GPU ram.

Video Quality

This actually took me by surprise, to be honest. The quality of the encode is quite good compared to what a software encoder could do. I didn’t pull any punches either: the x264 encoder was set to dual pass, using the veryslow preset with the film tune set. x264 commands:

ffmpeg -i "artist.mkv" -map 0:0 -c:v libx264 -pix_fmt yuv420p -tune:v film -color_primaries bt709 -color_trc bt709 -colorspace bt709  -pass 1 -passlogfile "pass_log_file_f9e11f23efaa23591fa8" -b:v 5000k -preset:v veryslow  -an -sn -dn -f mp4 /dev/null

ffmpeg -i "artist.mkv" -map 0:0 -c:v libx264 -pix_fmt yuv420p -tune:v film -color_primaries bt709 -color_trc bt709 -colorspace bt709  -pass 2 -passlogfile "pass_log_file_f9e11f23efaa23591fa8" -b:v 5000k -preset:v veryslow -map_metadata -1 -map_chapters 0  "artist-x264-5M-veryslow-film.mkv"

Of the two videos, Trackday is more realistic to what a webcam would experience, and both encoders are near equal on it. So why was the Artist video so much better quality after encode, even though it started out with a much higher bitrate? My informed guess is that the original is so crisp, and the content slow moving enough, that the H.264 encoder was able to reuse larger parts of the video for subsequent frames.

That means the software encoder x264 wins by virtue of being able to effectively use B-frames, whereas the OMX hardware encoder doesn’t support B-frames at all. Therefore the Pi is on even ground when B-frames aren’t effective, but lags behind when they come into play.

A Note on Pi Camera Native H.264

I have found very little information about which Pi Cameras actually support H.264 natively. I only have “knock off” Raspberry Pi cameras that use the ribbon cable. They all support H.264 streams, which you can check with:

v4l2-ctl -d /dev/video0 --list-formats-ext

# ...
# [4]: 'H264' (H.264, compressed)
#                Size: Stepwise 32x32 - 2592x1944 with step 2/2
# [5]: 'MJPG' (Motion-JPEG, compressed)
#                Size: Stepwise 32x32 - 2592x1944 with step 2/2

or

ffmpeg -hide_banner -f video4linux2 -list_formats all -i /dev/video0

# [video4linux2,v4l2 @ 0x22c9d70] Raw       :     yuv420p :     Planar YUV 4:2:0 : {32-2592, 2}x{32-1944, 2}
# [video4linux2,v4l2 @ 0x22c9d70] Compressed:       mjpeg :            JFIF JPEG : {32-2592, 2}x{32-1944, 2}
# [video4linux2,v4l2 @ 0x22c9d70] Compressed:        h264 :                H.264 : {32-2592, 2}x{32-1944, 2}

I was kinda worried they were using some hackery to “pretend” to have native H.264 while actually using the GPU. However, if the Pi Zero has anything to show, it’s that it has a really hard time encoding 1080p videos with the GPU encoder, so I do believe they have native support.

Wrap Up

If you already have:

A camera and a Raspberry Pi: you can get started streaming right away.

A 1080p webcam and want to stream from it: consider grabbing a Raspberry Pi 4.

The Raspberry Pi: first always try to grab a camera with built-in H.264 support; otherwise, the Pi 4 should support most webcams using hardware accelerated encoding.

AMD Hardware Encoding in 2021 (VCE / VCN)

It’s 2021 and there still isn’t a lot of good info about AMD’s VCN hardware encoder for consumers. To that end, I will present my own take on the current “war” between software and hardware encoders, then go into quick details of how to best use AMD GPUs for encoding for video archival with FastFlix.

Note: I will only be comparing HEVC/H.265 10-bit HDR10 videos (both source and output). This use case is not usually covered in benchmarks and tests I have seen, and is more of the interest to those who have seen my previous posts on Encoding UHD HDR10 videos but may want to hardware accelerate it.

Terms:

  • VCE – Video Coding Engine – AMD’s early name for its built in encoding hardware
  • VCN – Video Core Next – AMD’s new name for GPU hardware encoders (VCE / VCN used interchangeably)
  • AMF – Advanced Media Framework – AMD’s code and tools for developers to work with VCE / VCN
  • HEVC / H.265 – High Efficiency Video Coding – The video codec we will use that supports HDR10
  • HDR10 – A set of metadata presented alongside the video to give the display additional details
  • VMAF – Netflix’s video quality metric used to compare encoded video to source’s quality

Software vs Hardware Encoders

Software encoders are coded to run on any general purpose CPU. It doesn’t matter if it’s an Intel i7, an AMD 5900X, or even your phone’s ARM based CPU; that gives them great versatility. In the other corner, hardware encoders rely on specific physical hardware components existing in a system to accelerate their transcoding. In today’s case, it takes an AMD GPU with VCN support to use the features we want to test.

Apples and oranges are both fruit, sports cars and pickup trucks are both vehicles, and software and hardware encoders both transcode videos. Just as it’s futile to compare the track capabilities of a supercar to the towing capacity of a pickup truck, we are about to venture into said territory with these encoders.

Please excuse the poor artwork, Clara wasn’t available this week so I had to do it myself!

Use case over metrics

The workhorse of the HEVC software encoding world is x265. There are plenty of other software encoders, like ATEME TITAN File, used by the industry for UHD blu-rays, or other open source encoders like the Turing codec or kvazaar, but because they are not included in standard tools like FFmpeg, they are overlooked.

So what is this workhorse good for? Flexibility and video archival. By being able to run on almost anything that can compile C code, x265 is a champion of cross platform operations. It is also the standard when looking for pure quality metrics in HEVC videos.

Comparatively, hardware encoding, in this case using AMD’s Video Coding Engine (VCE), is built to be power efficient and fast. Really, really fast. For example, on a 6900XT you can real-time encode a 60fps UHD stream on the slowest setting!

Let’s see what happens when they venture into each other’s bailiwicks.

Drag Race

Here’s what everybody loves: a good graph. We’re going to compare x265 using its fastest encoding speed vs the slowest setting AMD’s VCE currently has, with a 60fps HDR10 4K source video.

Using a 60fps HDR10 UHD source, x265 was compared with its highest speed preset vs VCE’s slowest

As expected, it was a slaughter. Hardware encoding ran at 96 fps while x265 could only manage 14.5 fps. AMD’s hardware encoding clearly pummels the fastest setting x265 has to offer, even on an i9-9900K. Even an AMD 5950X, which may be up to twice as fast, would still be dominated by the hardware encoder.

Where does this matter

Streaming and real-time transcoding. Hardware encoders were designed with the idea of “accelerated” encoding, which makes them great for powering your Zoom calls or streaming to Twitch.

Encoding Quality Prowess

Now let’s venture into x265’s house and compare computed quality with VMAF. We’ll be using the veryslow setting, darn the total time taken!

In this scenario we will compress a UHD video with a bitrate of 15,000k to four different rates. The goal for a decent encode is to reach at least VMAF 93, which is the bitrate range we will stay above. (VMAF 93+ doesn’t mean you won’t notice quality loss. It simply means that it probably WILL BE apparent if it is less than that.)

This was tested with a 30 second excerpt from Dolby’s Glass Blowing Demo (UHD profile 8.1)

Both encoders do great, keeping within a range that shouldn’t be too noticeable. However, x265 has a clear advantage at lower bitrates if all you care about is quality. It also maintains a steady edge throughout the test.

I have noticed while watching the AMD VCE encodes that it doesn’t do a great job with scene changes. I expect that is because VCE doesn’t support pre-analysis for HEVC, only for H.264. AMD VCE also suffers from lack of B-frame support, which I will talk about in the next blog post.

Where does this matter

Video archival. If you have a video that you are planning to discard for a high quality re-encode to save on file size, it’s better to stick with x265. Keep in mind, don’t just re-encode because you want to use a “better” codec, it’s always best to keep the original.

Gas Guzzling

This is a comparison I don’t see as often, and I think is overlooked. Encoding takes a lot of power, which means it costs money. I have been told by many FastFlix users that they let their x265 encodes run overnight, and some of their encodings take days!

This is also a harder to measure metric, as you need both encoders to produce the same quality output, as well as know their power usage. The entire thing also labors under the assumption that the only purpose of this machine is to encode the video while it is powered on, so please keep all that in mind as we dive into this.

To achieve the same quality in the resulting file, it costs ten times as much in electricity to get the job done with software. This may not matter if you’re talking about a random encode here or there, but if you have a lot of videos to burn through, you could really start saving cash by switching to hardware encoders.

The Nitty Gritty about the power (Methodology)

Power usage will differ across hardware, so this is for a very specific case that I can attest to (using both HWMonitor and a Kill A Watt meter). The 6900XT uses 63 watts over its baseline when encoding, for a total system draw of ~320W. The i9-9900K uses 111 watts over baseline for a total system draw of ~360W. (Keep in mind there is some extra CPU usage during a hardware encode as well, which is why total power is not a direct difference between the two.)

For the encoder speed, when using a UHD file I was able to get within 0.1% difference of VMAF when using VCE slow (same speed as above) and x265 veryfast (at 10.35fps).

Let’s take a genericized use case of a two hour long video running at 24fps: 24fps * 60 seconds in a minute * 60 minutes in an hour * 2 hours = 172,800 frames.

Estimated times and cost (a sketch to reproduce these numbers follows the list):

  • VCE – slow – 6900XT @ 96.47fps – 29.85 minutes
    • 0.16 kWh @ 320 watts
    • $0.019 @ 12 cents per kWh
  • x265 – i9-9900K @ 10.35fps – 278.3 minutes (about four and a half hours)
    • 1.67 kWh @ 360 watts
    • $0.20 @ 12 cents per kWh
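If you want to plug in your own numbers, the estimates above boil down to a few lines (the fps and wattage figures are from my specific hardware):

# energy (kWh) = watts * hours / 1000, priced at 12 cents per kWh
frames = 24 * 60 * 60 * 2  # two hours of 24fps video = 172,800 frames

for name, fps, watts in (("VCE slow", 96.47, 320), ("x265 veryfast", 10.35, 360)):
    hours = frames / fps / 3600
    kwh = watts * hours / 1000
    print(f"{name}: {hours * 60:.1f} minutes, {kwh:.2f} kWh, ${kwh * 0.12:.3f}")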

Where does this matter

The cost difference probably doesn’t sway many individuals. But if you’re a prolific encoder, this could save you time and money.

Super Technical Head to Head Summary

|              | Software (x265)             | Hardware (AMD VCE)                               |
|--------------|-----------------------------|--------------------------------------------------|
| Quality      | ⭐ Best possible            | Lacks basic HEVC needs (B-frames / pre-analysis) |
| Speed        | Slow to Super Slow          | ⭐ Crazy Fast                                    |
| Requirements | ⭐ Any old electrified rock  | Newer AMD GPU, Windows OS                        |
| Energy Usage | All the powah!              | Sips daintily                                    |

So the winner is… neither. If you’re encoding professionally, you’ll be working with totally different software (like TITAN File). And if you’re using it at home, it really just depends on what hardware you already have. If you’re wondering which GPU to get for the best encoding, wait for next month’s article 😉

Basically, they both do what they were designed for. I would say hardware encoders might have a slight overall edge, as they can be used for all cases, whereas x265 currently can’t do UHD HDR10 real time encoding on consumer hardware.

Encoding HDR10 with AMD GPUs

Already got an AMD GPU and want to start encoding with it? Great, let’s get down to how to do it. First off make sure you are using Windows. If you’re using Linux for this, don’t.* If Linux is all you have, I would still recommend using a passthrough VM with Windows on it.

For Windows users, rigaya has made a beautiful tool called VCEEncC that has HDR10 support built in. It is a command line tool, but good news, FastFlix now supports it!

You will need to download VCEEncC manually as well, and make sure it is on the system path or link it up in File > Settings of FastFlix.

VCE doesn’t have a lot of options to worry about like other encoders do, so you can be on your way to re-encoding in no time!

* It is possible on Linux to encode HEVC using VAAPI, but you would need to apply custom MESA patches to enable HDR10 support. AMF / VCEEncC only supports H.264 on Linux currently.

Best quality possible with VCE

Beauty is in the eye of the beholder, and so is video quality. Some features, like VBAQ (Variance Based Adaptive Quantization), will lower measured metrics like VMAF and SSIM, but are designed to look better to human eyes. Assuming you care about how the video looks, and aren’t just trying to impress your boss with numbers, we will stick with those.

| Setting                | Value   |
|------------------------|---------|
| Preset                 | slow    |
| Motion Vector Accuracy | q-pel   |
| VBAQ                   | enabled |
| Pre-Encode             | enabled |
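On the command line, those settings translate to a VCEEncC call along these lines (a sketch; file names and bitrate are placeholders, and flag availability varies by version and card):

VCEEncC64 --codec hevc --output-depth 10 --preset slow --mv-precision q-pel --vbaq --pe --tier high --vbr 15000 -i input.mkv -o output.mkv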

Of course, the largest determinant of quality will be how much bitrate you allow (or which quantization rate you select). FastFlix has some loose recommendations, but what is truly needed will vary greatly depending on the source. A GoPro bike ride video will require a lot more bitrate than a mounted security camera with very little movement overall.

Warnings and gotchas

Not all features are available on all cards. Also, some features, like B-frame support, were promised for RDNA2 but are still not available.

Driver versions can make a difference. Always try the latest first, but if you experience issues using VCE, the driver may not ship a new enough AMF version, and you may need to downgrade to an older driver.

What do I use?

Personally, I avoid re-encoding whenever possible. However, now that I have an AMD GPU, I do use it for any of my quick and dirty encoding needs. Though I would be saying the same about NVENC if I had a new Nvidia GPU (which does have B-frame support). In my opinion, it’s simply not worth the time and energy investment to encode with software. Either save the original or use a hardware encoder.

What about Nvidia (NVENC) or Intel (QSV)?

I am working to get access to latest generation hardware for both Nvidia’s NVENC and Intel’s QSV in the next month, so hopefully I will be able to create a follow up with some good head to head comparison. Historically NVENC has taken the crown, and by my research VCE hasn’t caught up yet, but who knows where QSV will end up!

Boring Details

  • x265 was used at commit 82786fccce10379be439243b6a776dc2f5918cb4 (2021-05-25) as part of FFmpeg
  • CPU is a i9-9900k
  • VCEEncC 6.13 on 6900xt with AMF Runtime 1.4.21 / SDK 1.4.21 using drivers 21.7.2

Disclaimer

These tests were done on my own hardware purchased myself. All conclusions are my own thoughts and opinions and in no way represent any company.