Tutorial

Posts designed as a walk through tutorial of how to achieve desired behavior.

pre-commit – Check yourself before you wreck yourself

pre-commit checks are like prenups. They run before you commit, get the dirty stuff out of the way, and may even save your relationship(s) in the long run. The difference is that pre-commit checks are written by programmers and not lawyers, so they are a lot easier to read and implement.

There are several types of pre-commit checks that can be run. Some common features you could chose are:

  • git sanity checks (no huge files, no bad merge lines, etc…)
  • code style enforcement
  • whitespace fixes
  • python specific checks

Why pre-commit?

Just as you should do a quick check in the mirror before heading out, pre-commits are a quick reflection of your code before making it public. They quickly check everything you’re about to add to git to make sure it’s copasetic. Many even automatically fix common issues for you.

Basically pre-commits make sure your code’s tee-shirt isn’t on inside out.

How to pre-commit

There are three basic steps to get pre-commit working with your code repo.

  1. Install the pre-commit tool onto your system
  2. Add a pre-commit configuration file to your repo
  3. Install the git hook for the repo

1. Install pre-commit

There are several different official ways to install pre-commit. I personally do not like packages polluting my global python site-packages. Instead, I install it with user level only privileges. Then I make sure its install path is added to the system path.

pip install pre-commit --user

You will receive a warning that the script installation location is not on the system path. (Unless you have done this before, and then you can skip the next part.)

If you really don’t care and are very good about using isolated virtual environment, you can just install it globally with pip install pre-commit without the --user flag and skip the next part.

Windows

Installing collected packages: pre-commit
  WARNING: The scripts pre-commit-validate-config.exe, pre-commit-validate-manifest.exe and pre-commit.exe are installed in 'C:\Users\Chris\AppData\Roaming\Python\Python39\Scripts' which is not on PATH.

The above warning is what I received while installing this on windows. So I coped that path C:\Users\Chris\AppData\Roaming\Python\Python39\Scripts to my clipboard and then went through the following process to add it to my system path.

First, search for “edit system” on the window search bar and click the highlighted link.

Second, click on the “Environment Variables…” button at the bottom.

Third, in the bottom section under “System variables” click on “Path” and hit “Edit…”

Finally on the new window hit “New” in the top right and add the path previously copied to the clipboard. Then Hit Ok on all the windows to close them.

You will have to restart any cmd sessions you have opened. When you do, you should be able to run pre-commit just fine.

Linux

It’s a bit simpler to add the custom install location to the system path on linux.

INSTALL_PATH=/home/james/.local/bin  # Change to the path printed in your warning
echo "export PATH=\"${INSTALL_PATH}:\$PATH\"" >> ~/.bashrc 
source ~/.bashrc

2. The config file

In your repo, you will need to create a file named .pre-commit-config.yaml and chose which checks to add for your code. This is a config file that I use (with opinionated formatters and mypy removed). You can find the full list of built-in supported checks at the pre-commit github repo.

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:

    # Identify invalid files
    - id: check-ast                        
    - id: check-yaml                       
    - id: check-json                       
    - id: check-toml                       

    # git checks
    - id: check-merge-conflict             
    - id: check-added-large-files          
      exclude: tests/media/.+
    - id: detect-private-key               
    - id: check-case-conflict              

    # Python checks
    - id: check-docstring-first            
    - id: debug-statements                 
    - id: requirements-txt-fixer           
    - id: fix-encoding-pragma              
    - id: fix-byte-order-marker            

    # General quality checks
    - id: mixed-line-ending                
    - id: trailing-whitespace              
      args: [--markdown-linebreak-ext=md]  
    - id: check-executables-have-shebangs  
    - id: end-of-file-fixer                

Personally I also always use black code formatter. It is not part of the standard checks, but is still easy to add.

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
       # ... 
  # Add at same level as the first pre-commit-hooks repo
  - repo: https://github.com/psf/black
    rev: 21.6b0 
    hooks:
      - id: black

3. Adding the pre-commit hook to git

Go into the repo with the config file at it’s root, and simply type:

pre-commit install

The first time you add it you will also want to run it on all the files in the repo to shore them all up (usually it only runs on the changed files).

pre-commit run --all-files

It will take a few minutes to install the virtual env for running these checks, but will be much faster after the first time. It should look something like:

pre-commit run --all-files
[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Initializing environment for https://github.com/psf/black.
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/psf/black.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
Check python ast.........................................................Passed
Check Yaml...............................................................Passed
Check JSON...........................................(no files to check)Skipped
Check Toml...............................................................Passed
Check for merge conflicts................................................Passed
Check for added large files..............................................Passed
Detect Private Key.......................................................Passed
Check for case conflicts.................................................Passed
Check docstring is first.................................................Passed
Debug Statements (Python)................................................Passed
Fix requirements.txt.....................................................Passed
Fix python encoding pragma...............................................Passed
fix UTF-8 byte order marker..............................................Passed
Mixed line ending........................................................Passed
Trim Trailing Whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing README.md
Fixing docs/README.md
Fixing docs/build-licenses.txt

Check that executables have shebangs.....................................Passed
Fix End of Files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing docs/build-licenses.txt
Fixing .pre-commit-config.yaml

black....................................................................Passed

4. Sit back and relax

Congrats, you are now totally almost protected from the most common silly mistakes you used to make. You now have the opportunity to make completely new mistakes you wouldn’t have thought of before!

Upload large files fast with Dropzone.js

I have previously covered how to upload large files with dropzone.js, but it didn’t allow for parallel chunk uploads. In this article we will go over that new addition, as well as several other improvements.

Download the final executable or view github to see all the code now if you don’t want to read the article.

This is the final result for now, and can be customized if you so chose. Obviously I’m no graphic design, but I’d pick function over style anyway.

Design

Before we dive into code, lets think about the design. We will need to somehow handle multiple parts of a single file being uploaded at the same time in a random order. How do we keep track of that?

Thankfully, Dropzone will provide the server with a few different pieces of information which each chunk, they include:

  • dzuuid – unique ID per upload file
  • dzchunkindex – the chunk number of the current upload
  • dztotalfilesize – Total size of the upload
  • dzchunksize – Max size per chunk
  • dztotalchunkcount – The number of chunks in this file
  • dzchunkbyteoffset – The place in the file this chunk starts

In my mind there are two clear ways to approach the problem. First option is to create a sparse file of the full size to start with, using dztotalchunkcount and then with every incoming chunk, set the position of the file using dzchunkbyteoffset and write the data starting there.

The advantage of this method is that it only requires a single file on disk. The disadvantage is you have to worry about multiple threads accessing the same file at the same time.

The second choice is to write each chunk to a separate file, then when they are all uploaded concatenate them all to a single file and remove the individual chunks. The disadvantage are that that you require twice the space for a short time, and have to deal with cleanup of temporary files.

I personally preferred the second option, as it seemed a bit safer.

Upload Function

As a quick warning, I am now using Bottle instead of Flask for this upload, so a bit of the form syntax has changed since the last post.

from pathlib import Path
from threading import Lock
from collections import defaultdict
import shutil
import uuid

from bottle import route, run, request, error, response, HTTPError, static_file
from werkzeug.utils import secure_filename

lock = Lock()
chucks = defaultdict(list)

chunk_path = Path(__file__).parent / "chunks"
storage_path = Path(__file__).parent / "storage"
chunk_path.mkdir(exist_ok=True, parents=True)
storage_path.mkdir(exist_ok=True, parents=True)

@route("/upload", method="POST")
def upload():
    file = request.files.get("file")
    if not file:
        raise HTTPError(status=400, body="No file provided")

    dz_uuid = request.forms.get("dzuuid")
    if not dz_uuid:
        # Assume this file has not been chunked
        with open(storage_path / f"{uuid.uuid4()}_{secure_filename(file.filename)}", "wb") as f:
            file.save(f)
        return "File Saved"

    # Chunked download
    try:
        current_chunk = int(request.forms["dzchunkindex"])
        total_chunks = int(request.forms["dztotalchunkcount"])
    except KeyError as err:
        raise HTTPError(status=400, body=f"Not all required fields supplied, missing {err}")
    except ValueError:
        raise HTTPError(status=400, body=f"Values provided were not in expected format")
    
    # Create a new directory for this file in the chunks dir, using the UUID as the folder name
    save_dir = chunk_path / dz_uuid
    if not save_dir.exists():
        save_dir.mkdir(exist_ok=True, parents=True)

    # Save the individual chunk
    with open(save_dir / str(request.forms["dzchunkindex"]), "wb") as f:
        file.save(f)

    # See if we have all the chunks downloaded
    with lock:
        chucks[dz_uuid].append(current_chunk)
        completed = len(chucks[dz_uuid]) == total_chunks

    # Concat all the files into the final file when all are downloaded
    if completed:
        with open(storage_path / f"{dz_uuid}_{secure_filename(file.filename)}", "wb") as f:
            for file_number in range(total_chunks):
                f.write((save_dir / str(file_number)).read_bytes())
        print(f"{file.filename} has been uploaded")
        shutil.rmtree(save_dir)

    return "Chunk upload successful"

if __name__ == "__main__":
    run(server="paste")

Hopefully the code is decently self documented. We do a few checks at the start as we pull in the required parameters. Then we prepare the directory for where the temporary chunks will be stored, and write the incoming chunk there. We gather information on all the chunks and when then have been completed in a global dictionary, and when they are all uploaded they are assembled into the final file.

File Downloading

Now that we can put files on the server, what about getting them back? I personally don’t want people to host random files on my server, but others may. To accomplish that, we shouldn’t just list all the files to everyone that visits the site, but only to whoever uploaded it. Thankfully we can just store the uuid in a cookie on the frontend, and then have a very basic download function.

@route("/download/<dz_uuid>")
def download(dz_uuid):
    for file in storage_path.iterdir():
        if file.is_file() and file.name.startswith(dz_uuid):
            return static_file(file.name, root=file.parent.absolute(), download=True)
    return HTTPError(status=404)

This does complicate our frontend a bit, as we want to save both UUID and filename as text fields in a cookie. There are a lot of great libraries out there to make life easier with JavaScript and cookies, but I wanted to keep it simple and pure JS other than Dropzone, making the code a bit more complicated than last time.

Dropzone frontend

Instead of being a standalone file, I have also put this directly into the python file to make using it as a f-string a lot easier, but makes it a little harder to read.

<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <link rel="stylesheet" href="{dropzone_cdn.rstrip('/')}/{dropzone_version}/min/dropzone.min.css"/>
    <link rel="stylesheet" href="{dropzone_cdn.rstrip('/')}/{dropzone_version}/min/basic.min.css"/>
    <script type="application/javascript"
        src="{dropzone_cdn.rstrip('/')}/{dropzone_version}/min/dropzone.min.js">
    </script>
    <title>pyfiledrop</title>
</head>
<body>

    <div id="content" style="width: 800px; margin: 0 auto;">
        <h2>Upload new files</h2>
        <form method="POST" action='/upload' class="dropzone dz-clickable" id="dropper" enctype="multipart/form-data">
        </form>

        <h2>
            Uploaded
            <input type="button" value="Clear" onclick="clearCookies()" />
        </h2>
        <div id="uploaded">

        </div>

        <script type="application/javascript">
            function clearCookies() {{
                document.cookie = "files=; Max-Age=0";
                document.getElementById("uploaded").innerHTML = "";
            }}

            function getFilesFromCookie() {{
                try {{ return document.cookie.split("=", 2)[1].split("||");}} catch (error) {{ return []; }}
            }}

            function saveCookie(new_file) {{
                    let all_files = getFilesFromCookie();
                    all_files.push(new_file);
                    document.cookie = `files=${{all_files.join("||")}}`;
            }}

            function generateLink(combo){{
                const uuid = combo.split('|^^|')[0];
                const name = combo.split('|^^|')[1];
                if ({'true' if allow_downloads else 'false'}) {{
                    return `<a href="/download/${{uuid}}" download="${{name}}">${{name}}</a>`;
                }}
                return name;
            }}


            function init() {{

                Dropzone.options.dropper = {{
                    paramName: 'file',
                    chunking: true,
                    forceChunking: {dropzone_force_chunking},
                    url: '/upload',
                    retryChunks: true,
                    parallelChunkUploads: {dropzone_parallel_chunks},
                    timeout: {dropzone_timeout}, // microseconds
                    maxFilesize: {dropzone_max_file_size}, // megabytes
                    chunkSize: {dropzone_chunk_size}, // bytes
                    init: function () {{
                        this.on("complete", function (file) {{
                            let combo = `${{file.upload.uuid}}|^^|${{file.upload.filename}}`;
                            saveCookie(combo);
                            document.getElementById("uploaded").innerHTML += generateLink(combo)  + "<br />";
                        }});
                    }}
                }}

                if (typeof document.cookie !== 'undefined' ) {{
                    let content = "";
                     getFilesFromCookie().forEach(function (combo) {{
                        content += generateLink(combo) + "<br />";
                    }});

                    document.getElementById("uploaded").innerHTML = content;
                }}
            }}

            init();

        </script>
    </div>
</body>
</html>

Notice we are using a slew of python variables that we are going to allow to be configurable upon launch.

Command line options

import argparse
...

allow_downloads = False
dropzone_cdn = "https://cdnjs.cloudflare.com/ajax/libs/dropzone"
dropzone_version = "5.7.6"
dropzone_timeout = "120000"
dropzone_max_file_size = "100000"
dropzone_chunk_size = "1000000"
dropzone_parallel_chunks = "true"
dropzone_force_chunking = "true"

...


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--port", type=int, default=16273, required=False)
    parser.add_argument("--host", type=str, default="0.0.0.0", required=False)
    parser.add_argument("-s", "--storage", type=str, default=str(storage_path), required=False)
    parser.add_argument("-c", "--chunks", type=str, default=str(chunk_path), required=False)
    parser.add_argument(
        "--max-size",
        type=str,
        default=dropzone_max_file_size,
        help="Max file size (Mb)",
    )
    parser.add_argument(
        "--timeout",
        type=str,
        default=dropzone_timeout,
        help="Timeout (ms) for each chuck upload",
    )
    parser.add_argument("--chunk-size", type=str, default=dropzone_chunk_size, help="Chunk size (bytes)")
    parser.add_argument("--disable-parallel-chunks", required=False, default=False, action="store_true")
    parser.add_argument("--disable-force-chunking", required=False, default=False, action="store_true")
    parser.add_argument("-a", "--allow-downloads", required=False, default=False, action="store_true")
    parser.add_argument("--dz-cdn", type=str, default=None, required=False)
    parser.add_argument("--dz-version", type=str, default=None, required=False)
    return parser.parse_args()


if __name__ == "__main__":

    args = parse_args()
    storage_path = Path(args.storage)
    chunk_path = Path(args.chunks)
    dropzone_chunk_size = args.chunk_size
    dropzone_timeout = args.timeout
    dropzone_max_file_size = args.max_size
    try:
        if int(dropzone_timeout) < 1 or int(dropzone_chunk_size) < 1 or int(dropzone_max_file_size) < 1:
            raise Exception("Invalid dropzone option, make sure max-size, timeout, and chunk-size are all positive")
    except ValueError:
        raise Exception("Invalid dropzone option, make sure max-size, timeout, and chunk-size are all integers")

    if args.dz_cdn:
        dropzone_cdn = args.dz_cdn
    if args.dz_version:
        dropzone_version = args.dz_version
    if args.disable_parallel_chunks:
        dropzone_parallel_chunks = "false"
    if args.disable_force_chunking:
        dropzone_force_chunking = "false"
    if args.allow_downloads:
        allow_downloads = True

    if not storage_path.exists():
        storage_path.mkdir(exist_ok=True)
    if not chunk_path.exists():
        chunk_path.mkdir(exist_ok=True)

    print(f"""Timeout: {int(dropzone_timeout) // 1000} seconds per chunk
Chunk Size: {int(dropzone_chunk_size) // 1024} Kb
Max File Size: {int(dropzone_max_file_size)} Mb
Force Chunking: {dropzone_force_chunking}
Parallel Chunks: {dropzone_parallel_chunks}
Storage Path: {storage_path.absolute()}
Chunk Path: {chunk_path.absolute()}
""")
    run(server="paste", port=args.port, host=args.host)

As this will become an executable, to be configurable we want to pass parameters upon launch.

Favicon

Now this is getting into the realm of silly. But to be an all in one script, we need to provide a binary file (the favicon) in the script itself. Thankfully ico files can be compressed rather easily, so we are going to compress it in the script itself, and decompress it when requested.

@route("/favicon.ico")
def favicon():
    return zlib.decompress(
        b"x\x9c\xedVYN\xc40\x0c5J%[\xe2\xa3|q\x06\x8e1G\xe1(=ZoV"
        b"\xb2\xa7\x89\x97R\x8d\x84\x04\xe4\xa5\xcb(\xc9\xb3\x1do"
        b"\x1d\x80\x17?\x1e\x0f\xf0O\x82\xcfw\x00\x7f\xc1\x87\xbf"
        b"\xfd\x14l\x90\xe6#\xde@\xc1\x966n[z\x85\x11\xa6\xfcc"
        b"\xdfw?s\xc4\x0b\x8e#\xbd\xc2\x08S\xe1111\xf1k\xb1NL"
        b"\xfcU<\x99\xe4T\xf8\xf43|\xaa\x18\xf8\xc3\xbaHFw\xaaj\x94"
        b"\xf4c[F\xc6\xee\xbb\xc2\xc0\x17\xf6\xf4\x12\x160\xf9"
        b"\xa3\xfeQB5\xab@\xf4\x1f\xa55r\xf9\xa4KGG\xee\x16\xdd\xff"
        b"\x8e\x9d\x8by\xc4\xe4\x17\tU\xbdDg\xf1\xeb\xf0Zh\x8e"
        b"\xd3s\x9c\xab\xc3P\n<e\xcb$\x05 b\xd8\x84Q1\x8a\xd6Kt\xe6"
        b"\x85(\x13\xe5\xf3]j\xcf\x06\x88\xe6K\x02\x84\x18\x90"
        b"\xc5\xa7Kz\xd4\x11\xeeEZK\x012\xe9\xab\xa5\xbf\xb3@i\x00"
        b"\xce\xe47\x0b\xb4\xfe\xb1d\xffk\xebh\xd3\xa3\xfd\xa4:`5J"
        b"\xa3\xf1\xf5\xf4\xcf\x02tz\x8c_\xd2\xa1\xee\xe1\xad"
        b"\xaa\xb7n-\xe5\xafoSQ\x14'\x01\xb7\x9b<\x15~\x0e\xf4b"
        b"\x8a\x90k\x8c\xdaO\xfb\x18<H\x9d\xdfj\xab\xd0\xb43\xe1"
        b'\xe3nt\x16\xdf\r\xe6\xa1d\xad\xd0\xc9z\x03"\xc7c\x94v'
        b"\xb6I\xe1\x8f\xf5,\xaa2\x93}\x90\xe0\x94\x1d\xd2\xfcY~f"
        b"\xab\r\xc1\xc8\xc4\xe4\x1f\xed\x03\x1e`\xd6\x02\xda\xc7k"
        b"\x16\x1a\xf4\xcb2Q\x05\xa0\xe6\xb4\x1e\xa4\x84\xc6"
        b"\xcc..`8'\x9a\xc9-\n\xa8\x05]?\xa3\xdfn\x11-\xcc\x0b"
        b"\xb4\x7f67:\x0c\xcf\xd5\xbb\xfd\x89\x9ebG\xf8:\x8bG"
        b"\xc0\xfb\x9dm\xe2\xdf\x80g\xea\xc4\xc45\xbe\x00\x03\xe9\xd6\xbb"
    )

Putting it all together

Here is the culmination of everything we talked about put into a script.

This may not always be the newest version, if you want to use it yourself please download the final executable or view github to see the latest code.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from pathlib import Path
from threading import Lock
from collections import defaultdict
import shutil
import argparse
import uuid
import zlib

from bottle import route, run, request, error, response, HTTPError, static_file
from werkzeug.utils import secure_filename

storage_path: Path = Path(__file__).parent / "storage"
chunk_path: Path = Path(__file__).parent / "chunk"

allow_downloads = False
dropzone_cdn = "https://cdnjs.cloudflare.com/ajax/libs/dropzone"
dropzone_version = "5.7.6"
dropzone_timeout = "120000"
dropzone_max_file_size = "100000"
dropzone_chunk_size = "1000000"
dropzone_parallel_chunks = "true"
dropzone_force_chunking = "true"

lock = Lock()
chucks = defaultdict(list)


@error(500)
def handle_500(error_message):
    response.status = 500
    response.body = f"Error: {error_message}"
    return response


@route("/")
def index():
    index_file = Path(__file__) / "index.html"
    if index_file.exists():
        return index_file.read_text()
    return f"""
<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <link rel="stylesheet" href="{dropzone_cdn.rstrip('/')}/{dropzone_version}/min/dropzone.min.css"/>
    <link rel="stylesheet" href="{dropzone_cdn.rstrip('/')}/{dropzone_version}/min/basic.min.css"/>
    <script type="application/javascript"
        src="{dropzone_cdn.rstrip('/')}/{dropzone_version}/min/dropzone.min.js">
    </script>
    <title>pyfiledrop</title>
</head>
<body>

    <div id="content" style="width: 800px; margin: 0 auto;">
        <h2>Upload new files</h2>
        <form method="POST" action='/upload' class="dropzone dz-clickable" id="dropper" enctype="multipart/form-data">
        </form>

        <h2>
            Uploaded
            <input type="button" value="Clear" onclick="clearCookies()" />
        </h2>
        <div id="uploaded">

        </div>

        <script type="application/javascript">
            function clearCookies() {{
                document.cookie = "files=; Max-Age=0";
                document.getElementById("uploaded").innerHTML = "";
            }}

            function getFilesFromCookie() {{
                try {{ return document.cookie.split("=", 2)[1].split("||");}} catch (error) {{ return []; }}
            }}

            function saveCookie(new_file) {{
                    let all_files = getFilesFromCookie();
                    all_files.push(new_file);
                    document.cookie = `files=${{all_files.join("||")}}`;
            }}

            function generateLink(combo){{
                const uuid = combo.split('|^^|')[0];
                const name = combo.split('|^^|')[1];
                if ({'true' if allow_downloads else 'false'}) {{
                    return `<a href="/download/${{uuid}}" download="${{name}}">${{name}}</a>`;
                }}
                return name;
            }}


            function init() {{

                Dropzone.options.dropper = {{
                    paramName: 'file',
                    chunking: true,
                    forceChunking: {dropzone_force_chunking},
                    url: '/upload',
                    retryChunks: true,
                    parallelChunkUploads: {dropzone_parallel_chunks},
                    timeout: {dropzone_timeout}, // microseconds
                    maxFilesize: {dropzone_max_file_size}, // megabytes
                    chunkSize: {dropzone_chunk_size}, // bytes
                    init: function () {{
                        this.on("complete", function (file) {{
                            let combo = `${{file.upload.uuid}}|^^|${{file.upload.filename}}`;
                            saveCookie(combo);
                            document.getElementById("uploaded").innerHTML += generateLink(combo)  + "<br />";
                        }});
                    }}
                }}

                if (typeof document.cookie !== 'undefined' ) {{
                    let content = "";
                     getFilesFromCookie().forEach(function (combo) {{
                        content += generateLink(combo) + "<br />";
                    }});

                    document.getElementById("uploaded").innerHTML = content;
                }}
            }}

            init();

        </script>
    </div>
</body>
</html>
    """


@route("/favicon.ico")
def favicon():
    return zlib.decompress(
        b"x\x9c\xedVYN\xc40\x0c5J%[\xe2\xa3|q\x06\x8e1G\xe1(=ZoV"
        b"\xb2\xa7\x89\x97R\x8d\x84\x04\xe4\xa5\xcb(\xc9\xb3\x1do"
        b"\x1d\x80\x17?\x1e\x0f\xf0O\x82\xcfw\x00\x7f\xc1\x87\xbf"
        b"\xfd\x14l\x90\xe6#\xde@\xc1\x966n[z\x85\x11\xa6\xfcc"
        b"\xdfw?s\xc4\x0b\x8e#\xbd\xc2\x08S\xe1111\xf1k\xb1NL"
        b"\xfcU<\x99\xe4T\xf8\xf43|\xaa\x18\xf8\xc3\xbaHFw\xaaj\x94"
        b"\xf4c[F\xc6\xee\xbb\xc2\xc0\x17\xf6\xf4\x12\x160\xf9"
        b"\xa3\xfeQB5\xab@\xf4\x1f\xa55r\xf9\xa4KGG\xee\x16\xdd\xff"
        b"\x8e\x9d\x8by\xc4\xe4\x17\tU\xbdDg\xf1\xeb\xf0Zh\x8e"
        b"\xd3s\x9c\xab\xc3P\n<e\xcb$\x05 b\xd8\x84Q1\x8a\xd6Kt\xe6"
        b"\x85(\x13\xe5\xf3]j\xcf\x06\x88\xe6K\x02\x84\x18\x90"
        b"\xc5\xa7Kz\xd4\x11\xeeEZK\x012\xe9\xab\xa5\xbf\xb3@i\x00"
        b"\xce\xe47\x0b\xb4\xfe\xb1d\xffk\xebh\xd3\xa3\xfd\xa4:`5J"
        b"\xa3\xf1\xf5\xf4\xcf\x02tz\x8c_\xd2\xa1\xee\xe1\xad"
        b"\xaa\xb7n-\xe5\xafoSQ\x14'\x01\xb7\x9b<\x15~\x0e\xf4b"
        b"\x8a\x90k\x8c\xdaO\xfb\x18<H\x9d\xdfj\xab\xd0\xb43\xe1"
        b'\xe3nt\x16\xdf\r\xe6\xa1d\xad\xd0\xc9z\x03"\xc7c\x94v'
        b"\xb6I\xe1\x8f\xf5,\xaa2\x93}\x90\xe0\x94\x1d\xd2\xfcY~f"
        b"\xab\r\xc1\xc8\xc4\xe4\x1f\xed\x03\x1e`\xd6\x02\xda\xc7k"
        b"\x16\x1a\xf4\xcb2Q\x05\xa0\xe6\xb4\x1e\xa4\x84\xc6"
        b"\xcc..`8'\x9a\xc9-\n\xa8\x05]?\xa3\xdfn\x11-\xcc\x0b"
        b"\xb4\x7f67:\x0c\xcf\xd5\xbb\xfd\x89\x9ebG\xf8:\x8bG"
        b"\xc0\xfb\x9dm\xe2\xdf\x80g\xea\xc4\xc45\xbe\x00\x03\xe9\xd6\xbb"
    )


@route("/upload", method="POST")
def upload():
    file = request.files.get("file")
    if not file:
        raise HTTPError(status=400, body="No file provided")

    dz_uuid = request.forms.get("dzuuid")
    if not dz_uuid:
        # Assume this file has not been chunked
        with open(storage_path / f"{uuid.uuid4()}_{secure_filename(file.filename)}", "wb") as f:
            file.save(f)
        return "File Saved"

    # Chunked download
    try:
        current_chunk = int(request.forms["dzchunkindex"])
        total_chunks = int(request.forms["dztotalchunkcount"])
    except KeyError as err:
        raise HTTPError(status=400, body=f"Not all required fields supplied, missing {err}")
    except ValueError:
        raise HTTPError(status=400, body=f"Values provided were not in expected format")

    save_dir = chunk_path / dz_uuid

    if not save_dir.exists():
        save_dir.mkdir(exist_ok=True, parents=True)

    # Save the individual chunk
    with open(save_dir / str(request.forms["dzchunkindex"]), "wb") as f:
        file.save(f)

    # See if we have all the chunks downloaded
    with lock:
        chucks[dz_uuid].append(current_chunk)
        completed = len(chucks[dz_uuid]) == total_chunks

    # Concat all the files into the final file when all are downloaded
    if completed:
        with open(storage_path / f"{dz_uuid}_{secure_filename(file.filename)}", "wb") as f:
            for file_number in range(total_chunks):
                f.write((save_dir / str(file_number)).read_bytes())
        print(f"{file.filename} has been uploaded")
        shutil.rmtree(save_dir)

    return "Chunk upload successful"


@route("/download/<dz_uuid>")
def download(dz_uuid):
    if not allow_downloads:
        raise HTTPError(status=403)
    for file in storage_path.iterdir():
        if file.is_file() and file.name.startswith(dz_uuid):
            return static_file(file.name, root=file.parent.absolute(), download=True)
    return HTTPError(status=404)


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--port", type=int, default=16273, required=False)
    parser.add_argument("--host", type=str, default="0.0.0.0", required=False)
    parser.add_argument("-s", "--storage", type=str, default=str(storage_path), required=False)
    parser.add_argument("-c", "--chunks", type=str, default=str(chunk_path), required=False)
    parser.add_argument(
        "--max-size",
        type=str,
        default=dropzone_max_file_size,
        help="Max file size (Mb)",
    )
    parser.add_argument(
        "--timeout",
        type=str,
        default=dropzone_timeout,
        help="Timeout (ms) for each chuck upload",
    )
    parser.add_argument("--chunk-size", type=str, default=dropzone_chunk_size, help="Chunk size (bytes)")
    parser.add_argument("--disable-parallel-chunks", required=False, default=False, action="store_true")
    parser.add_argument("--disable-force-chunking", required=False, default=False, action="store_true")
    parser.add_argument("-a", "--allow-downloads", required=False, default=False, action="store_true")
    parser.add_argument("--dz-cdn", type=str, default=None, required=False)
    parser.add_argument("--dz-version", type=str, default=None, required=False)
    return parser.parse_args()


if __name__ == "__main__":

    args = parse_args()
    storage_path = Path(args.storage)
    chunk_path = Path(args.chunks)
    dropzone_chunk_size = args.chunk_size
    dropzone_timeout = args.timeout
    dropzone_max_file_size = args.max_size
    try:
        if int(dropzone_timeout) < 1 or int(dropzone_chunk_size) < 1 or int(dropzone_max_file_size) < 1:
            raise Exception("Invalid dropzone option, make sure max-size, timeout, and chunk-size are all positive")
    except ValueError:
        raise Exception("Invalid dropzone option, make sure max-size, timeout, and chunk-size are all integers")

    if args.dz_cdn:
        dropzone_cdn = args.dz_cdn
    if args.dz_version:
        dropzone_version = args.dz_version
    if args.disable_parallel_chunks:
        dropzone_parallel_chunks = "false"
    if args.disable_force_chunking:
        dropzone_force_chunking = "false"
    if args.allow_downloads:
        allow_downloads = True

    if not storage_path.exists():
        storage_path.mkdir(exist_ok=True)
    if not chunk_path.exists():
        chunk_path.mkdir(exist_ok=True)

    print(
        f"""Timeout: {int(dropzone_timeout) // 1000} seconds per chunk
Chunk Size: {int(dropzone_chunk_size) // 1024} Kb
Max File Size: {int(dropzone_max_file_size)} Mb
Force Chunking: {dropzone_force_chunking}
Parallel Chunks: {dropzone_parallel_chunks}
Storage Path: {storage_path.absolute()}
Chunk Path: {chunk_path.absolute()}
"""
    )
    run(server="paste", port=args.port, host=args.host)

Make it yours, and give back if you can!

What will you add to this script? Set a max time for how long you can see the uploaded files? A way to ensure the file exists on the server before trying to download it? Checksum comparison to avoid using space for duplicate files?

However you make it better, please consider to add a pull request for your features so anyone can benefit from it!

Keep Windows from going to sleep, no power settings needed!

Say you have a long running process that you let go overnight, only to come back the next morning and realize “Oh no, only an hour into it, my computer went to sleep!”. Thanks to some Windows internals, it’s possible to make it realize there is a background task running and should not go to sleep.

Artwork by Clara Griffith

I personally use this exact method in my FastFlix program to make sure the computer doesn’t interrupt the coding process. It should work with any version of Python3, and it boils down too:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import ctypes
import time

CONTINUOUS = 0x80000000
SYSTEM_REQUIRED = 0x00000001
DISPLAY_REQUIRED = 0x00000002 

ctypes.windll.kernel32.SetThreadExecutionState(CONTINUOUS | SYSTEM_REQUIRED)
try:
    # This is where you would do stuff
    while True:
        time.sleep(600) 
finally:
    ctypes.windll.kernel32.SetThreadExecutionState(CONTINUOUS)

As this example shows, you let this run in the background to always keep your computer from sleeping. Just be careful you don’t abuse your IT policies by allowing your computer to stay on for a lot longer than they want.

It’s pretty straightforward code, the main function being SetThreadExecutionState.

ES_CONTINUOUS 0x80000000Informs the system that the state being set should remain in effect until the next call that uses ES_CONTINUOUS and one of the other state flags is cleared.
ES_DISPLAY_REQUIRED 0x00000002Forces the display to be on by resetting the display idle timer.
ES_SYSTEM_REQUIRED 0x00000001Forces the system to be in the working state by resetting the system idle timer.
Table from https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-setthreadexecutionstate

Notice that it doesn’t take multiple arguments, like what you expect in Python. Instead it accepts a single hex value that represents the culmination of all the options you want to set. Hence why you see us passing them in via a “bitwise or” expression. For example, if you also wanted to force the screen to stay on, just need to add:

ctypes.windll.kernel32.SetThreadExecutionState(CONTINUOUS | SYSTEM_REQUIRED | DISPLAY_REQUIRED)

After this is set, it is extremely important to remember to always reset it back to ES_CONTINUOUS after your work is done.

Use atexit for safety

In our example above, we simply add this in a finally clause. For a more robust program you may want to ensure it’s always called via atexit.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import ctypes
import atexit

CONTINUOUS = 0x80000000
SYSTEM_REQUIRED = 0x00000001
DISPLAY_REQUIRED = 0x00000002 

ctypes.windll.kernel32.SetThreadExecutionState(CONTINUOUS | SYSTEM_REQUIRED)

@atexit.register
def cleanup():
    ctypes.windll.kernel32.SetThreadExecutionState(CONTINUOUS)

# Do Stuff    

Create an image in Task Manager from CPU Usage

A lot of people have pass around a internet video of someone using a crazy high core count computer to display a video via Task Manager’s CPU usage. The one problem with it: it’s probably fake. Task Manager displays the last 60 seconds of activity, not a instant view like shown in the video. But what if we kept the usage the same for sixty seconds, could we at least make an image?

So the first problem, is how do we generate enough load for a CPU usage to noticeably increase? Luckily there is an old goto into the benchmark world that is really simple to implement. Square Roots. Throw a few million square roots at a CPU and watch it light up.

from itertools import count
import math

def run_benchmark():
    # Warning, no way to stop other than Ctrl+C or killing the process
    for num in count():
        math.sqrt(num)

run_benchmark()

The real problem is making sure we have a way to stop it. The easiest way is with a timeout, but just the slight work of gathering system time may throw off how much work we want to do in the future. So lets only do that, say, every 100,000 operations.

from itertools import count
import math
import time 

def run_benchmark(timeout=60):
    start_time = time.perf_counter()
    for num in count():
        if num % 100_000 == 0.0:
            if time.perf_counter() > start_time + timeout:
                return
        math.sqrt(num)

run_benchmark()

Excellent, now we can pump up a core to max workload. However, we currently have no control over which CPU it will run on! We have to set the “affinity” of the process. There is no easy way to do that with Python directly, and in this case we have to go straight to the Win32 API. In this case we will use the pywin32 library to access the win32process.

pip install pywin32

We will then use the SetProcessAffinityMask function to lock our program to a specific CPU, which will also require grabbing some details from GetCurrentProcess Win32 API function. One thing not really documented anywhere I could find, is the fact you set the CPU core affinity not by it’s actual number, but by the mask itself. Which we can create by taking 2 ** cpu_core.

from itertools import count
import math
import time 

import win32process  # pywin32

def run_benchmark(cpu_core=0, timeout=60):
    start_time = time.perf_counter()
    process_id = win32process.GetCurrentProcess()
    win32process.SetProcessAffinityMask(process_id, 2 ** cpu_core)

    for num in count():
        if num % 100_000 == 0.0:
            if time.perf_counter() > start_time + timeout:
                return
        math.sqrt(num)

run_benchmark()

So now every time we run the program, it should only max out the first core. If we want to do this across multiple cores, we will have to create multiple processes at the same time. My favorite way to do that is with multiprocessing maps. So instead of lighting up a single processor, let’s pump them all to the roof.

from itertools import count
import math
import time
from multiprocessing.pool import Pool
from multiprocessing import cpu_count

import win32process  # pywin32


def run_benchmark(cpu_core=0, timeout=60):
    start_time = time.perf_counter()
    process_id = win32process.GetCurrentProcess()
    win32process.SetProcessAffinityMask(process_id, 2 ** cpu_core)

    for num in count():
        if num % 100_000 == 0.0:
            if time.perf_counter() > start_time + timeout:
                return
        math.sqrt(num)


if __name__ == '__main__':
    arguments = [(core, 60) for core in range(cpu_count())]
    # [(0, 60), (1, 60), (2, 60), (3, 60)] 
    # each tuple will be as arguments to run_benchmark as its arguments

    with Pool(processes=cpu_count()) as p:
        p.starmap(run_benchmark, arguments)
    run_benchmark()

We have to throw the multiprocessing after the `if __name__ == ‘__main__’: block due to how CPython starts up on Windows. (It’s also just a good idea for any scripts.)

At this point you could also change up which cores it is running on to see how they correspond on the Task Manager. For example, you could change the range increments to only launch on each other core. range(0, cpu_count(), 2)

On my 8 core machine (so 16 logical cores) I can make a quick X shape by selecting certain cores

arguments = [(core, 100) for core in [0, 3, 5, 6, 9, 10, 12, 15]]

Now remember there are two types of CPU cores according to the operating system, physical and logical. Physical is the number of actual cores on the CPU, but if the CPU has SMT (Simultaneous multi-threading) it will double that. So that means every odd number core is actually a fake one (remember cores start at 0). Which means it has to use some of previous core’s resources. Hence why cores 2, 4, 8 and 14 are showing higher usage.

But what if we want even cooler graphics and don’t want to be limited to just 100% cpu usage? Well then we need to tell the computer to not work for very small amounts of time. Aka sleep. So lets try adding a sleep every 100K ops for oh say, 60 milliseconds.

def run_benchmark(cpu_core=0, timeout=60):
    start_time = time.perf_counter()
    process_id = win32process.GetCurrentProcess()
    win32process.SetProcessAffinityMask(process_id, 2 ** cpu_core)

    for num in count():
        if num % 100_000 == 0.0:
            if time.perf_counter() > start_time + timeout:
                return
            time.sleep(0.06)
        math.sqrt(num)

This time I am also going to run it just on physical cores.

arguments = [(core, 100, 80) for core in range(0, cpu_count(), 2)]

How about that, now it’s using just about 50% usage on each core on my computer. If you’re trying this on your own, this is where the fun begins. I suggest trying out different time offsets to see if you can get a list of times for 10~90% usage. For example, mine is close to:

usage_to_sleep_time = {
    100: 0,
    90: 0.014,
    80: 0.02,
    70: 0.03,
    60: 0.04,
    50: 0.06,
    40: 0.08,
    30: 0.1,
    20: 0.2,
    10: 0.5
}

Then lets throw that into the run_benchmark function to be able to set a precise amount of usage per core.

def run_benchmark(cpu_core=0, timeout=60, usage=100):
    start_time = time.perf_counter()
    process_id = win32process.GetCurrentProcess()
    win32process.SetProcessAffinityMask(process_id, 2 ** cpu_core)

    for num in count():
        if num % 100_000 == 0.0:
            if time.perf_counter() > start_time + timeout:
                return
            if usage != 100:
                time.sleep(usage_to_sleep_time[usage])
        math.sqrt(num)

Then if you have enough cores, you can see them all in action at the same time.

arguments = [(core, 100, usage) for core, usage in enumerate(usage_to_sleep_time)]

(I was impatient and didn’t do this one while the computer was idle, hence the messy lower ones.)

From here I am sure some of you could go crazy making individual cores ramp up and done, and create something truly spectacular, but my whistle was wetted. It is totally possible to have control over exactly how much CPU core usage is being shown in Task Manager with Python.

Here is the full final code, hope you enjoyed!

from itertools import count
import math
import time
from multiprocessing.pool import Pool
from multiprocessing import cpu_count

import win32process  # pywin32

usage_to_sleep_time = {
    100: 0,
    90: 0.014,
    80: 0.02,
    70: 0.03,
    60: 0.04,
    50: 0.06,
    40: 0.08,
    30: 0.1,
    20: 0.2,
    10: 0.5
}

def run_benchmark(cpu_core=0, timeout=60, usage=100):
    start_time = time.perf_counter()
    process_id = win32process.GetCurrentProcess()
    win32process.SetProcessAffinityMask(process_id, 2 ** cpu_core)

    for num in count():
        if num % 100_000 == 0.0:
            if time.perf_counter() > start_time + timeout:
                return num
            if usage != 100:
                time.sleep(usage_to_sleep_time[usage])
        math.sqrt(num)



if __name__ == '__main__':
    arguments = [(core, 100, usage) for core, usage in enumerate(usage_to_sleep_time)]

    with Pool(processes=len(arguments)) as p:
        print(p.starmap(run_benchmark, arguments))

A Raspberry Pi Streaming Camera using MPEG-DASH, HLS or RTSP

This is an improvement on my previous article, Raspberry Pi Hardware Accelerated RTSP Camera, now with the option of using more modern technology, MPEG-DASH and HLS!

First off, if you don’t care about the technicalities and just want a script to do everything for you, here you go! If you’re still interested in how it all works or want to tweak the settings, read on.

This article will walk you through how to either copy or convert video from your webcam or pi camera, set it up as a systemd service, and finally view it on a webpage or access it remotely.

MPEG-DASH vs HLS vs RSTP

So to clear this up first of all, these are “containers” that wrap around the actual video, which is a particular “codec” (such as h264). DASH and RTSP are fully codec agnostic, meaning they are capable of wrapping around any type of video codec. The big gotcha is what type of videos the viewer supports (and in RTSP’s case the middleman server as well.)

So if DASH and RTSP can handle everything, why even bother with HLS? Long story short, Apple, who developed HLS, is a bully, so they don’t support the open MPEG-DASH on their devices. Meaning if you are trying to share these video streams with the public or view on an Apple device, you will get the most compatibility with HLS.

So now the difference really comes down to how DASH/HLS are HTTP based protocols that can easily be supported in browser. This makes it super easy to set up an all-in-one device that can host it’s own webpage to view a video at.

Whereas RTSP requires additional software, such as VLC or a security system to view it. The real advantage with RTSP is the fact it really is nearly “real time” compared to DASH/HLS. Using my Raspberry Pis DASH/HLS seem to have a 10~20 second delay, compared to about 1 second for RTSP. As I already went over how to set that up, I won’t repeat it here and only go over DASH. But I personally use RSTP for my own home setup still.

Setting up required software

The two programs you will need are a file server (nginx, apache, python -m http.server, etc…) to host the DASH/HLS content and ffmpeg. And you don’t have to hand compile either!

Super short version:

sudo apt install nginx ffmpeg -y

To better understand why we need them and how to test them to make sure they are working properly, read on. Otherwise, skip ahead to “Gather Camera Details”.

File Server

Last time we use RTSP which required a special service of it’s own. Now we are using HLS and MPEG-DASH, which produce manifest and accompany stream files on the local system. For example, MPEG-DASH will create a manifest.mpd file that contains links to *.m4s files in the same directory which are the chunked up video files.

That means if we make those files accessible remotely, we can use standard HTTP to transport the video. Hence the need for a basic file server. I personally use nginx for the final setup, as it’s fast, easy to use, and has defaults we can use out of the box. So lets install it!

 sudo apt install nginx -y

Now all you need to do is open up a web browser on another computer on that network and connect to http://raspberrypi (If you changed hostname, or having trouble connecting, run hostname -I to see it’s IP address and use http://<ip_address> instead.) You should see a simple webpage that says “Welcome to nginx!”

FFmpeg

Since the last article came out, FFmpeg has finally started shipping with hardware acceleration built in! If you still want to compile in some custom libraries or try and optimize it for your needs, check out my Raspberry Pi FFmpeg compile guide. Otherwise, just download it from the distribution repositories.

sudo apt install ffmpeg -y

You can verify it’s part of the package by checking the encoders for h264_omx.

ffmpeg -hide_banner -encoders | grep omx

Should produce V….. h264_omx OpenMAX IL H.264 video encoder (codec h264) or similar. If for some reason it doesn’t have that or other libraries you are looking for, such as the popular fdk-aac, look into my article onto compiling FFmpeg yourself, or use the helper script with the option --compile-ffmpeg.

Gather camera details

If you have the helper script, simply run it with the option --camera-details and it will print out each device and their formats with their highest resolution for each.

python3 streaming_setup.py --camera-details
# /dev/video0: {'yuyv422': '1280x800', 'mjpeg': '1280x720'}

Under the hood, this is running the following command for every device found.

ffmpeg -hide_banner -f video4linux2 -list_formats all -i /dev/video0

To also see what frame rates are supported per resolution, you will have to run the v4l2-ctl command for that device.

v4l2-ctl -d /dev/video0 --list-formats-ext

Create the FFmpeg command

The FFmpeg command is particular about order when talking about input and output details. Ours will be broken down into the following blocks:

ffmpeg <incoming video details> -i <device> <conversion details> <output>

So let’s say you are using a raspberry pi camera and want to stream 1080p video without re-encoding it. We first have to tell FFmpeg about the camera details it will pull from.

Applying the Camera Details

In longhand, it would look like this:

-input_format h264 -f video4linux2 -video_size 1920x1080 -framerate 30 -i /dev/video0

Hopefully each of those parts are pretty self explanatory. We can also reduce -video_size, aka the incoming resolution to -s, and -framerate, aka the fps to -r.

Network Bandwidth Considerations

Internet streamers, beware you may not be able to upload directly from the camera’s full 1080p at 30fps. I did a quick test using vnstat over a wired connection with a Pi Zero, and found my 5MP OV5647 camera was using almost 20Mbit/s. Keep in mind the official Pi Camera with the sony sensor is 8MP so may be even higher than that.

The following tests were done at two minute averages while the stream was being watched. The averages were recorded, and generally the peaks were 2x the average.

ResolutionfpsAverage Mbit/s
1920×10803020.12
1920×1080155.12
1280×7206015.81
1280×720303.94
1280×720152.75
640×480905.66
640×480604.43
640×480300.93
640×480150.43
Using a Pi Zero with 5MP OV5647 camera over ethernet

Conversion options

Since this already is h264 we don’t need anything other than to say copy the incoming stream. So that option is -codec:v copy or shorthand -c:v copy. Which is saying set the codec of v for video tracks to copy aka don’t convert.

-c:v copy

If you instead had a webcam that only supported mjpeg input, or if you needed to add text overlay to the video, you would have to recompile FFmpeg. With the Raspberry Pi, you’ll want to use the built in hardware encoder, h264_omx. You would then also have to set the bitrate (-b:v) of the outgoing video. That is really camera / network dependent, but my rule of thumb is use video width x hight x 2. So 1920x1080x2 ==4,147,200, so I would set the bitrate to 4M (aka ~4000kb, or ~4000000 bytes).

-c:v h264_omx -b:v 4M

The Raspberry Pi OpenMAX (omx) hardware encoder has very limited options, and doesn’t support constant quality or rate factors like libx264 does. So the only way to adjust quality is with the bitrate. As for general quality, it sits between libx264s ultrafast and superfast presets, which is somewhat disappointing but not surprising for a real-time hardware encoder.

MPEG-DASH and HLS output

Personally I would never recommend HLS to a friend, as MPEG-DASH is all around a more open and powerful muxer. But I understand some legacy systems don’t have DASH support yet. Thankfully, FFmpeg’s dash module gives us HLS for free! (Note that some systems don’t even support that, and you may end up having to use only the hls muxer.)

DASH and HLS both crate playlist files locally, with chucked up video files beside them. This creates a few problems, first is the cleanup and management of those files. Thankfully, the DASH model has options to delete all those files on exit, as well as the ability to only keep so many video chunks on disk at a time.

The bigger problem is the constant writing to the disk. In this case an SD card that is a wear item and has higher error rates with the more writes it experiences. So to save the SD card, and ourselves future headaches, we are going to write these files to memory instead!

sudo mkdir -p /dev/shm/streaming/

Tada, we now have a folder in shared memory space we can use. The caveat is it will be removed after restarts, so we will have to make sure it’s recreated before or FFmpeg service is started. But lets not get ahead of ourselves. We just need to know the rest of our FFmpeg command.

-f dash -window_size 10 -remove_at_exit 1 -hls_playlist 1 /dev/shm/streaming/manifest.mpd

We are using just a few options of what FFmpeg’s DASH muxer can do if you do need further customization, but I doubt it for most cases.

I am setting the max number of video chuncks to be kept at 10 via -window_size and telling FFmpeg to delete them and the manifest file when it stops running with -remove_at_exit 1. Then we enable HLS with -hls_playlist 1 which creates a master.m3u8 file in the same directory as the manifest.mpd (Feel free to disable HLS if you don’t need it.)

Putting it all together

If you have that camera with native h264 encoding, like the Pi Camera, here is your copy and paste code!

# sudo mkdir -p /dev/shm/streaming/
sudo ffmpeg -input_format h264 -f video4linux2 -video_size 1920x1080 -framerate 30 -i /dev/video0 -c:v copy -f dash -window_size 10 -remove_at_exit 1 -hls_playlist 1 /dev/shm/streaming/manifest.mpd

You should soon start seeing messages about the manifest and chucks being updated and the current frame rate.

[dash @ 0x20bff00] Opening '/dev/shm/streaming/manifest.mpd.tmp' for writing
[dash @ 0x20bff00] Opening '/dev/shm/streaming/media_0.m3u8.tmp' for writing
[dash @ 0x20bff00] Opening '/dev/shm/streaming/chunk-stream0-00003.m4s.tmp' for writing
[dash @ 0x20bff00] Opening '/dev/shm/streaming/manifest.mpd.tmp' for writing
[dash @ 0x20bff00] Opening '/dev/shm/streaming/media_0.m3u8.tmp' for writing
[dash @ 0x20bff00] Opening '/dev/shm/streaming/chunk-stream0-00004.m4s.tmp' for writing
frame=  631 fps= 30 q=-1.0 size=N/A time=00:00:20.95 bitrate=N/A speed=1.01x

If you are showing errors like Operation not permitted or Cannot find a proper format please check your input formats and try lower resolutions. Sometimes cameras list their photo taking resolutions which are much higher than their streaming resolutions. If you are still receiving the errors even with the right codec selected, turn the Pi off, check the connections to the camera and turn it back on, as the camera can sometimes get in a bad state or have a loose wire.

To add audio or a text overlay like a timestamp please refer to those linked sections of my previous guide!

Setting up Remote Viewing

First we need to allow nginx to serve up that manifest file. By default nginx is serving up the /var/www/html directory. So it is easy enough to link our in memory folder as a sub folder there.

ln -s /dev/shm/streaming /var/www/html/streaming

Then we need to either have a way to view it via a webpage, or connect to it with a remote player such as VLC. If you have VLC or a viewer for DASH content, you can point it at http://raspberrypi/streaming/manifest.mpd and should start seeing the stream! (If you have a custom hostname or want to use IP, can use hostname -I command to use that in place of raspberrypi). To create a webpage to view the content, we will have to put it in a folder that won’t be deleted on reboot.

I personally chose /var/lib/streaming/index.html as I will also be putting a script in there that will help up set things up again each reboot. Make sure to create the directory first:

mkdir -p /var/lib/streaming

So open up your favorite text editor and copy the following html code into /var/lib/streaming/index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Raspberry Pi Camera</title>
    <style>
        html body .page {
            height: 100%;
            width: 100%;
        }

        video {
            width: 800px;
        }

        .wrapper {
            width: 800px;
            margin: auto;
        }
    </style>
</head>
<body>
<div class="page">
    <div class="wrapper">
        <h1> Raspberry Pi Camera</h1>
        <video data-dashjs-player autoplay controls type="application/dash+xml"></video>
    </div>
</div>
<script src="https://cdn.dashjs.org/latest/dash.all.min.js"></script>
<script>
    var player = init();

    function init() {
        var video = document.querySelector("video");
        player = dashjs.MediaPlayer().create();
        player.initialize(video, "manifest.mpd", true);
        player.updateSettings({
            'streaming': {
                'lowLatencyEnabled': true,
                'liveDelay': 1,
                'liveCatchUpMinDrift': 0.05,
                'liveCatchUpPlaybackRate': 0.5
            }
        });
        return player;
    }
</script>
</body>
</html>

Now lets link it up to the nginx directory.

ln -s /var/lib/streaming/index.html /var/www/html/streaming/index.html

Now you should be able to view your streaming camera webpage at http://raspberrypi/streaming!

We are using the very expansive dash.js open source library that has a lot of customization options. We are using a few basic ones here to set it up to be better at live streaming, but please check out their project and see how best to tweak it for your needs.

Reboot Script

Now that we have a sexy webpage and working ffmpeg command, we need to save ’em and make sure they can survive reboots. Therefor, we need to create a script that will run on restart to recreate the folder in memory and copy the index file over. I put mine right beside the permanent index file, with /var/lib/streaming/setup_streaming.sh add the following text.

# /var/lib/streaming/setup_streaming.sh
mkdir -p /dev/shm/streaming
if [ ! -e /var/www/html/streaming ]; then
    ln -s  /dev/shm/streaming /var/www/html/streaming
fi 
if [ ! -e /var/www/html/streaming/index.html ]; then
    ln -s /var/lib/streaming/index.html /var/www/html/streaming/index.html
fi 

Don’t forget to make it executable.

chmod +x /var/lib/streaming/setup_streaming.sh

Now to run it on restart, we are going to add this script to /etc/rc.local. Open the /etc/rc.local file and add these lines before the exit 0 at the bottom:

# Streaming Shared Memory Setup
if [ -f /var/lib/streaming/setup_streaming.sh ]; then
    /bin/bash /var/lib/streaming/setup_streaming.sh|| true
fi

Notice we are being extra extra careful to not throw errors here, as at the top of the rc.local file it makes it clear that it should never exit without a clean exit code of 0.

Camera Streaming Service

After we have the location in memory setup, we can start the camera. I chose to do so via a systemd service, so it can restart on errors and easy to manage. In this example it’s called stream_camera but you can change the actual service file name to suit your fancy.

Add a new file at /etc/systemd/system/stream_camera.service.

# /etc/systemd/system/stream_camera.service
[Unit]
Description=Camera Streaming Service
After=network.target rc-local.service

[Service]
Restart=always
RestartSec=20s
ExecStart=ffmpeg -input_format h264 -f video4linux2 -video_size 1920x1080 -framerate 30 -i /dev/video0 -c:v copy -f dash -window_size 10 -remove_at_exit 1 -hls_playlist 1 /dev/shm/streaming/manifest.mpd

[Install]
WantedBy=multi-user.target

Notice we set it to be run after rc-local to make sure we are ready for ffmpeg to write to /dev/shm/streaming.