Encoding UHD 4K HDR10 and HDR10+ Videos

tl;dr: If you don’t want to do the work by hand on the command line, use FastFlix. If you have any issues, reach out on discord or open a github issue.

I talked about this before with my encoding setting for handbrake post, but there is was a fundamental flaw using Handbrake for HDR 10-bit video….it only has had a 8-bit internal pipeline! It and most other GUIs don’t yet support dynamic metadata, such as HDR10+ or Dolby Vision though.

2021-02 Update: Handbrake’s latest code has HDR10 static metadata support.

Thankfully, you can avoid some hassle and save HDR10 or HDR10+ by using FastFlix instead, or directly use FFmpeg. (To learn about extracting and converting with HDR10+ or saving Dolby Vision by remuxing, skip ahead.) If you want to do the work yourself, here are the two most basic commands you need to save your juicy HDR10 data. This will use the Dolby Vision (profile 8.1) Glass Blowing demo.

Extract the Mastering Display metadata

First, we need to use FFprobe to extract the Mastering Display and Content Light Level metadata. We are going to tell it to only read the first frame’s metadata -read_intervals "%+#1" for the file GlassBlowingUHD.mp4

If you don’t see metadata in the output, you may need to select a specific stream.

ffprobe -hide_banner -loglevel warning -select_streams v -print_format json -show_frames -read_intervals "%+#1" -show_entries "frame=color_space,color_primaries,color_transfer,side_data_list,pix_fmt" -i GlassBlowingUHD.mp4

A quick breakdown of what we are sending ffprobe:

  • -hide_banner -loglevel warning Don’t display what we don’t need
  • -select_streams v We only want the details for the video (v) stream (or use a specific stream)
  • -print_format json Make it easier to parse
  • -read_intervals "%+#1" Only grab data from the first frame
  • -show_entries ... Pick only the relevant data we want
  • -i GlassBlowingUHD.mp4 input (-i) is our Dobly Vision demo file

That will output something like this:

{ "frames": [
        {
            "pix_fmt": "yuv420p10le",
            "color_space": "bt2020nc",
            "color_primaries": "bt2020",
            "color_transfer": "smpte2084",
            "side_data_list": [
                {
                    "side_data_type": "Mastering display metadata",
                    "red_x": "35400/50000",
                    "red_y": "14600/50000",
                    "green_x": "8500/50000",
                    "green_y": "39850/50000",
                    "blue_x": "6550/50000",
                    "blue_y": "2300/50000",
                    "white_point_x": "15635/50000",
                    "white_point_y": "16450/50000",
                    "min_luminance": "50/10000",
                    "max_luminance": "40000000/10000"
                },
                {
                    "side_data_type": "Content light level metadata",
                    "max_content": 0,
                    "max_average": 0
} ] } ] }

I chose to output it with json via the -print_format json option to make it more machine parsible, but you can omit that if you just want the text.

We are now going to take all that data, and break it down into groups of <color abbreviation>(<x>, <y>) while leaving off the right side of the in most cases*, so for example we combine red_x "35400/50000"and red_y "14600/50000" into R(35400,14600).

G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(40000000, 50)

*If your data for colors is not divided by /50000 or luminescence not divided by 10000 and have been simplified, you will have to expand it back out to the full ratio. For example if yours lists 'red_x': '17/25', 'red_y': '8/25' you will have to divide 50000 by the current denominator (25) to get the ratio (2000) and multiply that by the numerator (17 and 8) to get the proper R(34000,16000).

This data, as well as the Content light level <max_content>,<max_average> of 0,0 will be fed into the encoder command options.

Convert the video

This command converts only the video, keeping the HDR10 intact. We will have to pass these arguments not to ffmpeg, but to the x265 encoder directly via the -x265-params option. (If you’re not familiar with FFmpeg, don’t fret. FastFlix, which I talk about later, will do the work for you!)

ffmpeg  -i GlassBlowingUHD.mp4 -map 0 -c:v libx265 -x265-params hdr-opt=1:repeat-headers=1:colorprim=bt2020:transfer=smpte2084:colormatrix=bt2020nc:master-display=G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(40000000,50):max-cll=0,0 -crf 20 -preset veryfast -pix_fmt yuv420p10le GlassBlowingConverted.mkv

Let’s break down what we are throwing into the x265-params:

  • hdr-opt=1 we are telling it yes, we will be using HDR
  • repeat-headers=1we want these headers on every frame as required
  • colorprim, transfer and colormatrix the same as ffprobe listed
  • master-display this is where we add our color string from above
  • max-cll Content light level data, in our case 0,0

During a conversion like this, when a Dolby Vision layer exists, you will see a lot of messages like [hevc @ 000001f93ece2e00] Skipping NAL unit 62 because there is an entire layer that ffmpeg does not yet know how to decode.

For the quality of the conversion, I was setting it to -crf 20 with a -preset veryfast to convert it quickly without a lot of quality loss. I dig deeper into how FFmpeg handles crf vs preset with regards to quality below.

All sound and data and everything will be copied over thanks to the -map 0 option, that is a blanket statement of “copy everything from the first (0 start index) input stream”.

That is really all you need to know for the basics of how to encode your video and save the HDR10 data!

FFmpeg conversion settings

I covered this a bit before in the other post, but I wanted to go through the full gauntlet of presets and crfs one might feel inclined to use. I compared each encoding with the original using VMAF and SSIM calculations over a 11 second clip. Then, I created over a 100 conversions for this single chart, so it is a little cramped:

First takeaways are that there is no real difference between veryslow and slower, nor between veryfast and faster, as their lines are drawn on top of each other. The same is true for both VMAF and SSIM scores.

Second, no one in their right mind would ever keep a recording stored by using ultrafast. That is purely for real time streaming use.

Now for VMAF scores, 5~6 points away from source is visually distinguishable when watching. In other words it will have very noticeable artifacts. Personally I can tell on my screen with just a single digit difference, and some people are even more sensitive, so this is by no means an exact tell all. At minimum, lets zoom in a bit and get rid of anything that will produce video with very noticeable artifacts.

From these chart, it seems clear that there is obviously no reason whatsoever to ever use anything other than slow. Which I personally do for anything I am encoding. However, slow lives up to its namesake.

Encoding Speed and Bitrate

I had to trim off veryslow and slower from the main chart to be able to even see the rest, and slow is still almost three times slower than medium. All the charts contain the same data, just with some of the longer running presets removed from each to better see details of the faster presets.

Please note, the first three crf datapoints are little dirty, as the system was in use for the first three tests. However, there is enough clean data to see how that compares down the line.

To see a clearer picture of how long it takes for each of the presets, I will exclude those first three times, and average the remaining data. The data is then compared against the medium (default) present and the original clip length of eleven seconds.

PresetTimevs “medium”vs clip length (11s)
ultrafast11.2043.370x0.982x
superfast12.1753.101x0.903x
veryfast19.1391.973x0.575x
faster19.1691.970x0.574x
fast22.7921.657x0.482x
medium37.7641.000x0.291x
slow97.7550.386x0.112x
slower315.9000.120x0.035x
veryslow574.5800.066x0.019x

What is a little scary here is that even with “ultrafast” preset we are not able to get realtime conversion, and these tests were run on a fairly high powered system wielding an i9-9900k! While it might be clear from the crf graph that slow is the clear winner, unless you have a beefy computer, it may be a non-option.

Use the slowest preset that you have patience for

FFmpeg encoding guide

Also unlike VBR encoding, the average bitrate and filesize using crf will wildly differ based upon different source material. This next chart is just showing off the basic curve effect you will see, however it cannot be compared to what you may expect to see with your file.

The two big jumps are between slow and medium as well as veryfast and superfast. That is interesting because while slow and medium are quite far apart on the VMAF comparison, veryfast and superfast are not. I expected a much larger dip from superfast to ultrafast but was wrong.

FastFlix, doing the heavy lifting for you!

I have written a GUI program, FastFlix, around FFmpeg and other tools to convert videos easily to HEVC, AV1 and other formats. While I won’t promise it will provide everything you are looking for, it will do the work for you of extracting the HDR10 details of a video and passing them into a FFmpeg command. FastFlix can even handle HDR10+ metadata! It also has a panel that shows you exactly the command(s) it is about to run, so you could copy it and modify it to your hearts content!

If you have any problems with it please help by raising an issue!

Extracting and encoding with HDR10+ metadata

First off, this is not for the faint of heart. Thankfully the newest FFmpeg builds for Windows now support HDR10+ metadata files by default, so this process has become a lot easier. Here is a quick overview how to do it, also a huge shutout to “Frank” in the comments below for this tool out to me!

You will have to download a copy of hdr10plus_tool from quietviod’s repo.

Check to make sure your video has HDR10+ information it can read.

ffmpeg -loglevel panic -i input.mkv -c:v copy -vbsf hevc_mp4toannexb -f hevc - | hdr10plus_tool --verify -

It should produce a nice message stating there is HDR10+ metadata.

Parsing HEVC file for dynamic metadata…
Dynamic HDR10+ metadata detected.

Once you confirmed it exists, extract it to a json file

ffmpeg -i input.mkv -c:v copy -vbsf hevc_mp4toannexb -f hevc - | hdr10plus_tool -o metadata.json -

Option 1: Using the newest FFmpeg

You just need to pass the metadata file via the dhdr10-info option in the x265-params. And don’t forget to add your audio and subtitle settings!

ffmpeg.exe -i input.mkv -c:v libx265 -pix_fmt yuv420p10le -x265-params "colorprim=bt2020:transfer=smpte2084:colormatrix=bt2020nc:master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1):max-cll=1016,115:hdr10=1:dhdr10-info=metadata.json" -crf 20 -preset medium "output.mkv"

Option 2: (original article) Using custom compiled x265

Now the painful part you’ll have to work on a bit yourself. To use the metadata file, you will need to custom compiled x265 with the cmake option “HDR10_PLUS” . Otherwise when you try to convert with it, you’ll see a message like “x265 [warning]: –dhdr10-info disabled. Enable HDR10_PLUS in cmake.” in the the output, but it will still encode, just without HDR10+ support.

Once that is compiled, you will have to use x265 as part of your conversion pipeline. Use x265 to convert your video with the HDR10+ metadata and other details you discovered earlier.

ffmpeg -loglevel panic -y -i input.mkv -to 30.0 -f yuv4mpegpipe -strict -1 - | x265 - --y4m --crf=20 --repeat-headers --hdr10 --colorprim=bt2020 --transfer=smpte2084 --colormatrix=bt2020nc --master-display="G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)" --max-cll="1016,115" -D 10 --dhdr10-info=metadata.json output.hevc

Then you will have to use that output.hevc file with ffmpeg to combine it with your audio conversions and subtitles and so on to repackage it.

Saving Dolby Vision

Thanks to Jesse Robinson in the comments, it seems it now may be possible to extract and convert a video with Dolby Vision without the original RPU file. Like the original HDR10+ section, you will need the x265 executable directly, as it does not support that option through FFmpeg at all.

Note I say “saving” and not “converting”, because unless you have the original RPU file for the DV to pass to x265, you’re out of luck as of now and cannot convert the video. The x265 encoder is able to take a RPU file and create a Dolby Vision ready movie. This does require you to have the RPU file for the video.

2022 Update: NVEncC also takes RPU files, and is easier to work with if you have a Nvidia graphics card.

There are a few extra caveats when converting a Dolby Vision file with x265. First you need to use the appropriate 10 or 12 bit version. Second, change pix_fmt, input-depth and output-depth as required. Finally you MUST provide a dobly-vision-profile level for it to read the RPU as well as enable VBV by providing vbv-bufsize and vbv-maxrate (read the x265 command line options to see what settings are best for you.)

ffmpeg -i GlassBlowing.mp4 -f yuv4mpegpipe -strict -1 -pix_fmt yuv420p10le - | x265-10b - --input-depth 10 --output-depth 10 --y4m --preset veryfast --crf 22 --master-display "G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(40000000,50)" --max-cll "0,0" --colormatrix bt2020nc --colorprim bt2020 --transfer smpte2084 --dolby-vision-rpu glass.rpu --dolby-vision-profile 8.1 --vbv-bufsize 20000 --vbv-maxrate 20000 glass_dv.hevc

Once the glass_dv.hevc file is created, you will need to use tsMuxerR ( this version is recommended by some, and the newest nightly is linked below) to put it back into a container as well as add the audio back.

Currently trying to add the glass_dv.hevc back into a container with FFmpeg or MP4Box will break the Dolby Vision. Hence the need for the special Muxer. FFmpeg 5.0+ supports remuxing with Dolby Vision. You can do with with the “Copy” function in FastFlix or by hand with ffmpeg on command line and -c:v copy for the video track. Make sure you’re using at minimum 5.0 or it won’t work! This will result in video that has BL+RPU Dolby Vision.

It is possible to also only convert the audio and change around the streams with remuxers. For example tsMuxeR (nightly build, not default download) is popular to be able to take mkv files that most TVs won’t recognize HDR in, and remux them into into ts files so they do. If you also have TrueHD sound tracks, you may need to use eac3to first to break it into the TrueHD and AC3 Core tracks before muxing.

Easily Viewing HDR / Video information

Another helpful program to quickly view what type of HDR a video has is MediaInfo. For example here is the original Dolby Vision Glass Blowing video info (some trimmed):

Video
ID                                       : 1
Format                                   : HEVC
Format/Info                              : High Efficiency Video Coding
Format profile                           : Main 10@L5.1@Main
HDR format                               : Dolby Vision, Version 1.0, dvhe.08.09, BL+RPU, HDR10 compatible / SMPTE ST 2086, HDR10 compatible
Codec ID                                 : hev1
Color space                              : YUV
Chroma subsampling                       : 4:2:0 (Type 2)
Bit depth                                : 10 bits
Color range                              : Limited
Color primaries                          : BT.2020
Transfer characteristics                 : PQ
Matrix coefficients                      : BT.2020 non-constant
Mastering display color primaries        : BT.2020
Mastering display luminance              : min: 0.0050 cd/m2, max: 4000 cd/m2
Codec configuration box                  : hvcC+dvvC

And here it is after conversion:

Video
ID                                       : 1
Format                                   : HEVC
Format/Info                              : High Efficiency Video Coding
Format profile                           : Main 10@L5.1@Main
HDR format                               : SMPTE ST 2086, HDR10 compatible
Codec ID                                 : V_MPEGH/ISO/HEVC
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 10 bits
Color range                              : Limited
Color primaries                          : BT.2020
Transfer characteristics                 : PQ
Matrix coefficients                      : BT.2020 non-constant
Mastering display color primaries        : BT.2020
Mastering display luminance              : min: 0.0050 cd/m2, max: 4000 cd/m2

Notice we have lost the Dolby Vision, BL+RPU information but at least we retained the HDR10 data, which Handbrake can’t do!

That’s a wrap!

Hope you found this information useful, and please feel free to leave a comment for feedback, suggestions or questions!

Until next time, stay safe and love each other!


Appendix: Using a specific stream in the FFprobe command (Not seeing HDR data)

You may have a video with covers or even a second video track, and that could be causing the missing info. First run:

ffmpeg -i <video_name>

It will return a list of streams, for example this one has 4 streams, two of which are considered “video” streams even though the second is an image.

  Stream #0:0(eng): Video: hevc (Main 10), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 3840x2160 [SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 1k tbn
    Metadata:
      BPS-eng         : 77574554
      DURATION-eng    : 01:58:18.257833333
      NUMBER_OF_FRAMES-eng: 170188
      NUMBER_OF_BYTES-eng: 68830515604
      SOURCE_ID-eng   : 001011
    Side data:
      DOVI configuration record: version: 1.0, profile: 7, level: 6, rpu flag: 1, el flag: 1, bl flag: 1, compatibility id: 6
  Stream #0:1(eng): Audio: truehd (Dolby TrueHD + Dolby Atmos), 48000 Hz, 7.1, s32 (24 bit) (default)
    Metadata:
      title           : Surround 7.1
      BPS-eng         : 4609091
      DURATION-eng    : 01:58:18.258333333
      NUMBER_OF_FRAMES-eng: 8517910
      NUMBER_OF_BYTES-eng: 4089565476
      SOURCE_ID-eng   : 001100
  Stream #0:2(eng): Audio: ac3, 48000 Hz, 5.1(side), fltp, 448 kb/s
    Metadata:
      title           : Surround 5.1
      BPS-eng         : 448000
      DURATION-eng    : 01:58:18.272000000
      NUMBER_OF_FRAMES-eng: 221821
      NUMBER_OF_BYTES-eng: 397503232
      SOURCE_ID-eng   : 001100
  Stream #0:3(eng): Subtitle: hdmv_pgs_subtitle
    Metadata:
      BPS-eng         : 44628
      DURATION-eng    : 01:50:33.939812500
      NUMBER_OF_FRAMES-eng: 3576
      NUMBER_OF_BYTES-eng: 37007803
      SOURCE_ID-eng   : 0012A0
  Stream #0:4: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 640x360 [SAR 96:96 DAR 16:9], 90k tbr, 90k tbn (attached pic)
    Metadata:
      filename        : cover.jpg
      mimetype        : image/jpeg

If I ran the same command as above with -select_streams v it would return

    "frames": [
        {
            "pix_fmt": "yuvj420p",
            "color_space": "bt470bg"
        }
    ]
}

So instead, find the number of the track you want to use from the output and change it in the command like call. When the output shows Stream #0:2 for example, that means the first input specified on the command line, as you can open multiple files at once with FFprobe, and the 3rd stream (because it starts at 0). In my case, I want to specify 0 as that is the main video track Stream #0:0(eng): Video: hevc.

ffprobe -hide_banner -loglevel warning -select_streams 0 -print_format json -show_frames -read_intervals "%+#1" -show_entries "frame=color_space,color_primaries,color_transfer,side_data_list,pix_fmt" -i <video_name>