
Raspberry Pi Hardware Encoding Speed Test

The GPU hardware encoder in the Raspberry Pi can greatly speed up encoding for H.264 videos. It is perfect to use for transcoding live streams as well. It can be accessed in FFmpeg with the h264_omx encoder. But is it fast enough to live stream a 1080p webcam?

You might have already seen a lot of people using the built-in Raspberry Pi cameras to stream crisp 1080p video, so why is this even a question? Well, the catch is that the Pi Camera itself supports native H.264 encoding. Some webcams do as well, and they are honestly the best choice rather than constantly battering the GPU encoder when you don’t need to.

However, you may just happen to have an old cheap webcam that only does MJPEG streams. Those streams are generally too large to pump over the Raspberry Pi’s wifi at full fps. Would using the hardware encoder help you?

The Results First

This is why you’re here, so let’s cut to the chase and compare the two latest Raspberry Pis available, the Pi 4 B and the Pi 3 B+ (we’ll throw in the little Pi Zero Wireless for fun too). We’ll talk about the two videos used later, but suffice to say, Trackday is easier to encode and closer to what an average webcam would produce. Artist is more of a torture test.

Boom! The Raspberry Pi 4 B is right in the butter zone. Most webcams that are 30fps would be handled just fine with the Pi 4 (depending on the quality of sensor and what you’re filming). The Pi 3 B+ isn’t terrible, but wouldn’t be able to encode a realtime stream smoothly.

The little Pi Zero? Well, it did its best and we’re proud of it!
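Whether an encoder keeps up with a live source can be read off ffmpeg's own progress line, which reports the encode rate as `fps=` and `speed=`. The sketch below is a minimal parser for that line; the helper name `realtime_report` and the sample line are made up for illustration and are not measurements from these tests.

```python
import re

# Matches the "fps=" and "speed=" fields that ffmpeg prints in its progress line.
_FPS = re.compile(r"fps=\s*([\d.]+)")
_SPEED = re.compile(r"speed=\s*([\d.]+)x")


def realtime_report(progress_line, source_fps=30.0):
    """Return (encode_fps, speed, keeps_up) parsed from one ffmpeg progress line."""
    fps_match = _FPS.search(progress_line)
    speed_match = _SPEED.search(progress_line)
    if fps_match is None or speed_match is None:
        raise ValueError("no fps/speed fields found in line")
    encode_fps = float(fps_match.group(1))
    speed = float(speed_match.group(1))
    # A live source cannot be buffered forever, so the encoder must run at 1x or faster.
    keeps_up = encode_fps >= source_fps and speed >= 1.0
    return encode_fps, speed, keeps_up


# Made-up sample line for illustration only; it is NOT a result from these tests.
sample = ("frame=  900 fps= 34 q=-0.0 size=   18432kB "
          "time=00:00:30.00 bitrate=5033.2kbits/s speed=1.13x")
print(realtime_report(sample))
```

Anything that reports a speed below 1x on a 30fps source would fall behind a live webcam sooner or later.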

Test Media

Trackday

The first video I used was a video captured from a car on a racetrack. It is 1920×1080 at 30fps captured from a dash cam.

10 second preview of 2 minute video – Jaguar F-Type R at Harris Hills Raceway

The original bitrate was 10.5Mbps and was cut down to 5Mbps in all our encodes.

The command used is:

ffmpeg -i trackday.mp4 -c:v h264_omx -b:v 5M -an -sn -dn track_omx.mp4
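The command above transcodes a file. For the MJPEG webcam case that motivates this test, the same encoder could in principle be fed straight from the camera. The following is only a sketch of how such a command might be assembled: the device path, the UDP destination address and the helper name `webcam_to_h264_omx` are assumptions, and it was not part of the benchmark.

```python
import shlex


def webcam_to_h264_omx(device="/dev/video0", bitrate="5M",
                       destination="udp://192.168.1.10:5000"):
    """Build an ffmpeg command that reads MJPEG from a V4L2 webcam and
    re-encodes it with the h264_omx hardware encoder.

    The destination address is a hypothetical placeholder.
    """
    return [
        "ffmpeg",
        "-f", "v4l2",                # read from a Video4Linux2 device
        "-input_format", "mjpeg",    # ask the webcam for its MJPEG stream
        "-video_size", "1920x1080",
        "-framerate", "30",
        "-i", device,
        "-c:v", "h264_omx",          # Raspberry Pi GPU encoder
        "-b:v", bitrate,             # same target bitrate as the file tests
        "-an",                       # no audio
        "-f", "mpegts", destination,
    ]


print(shlex.join(webcam_to_h264_omx()))
```

Building the argument list first and printing it with `shlex.join` makes it easy to inspect before handing it to `subprocess.run`.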

Artist

The second file, artwork in progress by Clara Griffith, is also 1920×1080 at 30fps. However, it uses the BT.709 color space and started out at 35Mbps!

Artwork of Clara Griffith – https://www.claragriffith.com/

If you see a webcam that advertises as “HDR” it is most likely using the BT.709 color space as well, and may give your Pi a headache.

This one was also compressed down to only 5Mbps. Why 5Mbps, you ask? Well, as it turns out, using the standard 2.4GHz wifi band, the Pi 3 and Pi 4 can each sustain about 6.5Mbps download speed over my wireless. That means I know these videos could be played smoothly over wifi. The Pi Zero W, on the other hand, could only sustain around 3Mbps wifi transfer speed.
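The reasoning behind the 5 figure is simple arithmetic, sketched below. It assumes all the numbers are megabits per second, which is what `-b:v 5M` requests from FFmpeg; the function names are made up for this example.

```python
def stream_fits(stream_mbps, link_mbps):
    """True when the link can sustain the stream's average bitrate."""
    return stream_mbps <= link_mbps


def file_size_megabytes(bitrate_mbps, seconds):
    """Approximate file size: megabits per second times seconds, over 8 bits per byte."""
    return bitrate_mbps * seconds / 8


print(stream_fits(5, 6.5))          # Pi 3 / Pi 4 wifi figure from the text
print(stream_fits(5, 3))            # Pi Zero W wifi figure from the text
print(file_size_megabytes(5, 120))  # the 2 minute Trackday clip at 5 Mbps
```

An average bitrate leaves no headroom for bursts, so a link that is only barely faster than the stream may still stutter.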

All three systems were set up to use 256MB of GPU ram.

Video Quality

This actually took me by surprise, to be honest. The quality of the encode is quite good compared to what a software encoder can do. I didn’t pull any punches either: the x264 encoder was set to two-pass, using the veryslow preset with the film tune. x264 commands:

ffmpeg -i "artist.mkv" -map 0:0 -c:v libx264 -pix_fmt yuv420p -tune:v film -color_primaries bt709 -color_trc bt709 -colorspace bt709  -pass 1 -passlogfile "pass_log_file_f9e11f23efaa23591fa8" -b:v 5000k -preset:v veryslow  -an -sn -dn -f mp4 /dev/null

ffmpeg -i "artist.mkv" -map 0:0 -c:v libx264 -pix_fmt yuv420p -tune:v film -color_primaries bt709 -color_trc bt709 -colorspace bt709  -pass 2 -passlogfile "pass_log_file_f9e11f23efaa23591fa8" -b:v 5000k -preset:v veryslow -map_metadata -1 -map_chapters 0  "artist-x264-5M-veryslow-film.mkv"

Of the two videos, Trackday is closer to what a webcam would produce, and there both encoders are nearly equal. So why was the x264 encode of the Artist video so much better in quality, even though the source started out at a much higher bitrate? My informed guess is that the original is crisp and the content moves slowly enough that the H.264 encoder can reuse large parts of the picture for subsequent frames.

That means the software encoder x264 wins by virtue of being able to use B-frames effectively, whereas the OMX hardware encoder doesn’t support B-frames. Therefore the Pi is on even ground when B-frames aren’t effective, but lags behind when they come into play.
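One way to check the B-frame claim on your own encodes is to list each frame's picture type with ffprobe and count them. The sketch below only parses ffprobe's text output; the output file name in the command and the helper name `count_frame_types` are assumptions.

```python
from collections import Counter

# ffprobe invocation that prints one picture type (I, P or B) per video frame.
FFPROBE_CMD = [
    "ffprobe", "-v", "error",
    "-select_streams", "v:0",
    "-show_entries", "frame=pict_type",
    "-of", "csv=p=0",
    "track_omx.mp4",
]


def count_frame_types(ffprobe_output):
    """Count I, P and B frames in the text printed by FFPROBE_CMD."""
    types = (line.strip().rstrip(",") for line in ffprobe_output.splitlines())
    return Counter(t for t in types if t)


# Typical use (needs ffprobe installed):
#   import subprocess
#   text = subprocess.run(FFPROBE_CMD, capture_output=True, text=True).stdout
#   print(count_frame_types(text))
```

If the hardware encoder really emits no B-frames, the counter for its output should contain only `I` and `P` keys, while the x264 output should also show `B`.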

A Note on Pi Camera Native H.264

I have found very little information about which Pi Cameras actually support H.264 natively. I only have “knock off” Raspberry Pi cameras that use the ribbon cable. They all support H.264 streams, which you can check with:

v4l2-ctl -d /dev/video0 --list-formats-ext

# ...
# [4]: 'H264' (H.264, compressed)
#                Size: Stepwise 32x32 - 2592x1944 with step 2/2
# [5]: 'MJPG' (Motion-JPEG, compressed)
#                Size: Stepwise 32x32 - 2592x1944 with step 2/2

or

ffmpeg -hide_banner -f video4linux2 -list_formats all -i /dev/video0

# [video4linux2,v4l2 @ 0x22c9d70] Raw       :     yuv420p :     Planar YUV 4:2:0 : {32-2592, 2}x{32-1944, 2}
# [video4linux2,v4l2 @ 0x22c9d70] Compressed:       mjpeg :            JFIF JPEG : {32-2592, 2}x{32-1944, 2}
# [video4linux2,v4l2 @ 0x22c9d70] Compressed:        h264 :                H.264 : {32-2592, 2}x{32-1944, 2}
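If you want to make that check from a script, the `v4l2-ctl` listing shown above is easy to scan for format codes. This is a minimal sketch that only looks at lines shaped like the sample output; the function names are made up for this example.

```python
import re

# Matches lines such as:  [4]: 'H264' (H.264, compressed)
_FORMAT_LINE = re.compile(r"^\s*\[\d+\]:\s*'([^']+)'", re.MULTILINE)


def list_fourccs(v4l2_output):
    """Return the format codes listed by `v4l2-ctl --list-formats-ext`."""
    return [code.strip() for code in _FORMAT_LINE.findall(v4l2_output)]


def supports_native_h264(v4l2_output):
    """True when the device advertises an H264 format."""
    return "H264" in list_fourccs(v4l2_output)


# The two entries quoted in the article.
sample = """\
[4]: 'H264' (H.264, compressed)
               Size: Stepwise 32x32 - 2592x1944 with step 2/2
[5]: 'MJPG' (Motion-JPEG, compressed)
               Size: Stepwise 32x32 - 2592x1944 with step 2/2
"""
print(list_fourccs(sample))
```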

I was kinda worried they were using some hackery to “pretend” to actually have native H.264 but instead using the GPU. However if the Pi Zero has anything to show, it has a really hard time encoding 1080p videos with the GPU encoder, so I do believe they have native support.

Wrap Up

If you already have:

A camera and a Raspberry Pi: you can get started streaming right away.

A 1080p webcam and want to stream from it: consider grabbing a Raspberry Pi 4.

The Raspberry Pi: always try first to grab a camera with built-in H.264 support; otherwise, the Pi 4 should support most webcams using hardware-accelerated encoding.

Encoding settings for HDR 4K videos using 10-bit x265

There is currently a serious lack of data on compressing 4K HDR videos out there, so I took it upon myself to get learned in the ways of the x265 encoding world.

First things first, this is NOT a guide for Dolby Vision or HDR10. This is simply for videos using the BT.2020 color primaries. Please read the new article for saving HDR.

I have historically been using the older x264 MP4s for my videos, as they just work on everything. However, most devices finally have some native H.265 decoding. (As a heads up, H.265 is the specification, and x265 is an encoder for it. I may mix them up myself in this article; don’t worry about the letter, just the numbers.)

Updated: 6/29/2020 – Please refer to the new guide

Updated: 4/14/2019 – New Preset Setting (tl;dr: use slow)

What are the best settings for me to use when encoding x265 videos?

The honest to god true answer is “it depends”, however I find that answer unsuitable for my own needs. I want a setting that I can use on any incoming 4K HDR video I buy.

I used to mainly use Handbrake, but I now use ffmpeg because I learned Handbrake only has an 8-bit internal pipeline. In the past, I went straight to Handbrake’s documentation. For 4K videos with x265, it suggests Constant Rate Factor (CRF) encoding in the range of 22-28 (the larger the number, the lower the quality).

Through some experimentation I found that I personally can never really see a difference between anything lower than 22 using the Slow preset. Therefore I played it safe, bumped it down a notch, and now encode all of my stuff with x265 10-bit at a CRF of 20 on the Slow preset. That way I know I should never be disappointed.

Then I recently read YouTube’s suggested guidelines for bitrates. They claim that a 4K video coming into their site should optimally be 35~45Mbps when encoded with the older H.264 codec.

Now I know that x265 can be around 50% more efficient than x264, and that YouTube needs it higher quality coming in so when they re-compress it it will still look good. But when I looked at the videos I was enjoying just fine at CRF 22, they were mostly coming out with less than a 10Mbps bitrate. So I had to ask myself:

How much better is x265 than x264?

To find out, I would need a lot of comparable data. I started with a 4K HDR example video. The first thing I did was chop out a one-minute segment and promptly remove the HDR. That way I could compare the two encoders via their default 8-bit compressors.

I found this code to convert the 10-bit “HDR” yuv420p10le colorspace down to the standard yuv420p 8-bit colorspace on the colourspace blog, so props to them for having a handy guide just for this.

ffmpeg -y -ss 07:48 -t 60 -i my_movie.mkv -vf zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709,tonemap=tonemap=hable:desat=0,zscale=t=bt709:m=bt709:r=tv,format=yuv420p -c:v libx265 -preset ultrafast -x265-params lossless=1 -an -sn -dn -reset_timestamps 1 movie_non_hdr.mkv
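The `-vf` argument in that command is a chain of six filters separated by commas, which is hard to read on one line. The sketch below rebuilds the same string stage by stage; the per-stage comments are my reading of the zscale and tonemap options, and the names `TONEMAP_STAGES` and `build_filter_chain` are made up for this example.

```python
# Each entry is one stage of the -vf chain from the command above, in order.
TONEMAP_STAGES = [
    "zscale=t=linear:npl=100",        # to linear light, nominal peak luminance 100
    "format=gbrpf32le",               # work in 32-bit float planar RGB
    "zscale=p=bt709",                 # convert the color primaries to BT.709
    "tonemap=tonemap=hable:desat=0",  # Hable curve, highlight desaturation off
    "zscale=t=bt709:m=bt709:r=tv",    # BT.709 transfer and matrix, TV (limited) range
    "format=yuv420p",                 # 8-bit 4:2:0 output
]


def build_filter_chain(stages=TONEMAP_STAGES):
    """Join the stages into the single string that ffmpeg's -vf option expects."""
    return ",".join(stages)


print(build_filter_chain())
```

Keeping the stages in a list makes it easier to swap out a single step, such as the tone-mapping curve, without retyping the whole chain.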

Average Overall SSIM

Then I ran multiple two-pass ABR runs using ffmpeg for both x264 and x265 using the same target bitrate. Afterwards I compared them to the original using the Structural Similarity Index (SSIM). Put simply, the closer the result is to 1 the better. It means there are fewer differences between the original and the compressed one.

Generated via Python and matplotlib

The SSIM result is done frame by frame, so we have to average them all together to see which is best overall. On the section of video I chose, x264 needed considerably more bitrate to achieve the same score. The horizontal line shows this: x264 needs 14Mbps to match x265’s 9Mbps, a 5000kbps difference! If we wanted to go by YouTube’s recommendations for a video file that will be re-encoded, you would only need a 25Mbps x265 file instead of a 35Mbps x264 video.
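To put the 14Mbps versus 9Mbps comparison in relative terms, the short sketch below computes the saving and scales another target by the same ratio. Scaling this way is an assumption of mine, since the ratio measured at one quality level need not hold at higher bitrates; the function names are made up for this example.

```python
def bitrate_saving(x264_mbps, x265_mbps):
    """Fraction of bitrate saved by x265 at matched SSIM."""
    return 1 - x265_mbps / x264_mbps


def equivalent_bitrate(x264_target_mbps, x264_ref_mbps, x265_ref_mbps):
    """Scale an x264 target by the ratio measured at the reference point."""
    return x264_target_mbps * x265_ref_mbps / x264_ref_mbps


print(f"{bitrate_saving(14, 9):.1%}")   # saving at the matched-SSIM point
print(equivalent_bitrate(35, 14, 9))    # YouTube's 35 Mbps scaled by 9/14
```

Scaling 35Mbps by the same 9/14 ratio lands in the same neighborhood as the 25Mbps figure above.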

Sample commands I used to generate these files:

ffmpeg -i movie.mkv -c:v libx265 -b:v 500k -x265-params pass=1 -an -f mp4 NUL

ffmpeg -i movie.mkv -c:v libx265 -b:v 500k -x265-params pass=2 -an h265\movie_500.mp4

ffmpeg -i my_movie.mkv -i h265\movie_500.mp4 -lavfi  ssim=265_movie_500_ssim.log -f null -
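The log written by the `ssim` filter has one line per frame, and the overall score is the mean of the per-frame `All:` values. Below is a minimal sketch of that averaging step. It is not the script used for the charts in this article, and the sample lines are made up to show the layout.

```python
import re
from statistics import fmean

# Per-frame lines in the stats file look like:
#   n:1 Y:0.99 U:0.99 V:0.99 All:0.99 (20.0)
_ALL_SCORE = re.compile(r"\bAll:([\d.]+)")


def read_ssim_scores(log_text):
    """Return the per-frame 'All' SSIM values found in a stats-file dump."""
    return [float(value) for value in _ALL_SCORE.findall(log_text)]


def average_ssim(log_text):
    """Mean of the per-frame scores; raises ValueError if none are found."""
    scores = read_ssim_scores(log_text)
    if not scores:
        raise ValueError("no SSIM lines found")
    return fmean(scores)


# Made-up lines in the stats-file layout, for illustration only.
sample_log = """\
n:1 Y:0.991000 U:0.995000 V:0.994000 All:0.990000 (20.000000)
n:2 Y:0.981000 U:0.986000 V:0.985000 All:0.980000 (16.989700)
n:3 Y:0.971000 U:0.976000 V:0.975000 All:0.970000 (15.228787)
"""
print(round(average_ssim(sample_log), 4))
```

In practice you would pass the contents of the `.log` file named in the `ssim=` option, for example via `open(path).read()`.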

Lowest 1% SSIM

However, the averages don’t tell the whole story, because if every frame was that good, we shouldn’t need more than 6Mbps x265 or 10Mbps x264 for 4K video. So let’s take a step back and look at the lowest 1% of the frames.

Generated via Python and matplotlib

Here we can see x264 has a much harder time at lower bitrates. Also note that the highest marker on this chart is 0.98, compared to the total average chart’s 0.995.
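A "lowest 1%" figure can be computed from the same per-frame scores. The sketch below takes the mean of the worst 1% of frames, which is one reasonable reading of the metric; whether the charts use this mean or a percentile cut-off is an assumption on my part, and the function name is made up.

```python
import math
from statistics import fmean


def lowest_percent_mean(scores, percent=1.0):
    """Mean of the worst `percent` of frames (at least one frame)."""
    if not scores:
        raise ValueError("no scores given")
    count = max(1, math.ceil(len(scores) * percent / 100))
    worst = sorted(scores)[:count]
    return fmean(worst)


# 200 made-up frame scores: two bad frames hidden among many good ones.
scores = [0.99] * 198 + [0.90] * 2
print(round(fmean(scores), 4), lowest_percent_mean(scores))
```

The overall mean barely moves because of the two bad frames, while the lowest 1% figure exposes them, which is why both charts are worth looking at.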

This information alone confirmed for me that I will only be using x265 or newer encodings (maybe AV1 in 2020) for storing videos going forward.

Download the SSIM data as CSV.

How does CRF compare to ABR?

I have always read to use Constant Rate Factor over Average BitRate for stored video files (and especially over Constant Quality). CRF is the best of both worlds. If you have an easily compressible video, it won’t bloat the encoded video to meet some arbitrary bitrate. And bitrate directly correlates to file size. It also won’t be constrained to that limit if the video requires a lot more information to capture the complex scene.

But that is all hypothetical. We have some hard data, so let’s use it. Remember, Handbrake recommends a range of 22-28 CRF, and I personally cannot see any visual loss at CRF 20. So where does that show up on our chart?

Generated via Python and matplotlib

Now this is an apples to oranges comparison. The CRF videos were done via Handbrake using x265 10-bit, whereas everything else was done via ffmpeg using x265 or x264 8-bit. Still, we get a good idea of where these show up. At both CRF 24 and CRF 22, even the lowest frames don’t dip below SSIM 0.95. I personally think the extra 2500kbps for the large jump in minimum quality from CRF 24 to CRF 22 is a must. To some, including myself, it could be worth the extra 4000kbps jump from CRF 22 to CRF 20.

So let’s get a little more apples to apples. In this test, I encoded all videos with ffmpeg using the default presets. I did three CRF videos first, at 22, 20, and 18, then used their resulting bitrates to create three ABR videos.

Generated via Python and matplotlib

Their overall average SSIM scores were nearly identical. However, CRF shows its true edge on the lowest 1%, easily beating out ABR at every turn.
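The CRF-then-ABR procedure can be sketched as three steps: encode at a CRF, read back the resulting bitrate, then run a two-pass encode at that bitrate. The commands below follow the sample commands earlier in this article, but the file names and helper names are assumptions and this is not the exact script I ran.

```python
import os


def crf_command(src, crf, out):
    """Single-pass x265 encode at a constant rate factor."""
    return ["ffmpeg", "-i", src, "-c:v", "libx265", "-crf", str(crf), "-an", out]


def bitrate_probe_command(path):
    """ffprobe call that prints the container's overall bit_rate in bits per second."""
    return ["ffprobe", "-v", "error", "-show_entries", "format=bit_rate",
            "-of", "default=noprint_wrappers=1:nokey=1", path]


def abr_commands(src, bitrate_bps, out):
    """Two-pass x265 encode targeting the bitrate measured from the CRF file."""
    rate = f"{round(bitrate_bps / 1000)}k"
    common = ["ffmpeg", "-y", "-i", src, "-c:v", "libx265", "-b:v", rate]
    first = common + ["-x265-params", "pass=1", "-an", "-f", "mp4", os.devnull]
    second = common + ["-x265-params", "pass=2", "-an", out]
    return first, second


first_pass, second_pass = abr_commands("movie.mkv", 9_000_000, "movie_abr.mp4")
print(" ".join(first_pass))
print(" ".join(second_pass))
```

Using `os.devnull` for the first pass avoids hard-coding `NUL` or `/dev/null`, so the same sketch works on Windows and Linux.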

To 10-bit or not to 10-bit?

Thankfully there is a simple answer. If you are encoding to x264 or x265, encode to 10-bit if your devices support it. Even if your source video doesn’t use the HDR color space, it compresses better.

There is only one time not to use it: when the device you are going to watch it on doesn’t support it.

Which preset should I use?

The normal wisdom is to use the slowest preset you can stand for the encoding time. Slower = better video quality and compression. However, that does not mean smaller file size at the same CRF.

Even though others have tackled this issue, I wanted to use the same material I was already testing and make sure it held true with 4K HDR video.

Generated via Python and matplotlib

I used a three minute 4K HDR clip, using Handbrake to only modify which preset was used. The results were surprising to me, to be honest; I was expecting medium to have a better margin between fast and slow. But based on just the average, slow was the obvious choice, as even bumping the CRF from 18 to 16 on medium didn’t match its quality, even though the file size was much larger for the CRF 16 Medium encoding than for the CRF 18 Slow! (We’ll get to that later.)

Okay, okay, let’s back up a step and look at the bottom 1% again as well.

Generated via Python and matplotlib

Well well wishing well, that is even more definitive. The jump from medium to slow is very significant in multiple ways. Even though it does cost double the time of medium, it really delivers in the quality department, easily beating out the lowest 1% of even CRF 16 medium, two entire steps away.

Generated via Excel

The bitrates are as expected: the higher the quality, the more bitrate it needs. What is interesting is that if we put the CRF 16 - Medium encoding’s bitrate on this chart, it would shoot off the top at a staggering 15510kbps! Keep in mind that is while still being lesser quality than CRF 18 - Slow.

In this data set, slow is the clear winner in multiple ways. That is very similar to others’ results as well, so I’m personally sticking to it. (And if I had run these tests first, I would have even used slow for all the other testing!)

Conclusion

If you want a single go-to setting for encoding, based on my personal testing CRF 20 with the Slow preset looks amazing (but may take too long if you are using older hardware).

Now, if I have a super computer and unlimited storage, I might lean towards CRF 18 or maybe even 16, but still wouldn’t feel the need to take it the whole way to CRF 14 and veryslow or anything crazy.
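As a starting point, the go-to setting could be expressed as an ffmpeg command like the sketch below. The file names and the helper name `x265_command` are assumptions, and, as noted at the top, this does nothing to preserve HDR10 or Dolby Vision metadata.

```python
def x265_command(src, out, crf=20, preset="slow"):
    """Build a 10-bit x265 CRF encode that copies the audio untouched."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx265",
        "-preset", preset,
        "-crf", str(crf),
        "-pix_fmt", "yuv420p10le",  # 10-bit 4:2:0 output
        "-c:a", "copy",             # keep the original audio stream as-is
        out,
    ]


print(" ".join(x265_command("my_movie.mkv", "my_movie_x265.mkv")))
```

Dropping `crf` to 18 or 16 is a one-argument change if you have the storage and the patience.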

I hope you found this information as useful as I did, if you have any thoughts or feedback please let me know!