The GPU hardware encoder in the Raspberry Pi can greatly speed up encoding for H.264 videos. It is perfect to use for transcoding live streams as well. It can be accessed in FFmpeg with the h264_omx encoder. But is it fast enough to live stream a 1080p webcam?
You might have already seen a lot of people using the built-in Raspberry Pi cameras to stream crisp 1080p video, so why is this even a question? Well, the catch there is that the Pi Camera itself supports native H.264 encoding. Some webcams do as well, and they are honestly the best choice rather than constantly battering the GPU encoder if you don’t need to.
However, you may just happen to have an old cheap webcam that only does MJPEG streams. Those streams are generally too large to pump over the Raspberry Pi’s wifi at full fps. Would using the hardware encoder help you?
The Results First
This is why you’re here, so let’s cut to the chase and compare the two latest Raspberry Pis available, the Pi 4 B and Pi 3 B+ (we’ll throw in the little Pi Zero Wireless for fun too). We’ll talk about the two videos used later, but suffice it to say, Trackday is easier to encode and closer to what an average webcam would produce. Artist is more of a torture test.
Boom! The Raspberry Pi 4 B is right in the butter zone. Most webcams that are 30fps would be handled just fine with the Pi 4 (depending on the quality of sensor and what you’re filming). The Pi 3 B+ isn’t terrible, but wouldn’t be able to encode a realtime stream smoothly.
The little Pi Zero? Well, it did its best and we’re proud of it!
The first video I used was a video captured from a car on a racetrack. It is 1920×1080 at 30fps captured from a dash cam.
The original bitrate was 10.5MB/s and was cut down to 5MB/s for all our encodes.
The second file, artwork in progress by Clara Griffith, is also 1920×1080 at 30fps. However, it uses the BT.709 color space and started out at 35MB/s!
If you see a webcam that advertises as “HDR” it is most likely using the BT.709 color space as well, and may give your Pi a headache.
This one was also compressed down to only 5MB/s. Why 5MB/s you ask? Well as it turns out, using the standard 2.4GHz wifi band, the Pi 3 and Pi 4 can each sustain about 6.5MB/s download speed over my wireless. That means I know these videos could be played smoothly over wifi. The Pi Zero W on the other hand could only sustain around 3MB/s wifi transfer speed.
All three systems were set up to use 256MB of GPU ram.
This actually took me by surprise, to be honest. The quality of the encode is quite good compared to what a software encoder could do. I didn’t pull any punches either: the x264 encoder was set to dual pass, using the veryslow preset with the film tune set. x264 commands:
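As a rough sketch only, and not necessarily the exact commands used for these results, a two-pass libx264 encode with the veryslow preset and film tune could be assembled as below. The file names and the `5M` bitrate string are placeholders chosen for illustration, and the `/dev/null` target assumes a Linux-style system.

```python
import shlex

def x264_two_pass(src, dst, bitrate):
    """Build the two ffmpeg command lines for a two-pass libx264 encode."""
    common = ["ffmpeg", "-y", "-i", src,
              "-c:v", "libx264", "-preset", "veryslow", "-tune", "film",
              "-b:v", bitrate]
    # Pass 1 only gathers statistics: drop audio and discard the output.
    first = common + ["-pass", "1", "-an", "-f", "null", "/dev/null"]
    # Pass 2 reuses those statistics to spend the bitrate budget wisely.
    second = common + ["-pass", "2", "-c:a", "copy", dst]
    return [first, second]

# Placeholder file names and bitrate, for illustration only.
for cmd in x264_two_pass("trackday.mp4", "trackday_x264.mp4", "5M"):
    print(shlex.join(cmd))
```

The two printed lines can be run one after the other in a shell; the second pass depends on the log file written by the first.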
Of the two videos, Trackday is closer to what a webcam would produce, and both encoders are near equal on it. So why was the Artist video so much better quality after encode, even though it started out with a much higher bitrate? My informed guess is that the original was very crisp and the content is slow moving enough that the H.264 encoder was able to reuse larger parts of the video for subsequent frames.
That means the software encoder x264 wins by virtue of being able to use B-frames effectively, whereas the OMX hardware encoder doesn’t support B-frames. Therefore the Pi is on even ground when B-frames aren’t effective, but lags behind when they come into play.
A Note on Pi Camera Native H.264
I have found very little information about which Pi Cameras actually support H.264 natively. I only have “knock off” Raspberry Pi cameras that use the ribbon cable. They all support H.264 streams, which you can check with:
I was kinda worried they were using some hackery to “pretend” to actually have native H.264 but instead using the GPU. However if the Pi Zero has anything to show, it has a really hard time encoding 1080p videos with the GPU encoder, so I do believe they have native support.
It’s 2021 and there still isn’t a lot of good info about AMD’s VCN hardware encoder for consumers. To that end, I will present my own take on the current “war” between software and hardware encoders, then go into quick details of how to best use AMD GPUs for encoding for video archival with FastFlix.
Note: I will only be comparing HEVC/H.265 10-bit HDR10 videos (both source and output). This use case is not usually covered in the benchmarks and tests I have seen, and is of more interest to those who have seen my previous posts on encoding UHD HDR10 videos but may want to hardware accelerate it.
HDR10 – A set of metadata presented alongside the video to give the display additional details
VMAF – Netflix’s video quality metric used to compare encoded video to source’s quality
Software vs Hardware Encoders
Software encoders are coded to run on any general purpose CPU. It doesn’t matter if it’s an Intel i7, an AMD 5900X or even your phone’s ARM based CPU, which gives them great versatility. In the other corner, hardware encoders rely on specific physical hardware components existing in a system to accelerate their transcoding. In today’s case, it takes an AMD GPU with VCN support to use the features we want to test.
Apples and oranges are both fruit, sports cars and pickup trucks are both vehicles, and software and hardware encoders both transcode videos. Just as it’s futile to compare the track capabilities of a supercar to the towing capacity of a pickup truck, we are about to venture into said territory with these encoders.
Use case over metrics
The workhorse of the HEVC software encoding world is x265. There are plenty of other software encoders like the industry used ATEME TITAN File for UHD blu-rays or other open source encoders like Turing codec or kvazaar, but because of their lack of inclusion in standard tools like FFmpeg, they are overlooked.
So what is this workhorse good for? Flexibility and video archival. By being able to run on almost anything that can compile C code, x265 is a champion of cross platform operations. It is also the standard when looking for pure quality metrics in HEVC videos.
Comparatively, hardware encoding, in this case using AMD’s Video Coding Engine (VCE), is built to be power efficient and fast. Really, really fast. For example, on a 6900XT you can real-time encode a 60fps UHD stream on the slowest setting!
Let’s see what happens when they venture into each other’s bailiwicks.
Here’s what everybody loves: A good graph. We’re going to compare x265 using its fastest encoding speed vs the slowest setting AMD’s VCE currently has with a 60fps HDR10 4K source video.
As expected, it was a slaughter. Hardware encoding ran at 96 fps while x265 could only manage 14.5 fps. AMD’s hardware encoding clearly pummels the fastest setting x265 has to offer, even on an i9-9900k. Even if using an AMD 5950x which may be up to twice as fast, the hardware encoder would still dominate.
Where does this matter
Streaming and real-time transcoding. Hardware encoders were designed with the idea of “accelerated” encoding. Which makes them great for powering your Zoom calls or streaming to Twitch.
Encoding Quality Prowess
Now let’s venture into x265’s house and compare computed quality with VMAF. We’ll be using the veryslow setting, darn the total time taken!
In this scenario we will compress a UHD video with a bitrate of 15,000k to four different rates. The goal for a decent encode is to reach at least VMAF 93, so we will stay in the bitrate range that keeps us above that score. (VMAF 93+ doesn’t mean you won’t notice quality loss. It simply means that the loss probably WILL BE apparent if the score is lower than that.)
I have noticed while watching the AMD VCE encodes that it doesn’t do a great job with scene changes. I expect that is because VCE doesn’t support pre-analysis for HEVC, only for H.264. AMD VCE also suffers from lack of B-frame support, which I will talk about in the next blog post.
Where does this matter
Video archival. If you have a video that you are planning to discard in favor of a high quality re-encode to save on file size, it’s better to stick with x265. Keep in mind, don’t re-encode just because you want to use a “better” codec; it’s always best to keep the original.
This is a comparison I don’t see as often, and I think is overlooked. Encoding takes a lot of power, which means it costs money. I have been told by many FastFlix users that they let their x265 encodes run overnight, and some of their encodings take days!
This is also a harder-to-measure metric, as you need both encoders to produce the same quality output, as well as know their power usage. The entire thing also labors under the assumption that the only purpose of this machine is to encode the video while it is powered on, so please keep all that in mind as we dive into this.
To achieve the same quality of result file, x265 costs roughly ten times as much in electricity as VCE to get the job done. This may not matter if you’re talking about a random encode here or there, but if you have a lot of videos to burn through, switching to hardware encoders could really start saving cash.
The Nitty Gritty about the power (Methodology)
Power usage will differ across hardware, so this is for a very specific case that I can attest to (using both HWMonitor and a Kill A Watt monitor). The 6900XT uses 63 watts over its baseline when encoding, for a total system draw of ~320w. The i9-9900k uses 111 watts over baseline for a total system draw of ~360w. (Keep in mind there is some extra CPU usage during hardware encode as well, so that is why total power is not a direct difference between the two.)
For the encoder speed, when using a UHD file I was able to get within 0.1% difference of VMAF when using VCE slow (same speed as above) and x265 veryfast (at 10.35fps).
Let’s take a genericized use case of a two hour long video running at 24fps. 24fps * 60 seconds in a minute * 60 minutes in an hour * 2 hours = 172,800 frames.
Estimated times and cost:
VCE – slow – 6900XT @ 96.47fps – 29.85 minutes
0.16 kWh @ 320 watts
$0.019 at 12 cents per kWh
x265 – veryfast – i9-9900K @ 10.35fps – 278.3 minutes (a little over four and a half hours)
1.67 kWh @ 360 watts
$0.200 at 12 cents per kWh
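For anyone who wants to plug in their own hardware, the estimate can be recomputed directly from the frame rates, wattages and electricity price quoted here. This is a minimal sketch that assumes the whole system draw is charged for the full encode and that the encoder holds a steady frame rate; the function name is made up for illustration.

```python
def encode_cost(frames, fps, watts, cents_per_kwh=12.0):
    """Return (minutes, kWh, dollars) for encoding `frames` at a steady `fps`."""
    hours = frames / fps / 3600          # frames -> seconds -> hours
    kwh = watts / 1000 * hours           # total system draw over that time
    return hours * 60, kwh, kwh * cents_per_kwh / 100

frames = 24 * 60 * 60 * 2  # two hours of 24fps video = 172,800 frames

for label, fps, watts in [("VCE slow on 6900XT", 96.47, 320),
                          ("x265 veryfast on i9-9900K", 10.35, 360)]:
    minutes, kwh, dollars = encode_cost(frames, fps, watts)
    print(f"{label}: {minutes:.1f} min, {kwh:.2f} kWh, ${dollars:.3f}")
```

Swapping in your own frame rate, wattage and price per kWh gives a quick idea of whether the difference matters for your workload.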
Where does this matter
The cost difference probably doesn’t sway many individuals. But if you’re a prolific encoder, this could save you time and money.
Super Technical Head to Head Summary
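Quality: Hardware (AMD VCE) lacks basic HEVC needs (B-frames / pre-analysis)
Speed: Software (x265) is Slow to Super Slow
Requirements: Software (x265) – ⭐ Any old electrified rock; Hardware (AMD VCE) – Newer AMD GPU, Windows OS
Power: Software (x265) – All the powah!; Hardware (AMD VCE) – ⭐ sips daintily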
So the winner is… neither. If you’re encoding professionally you’ll be working with totally different software (like TITAN File). If you’re using it at home, it really just depends on what hardware you already have. If you’re wondering which GPU to get for the best encoding, wait for next month’s article 😉
Basically they both do what they were designed for. I would say Hardware encoders might have a slight overall edge, as they could be used for all cases. Whereas x265 currently can’t do UHD HDR10 real time encoding on consumer hardware.
Encoding HDR10 with AMD GPUs
Already got an AMD GPU and want to start encoding with it? Great, let’s get down to how to do it. First off make sure you are using Windows. If you’re using Linux for this, don’t.* If Linux is all you have, I would still recommend using a passthrough VM with Windows on it.
For Windows users, rigaya has made a beautiful tool called VCEEncC that has HDR10 support built in. It is a command line tool, but good news, FastFlix now supports it!
You will need to download VCEEncC manually as well, and make sure it is on the system path or link it up in File > Settings of FastFlix.
VCE doesn’t have a lot of options to worry about like other encoders, so you can be on your way to re-encoding in no time!
* It is possible on Linux to use VAAPI to encode HEVC, but you would need to apply custom Mesa patches to enable HDR10 support. AMF / VCEEncC only supports H.264 on Linux currently.
Best quality possible with VCE
Beauty is in the eye of the beholder, and so is video quality. Some features, like VBAQ (Variance Based Adaptive Quantization), will lower measured metrics like VMAF and SSIM, but are designed to look better to human eyes. Assuming you care about how the video looks, and aren’t just trying to impress your boss with numbers, we will stick with those.
Motion Vector Accuracy
Of course the largest determinant of quality will be how much bitrate you allow (or which quantization rate you select). FastFlix has some loose recommendations, but what is truly needed will vary greatly depending on the source. A GoPro bike ride video will require a lot more bitrate than a mounted security camera with very little movement overall.
Warnings and gotchas
Not all features are available for all cards. Also some features like b-frame support were promised for RDNA2 but still are not yet available.
Driver versions can make a difference. Always try the latest first, but if you experience issues using VCE, the encoder tool may not yet support the AMF version that driver ships with, and you may need to downgrade to an older driver.
What do I use?
Personally I avoid re-encoding whenever possible. However, now that I do have an AMD GPU I do use it for any of my quick and dirty encoding needs. Though I would be saying the same about NVENC if I had a new Nvidia GPU (which does have B-frame support). In my opinion it’s simply not worth the time and energy investment for encoding with software. Either save the original or use a hardware encoder.
What about Nvidia (NVENC) or Intel (QSV)?
I am working to get access to latest generation hardware for both Nvidia’s NVENC and Intel’s QSV in the next month, so hopefully I will be able to create a follow up with some good head to head comparison. Historically NVENC has taken the crown, and by my research VCE hasn’t caught up yet, but who knows where QSV will end up!
x265 was used at commit 82786fccce10379be439243b6a776dc2f5918cb4 (2021-05-25) as part of FFmpeg
CPU is an i9-9900k
VCEEncC 6.13 on 6900xt with AMF Runtime 1.4.21 / SDK 1.4.21 using drivers 21.7.2
These tests were done on my own hardware purchased myself. All conclusions are my own thoughts and opinions and in no way represent any company.
I know this may sound like a weird statement coming from the author of FastFlix, but from the bottom of my heart, please stop the needless pixel loss! Every time you re-encode (aka transcode) a video it loses information, which lowers the quality and makes it a lot worse if you ever need to do it again. Re-encoding makes you a pixel killer!
Why am I saying re-encoding and not just “encoding” when you may have the original video? Sadly, to even fit onto your computer or device the video you are working with has already been encoded in a highly compressed manner. Even phones and professional cameras like the RED series use real-time compression. Raw video takes too much bandwidth for standard storage media. For example, imagine recording raw video using a video sensor at 16-bit 60FPS. A single minute of UHD footage would be near 60GB! That translates into a bitrate of over 7,500,000 kbps. Compare that to a re-encoded YouTube video of HDR 60FPS 4K footage at around 30,000 kbps, which leaves roughly 250 times less data to work with!
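The raw video numbers are easy to sanity check. The sketch below assumes UHD means 3840×2160, that the sensor stores a single 16-bit sample per pixel (as a raw Bayer-style sensor would), and that a GB is the decimal 10^9 bytes; none of those details are spelled out above, so treat them as assumptions.

```python
def raw_video_rate(width, height, bits_per_sample, fps, samples_per_pixel=1):
    """Return (kbps, GB per minute) for uncompressed video."""
    bits_per_second = width * height * samples_per_pixel * bits_per_sample * fps
    kbps = bits_per_second / 1000
    gb_per_minute = bits_per_second * 60 / 8 / 1e9   # bits -> bytes -> decimal GB
    return kbps, gb_per_minute

kbps, gb_per_minute = raw_video_rate(3840, 2160, 16, 60)
print(f"raw: {kbps:,.0f} kbps, {gb_per_minute:.1f} GB per minute")
print(f"ratio to a 30,000 kbps stream: {kbps / 30000:.0f}x")
```

A sensor that stores three full color samples per pixel would triple these figures, so the estimate above is on the conservative side.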
Why does it matter?
Everyone has seen potato quality images and videos being re-shared again and again. The quality loss is due to needless re-encoding.
Websites generally always re-encode videos for a variety of reasons, such as adding watermarks or forcing certain bitrates. It may be required in some instances, but for most cases it’s overkill. If I had the power I would challenge websites to instead publish what encoding targets are required and allow for direct playback of the original file.
But while we don’t have control over that, that makes it even more important to not make it worse when you have control over it.
In the above example I took a short video and encoded it again and again and again using the same settings each time. The video still looks good while being watched, but if you zoom in you can see the entire thing has essentially become blurred. Don’t do this to your own videos, save the pixels!
Another thing to consider is not just the result you have now, but how it will look in ten or twenty years. Sure, the single 1080p or 4K re-encode you did now may look perfect. But what about when 16K TVs are standard? What about when your phone or monitor’s pixel density is four to eight times higher than it is now? Behind the scenes of good TVs and devices is an “AI” chip that upscales videos to look better on newer screens. With less detail to start with, it won’t look as good as if you didn’t re-encode.
Remember that amazing video you took with your flip phone all those years ago that you can’t even tell what is happening in it anymore? You don’t want that to happen to your videos now.
There are some cases that it cannot be avoided that we will cover later, but even common operations like trimming videos and rotating them can be accomplished without re-encoding.
Trim and Rotate without re-encoding
Two very common reasons people want to re-encode videos are to shorten them to a particular section, or to rotate them to the proper direction. Thankfully you can do both of those without modifying the original video stream. I am using the command line tool ffmpeg to accomplish this, which is available for free.
To rotate the video, you just have to add some metadata to the container the video is in. This is how phones set videos to portrait or landscape mode without having to change their encoder settings.
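A sketch of such a metadata-only rotation, assuming a 90 degree turn and the `rotate` tag on the first video stream. Some newer FFmpeg releases handle rotation differently (for example through the `-display_rotation` input option), so check this against the version you have installed.

```python
import shlex

def rotate_command(src, dst, degrees=90):
    """Build an ffmpeg command that copies all streams and tags a rotation."""
    return ["ffmpeg", "-i", src,
            "-c", "copy",                             # copy streams, no re-encode
            "-metadata:s:v:0", f"rotate={degrees}",   # rotation hint on first video stream
            dst]

# Prints a command line that can be pasted into a shell.
print(shlex.join(rotate_command("your_video.mp4", "rotated_video.mp4")))
```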
In the above example, replace your_video.mp4 with the video file you need rotated. It will be copied with the new metadata to rotated_video.mp4 and now should be rotated properly.
To cut out a section of the video, you simply need to “copy” everything between the two desired points. For example if you want to cut out a 48 second section between 1:02 and 1:50 (one minute two seconds and one minute fifty seconds), use the following command.
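One way to build that kind of command, as a hedged sketch: seek to the start point, copy for the length of the section, and leave the streams untouched. The output name `trimmed_video.mp4` is a placeholder. Because nothing is re-encoded, the cut can only begin on a keyframe, so the real start may land slightly off the requested time.

```python
import shlex

def to_seconds(stamp):
    """Convert 'M:SS' or 'H:MM:SS' into whole seconds."""
    total = 0
    for part in stamp.split(":"):
        total = total * 60 + int(part)
    return total

def trim_command(src, dst, start, end):
    """Build an ffmpeg stream-copy command for the section between start and end."""
    duration = to_seconds(end) - to_seconds(start)
    return ["ffmpeg",
            "-ss", str(to_seconds(start)),   # seek on the input before decoding
            "-i", src,
            "-t", str(duration),             # length of the section to keep
            "-c", "copy",                    # copy streams, no re-encode
            dst]

print(shlex.join(trim_command("your_video.mp4", "trimmed_video.mp4", "1:02", "1:50")))
```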
There are a few cases where you simply cannot avoid re-encoding. If you need to add effects, crop, make any actual modifications to the video, you will need to re-encode.
However if you are keeping the video as is, there are only three instances you should consider re-encoding for:
Limited bandwidth scenarios
Device or platform compatibility requirements
Cannot provide required storage space
Limited bandwidth scenario
ISPs still like to pretend upload speeds don’t matter. Even if an ISP provides 1Gbps down most still have less than 45Mbps upload peak. Things get even worse for mobile, where the average upload rate is around 10Mbps. Then on top of that, a general rule of thumb is your video bitrate should be around half or less of available bandwidth to ensure there isn’t stuttering.
That means if you’re planning to share videos in real time with the world, you simply have to re-encode (transcode) them.
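The rule of thumb is simple enough to put into numbers. This small sketch applies the "half or less of available bandwidth" guideline to the two upload speeds mentioned above; the function name and labels are made up for illustration.

```python
def max_video_bitrate_mbps(upload_mbps, headroom=0.5):
    """Highest video bitrate (Mbps) that leaves the suggested headroom."""
    return upload_mbps * headroom

for label, upload in [("home broadband peak", 45), ("average mobile", 10)]:
    limit = max_video_bitrate_mbps(upload)
    print(f"{label}: {upload} Mbps up -> stream at {limit:g} Mbps or less")
```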
There are some large companies that have terrible compatibility with commonly accepted formats to be anti-competition. That means if you or a loved one is enslaved to some large fruit company, you may need to tinker with your videos to please the orchard overlords.
“Storage is cheap” is a phrase I hear a little too much in the coding world. In the real world without a large quarterly tech budget, you have to count your pennies. If you can re-encode a video so that it’s near visually lossless while saving on storage space for your needs, go for it!