For the APAC live event, I gave a presentation comparing VP9 vs HEVC (H.265). During the session, I discussed the fundamental differences between the two “modern codecs” and tied it off with an early analysis of each codec’s performance. These results were obtained using the open-source encoders libvpx-vp9, x264, and x265. This blog post acts as a follow-up to my tech talk and I will go a little into further detail about the experiment and share the results of my research.
VP9 vs HEVC: The encoding setup
For the test I used the libvpx-vp9 encoder (version 1.8.2) for VP9 encoding, the x264 encoder (tag235ce6130168f4deee55c88ecda5ab84d81d125b) for h.264/AVC encoding and the x265 encoder (version 3.2) for h.265/HEVC encoding. I also compiled libvmaf (version 1.5.1) and ffmpeg (version 4.2.3) to run the encoders and perform PSNR, SSIM and VMAF measurements. If you want to recreate the same execution environment: I used Docker to build it so you can recreate the exact same environment using my Dockerfile which can be found here.
For the test set I used Full HD and 4K sequences from the JEVT SDR test set  which was also used in the standardization of VVC. Some of these sequences are well known and were already used in several prior standardization activities. All sequences are 10 seconds long and used in YUV 4:2:0 subsampling. The sequences are as follows:
For encoding, I used default settings with ffmpeg. All encodings implemented 2-pass encoding with a set target bitrate. The corresponding ffmpeg calls look like this:
ffmpeg -i input.yuv -c:v libx264 -preset veryslow -b:v br --pass 1/2 enc.mp4 ffmpeg -i input.yuv -c:v libx265 -preset slow -b:v br --pass 1/2 enc.mp4 ffmpeg -i input.yuv -c:v libvpx-vp9 -b:v br --pass 1/2 enc.mp4
I used the following presets for each encoder:
- x264 – very slow
- X265 – slow
- libvpx-vp9 no preset was chosen (which corresponds to a cpu-used value of 1)
These settings were chosen from experience. While they do not yield the highest possible compression performance, they correspond to a very high quality encode with a good tradeoff between encoding time and quality.
The encodings were performed under two scenarios:
- Fixed Resolution: no scaling is applied. Encoding was performed at the resolution of the original sequence with various different target bitrates. The bitrates for x265 and libvpx-vp9 were: 4.8 Mbit/s, 2.4 Mbit/2, 1.8 Mbit/2, 1.2 Mbit/s and 0.8 Mbit/s. For x264, these values were multiplied by a factor of two. Based on the pixel count, another factor of four was applied to the bitrates for the 4K encodes.
- Bitrate Ladder: Encoding was performed at a range of different resolutions and bitrates also referred to as a bitrate ladder. These were: 1920×1080 at 4.8Mbit/s, 1920×1080 at 2.4Mbit/s, 1280×720 at 1.8Mbit/s, 1280×720 at 1.2Mbit/s, 854×480 at 0.8 Mbit/s, 640×360 at 0.4 Mbit/s and 426×240 at 0.2 Mbit/s. For the 4K encodes, two additional points with 3840×2160 at 19.2 Mbit/s and 9.6 Mbit/s were added. As in the first scenario, the bitrates were multiplied by a factor of two for x264. The final measurement step was performed after upsampling back to the resolution of the original source. The default scaling algorithm is bicubic.
For each encoding, multiple different measurements were performed. In the cases where encoding was performed at a lower spatial resolution, the measurement was performed after upscaling the reconstruction back to the resolution of the source. PSNR and SSIM measurements were performed for the three components (Y/U/V) as well as an averaged value. VMAF was calculated as well. For the 4k source files, the 4k VMAF model was applied. For the encoding time, I measured the absolute elapsed time as well as the CPU time per thread. This is a sample plot for the sequence “MarketPlace” in the fixed resolution scenario (hover over image to zoom):
For both scenarios I calculated BD-rate results for the average PSNR, average SSIM, and VMAF values relative to x264 :
As one can see the libvpx-vp9 encoder is able to compete with x265 very well when it comes to coding performance. However, the PSNR and SSIM based BD values are consistently higher for libvpx-vp9 and the VMAF-based BD-rate values are higher for x265 in the fixed resolution scenario. In the bitrate ladder scenario, both encoders show very similar results.
What is surprising is that the default x265 configuration seems to use a much lower QP for the color components (U/V) compared to the other two encoders. However, because of the way the average values are calculated, this does not have a huge impact on the BD results.
VP9 vs HEVC Complexity Levels
For all encodings, I also measured the overall runtime of the encoding as well as the CPU-time per thread. Both of these values can give us an indication of how well the encoders can utilize multiple cores. All tests were performed on an Intel 6 core (12 thread) processor. The results for x265 and libvpx-vp9 were taken relative to the values of x264 and then averaged. The following table displays the relative factors compared to x264:
While both x265 and libvpx-vp9 have higher runtimes compared to x264, we can see that x265 is much better at utilizing available threads efficiently, which results in much lower values for the overall runtime factors. When it comes to the CPU time, libvpx-vp9 has an advantage over x265 in the tested configuration. Similar observations can be made of the table above. So depending on your application this may be a disadvantage or not. For example, since our encoder uses multiple vectors to utilize the available threads efficiently this behavior is not a big disadvantage for us.
Finally, I would like to provide all the files needed in order to recreate the results. Furthermore, the archive includes all the result files that were used to determine my findings. I encourage everybody to double-check them. However, for legal reasons, I can not provide the encoded video sequences or the original uncompressed YUV test sequences. The archive includes the following scripts which should be helpful:
- Test shell scripts: These scripts were used to perform the encoding and the measurements in the docker container. Please feel free to use these in your own tests.
- Python scripts: The python scripts were used to calculate the BD results (calculateBDResults.py), to plot the measured values per sequence (plotResults.py) and to plot the results per frame (plotPerFrameResults.py). Please use these scripts to take a detailed look at the results. You require python 3 and matplotlib installed. Each script must be called with the name of a sub-folder that should be plotted.
While this is just a quick and superficial encoder comparison, I tried to keep it close to practical applications. From the VP9 vs HEVC test here, libvpx-vp9 is able to take on x265 when it comes to coding performance. Please note that only these encoders were tested and there are other AVC, HEVC, and VP9 encoders out there which may perform better.
If you have additional inputs to the test please reach out to me! I am very willing to run this again using a different set of settings.
 – A. Segall, E. François, W. Husak, S. Iwamura, D. Rusanovskyy – JVET common test conditions and evaluation procedures for HDR/WCG video – JVET-P2011
 – Gisle Bjontegaard – Calculation of average PSNR differences between RD-curves – VCEG-M33 Austin, Texas, USA, 2-4 April 2001
- 134th MPEG Meeting Takeaways: Standardizing Neural Network Compression for Multimedia Applications [Blog Post]
- Living in a Multi-Codec World – Future Codecs revisited [Blog Post]
- State of Compression Standards – VVC [Blog Post]
- The Video Codec Landscape (Sept 2020) [Updated Webinar Recording]