**Updated in March 2022**
Since 2017, Bitmovin has actively worked in video and streaming standardization and has consistently driven standards from inception to implementation. Our founders co-created the MPEG-DASH streaming standard used by Netflix, YouTube, and many others, which is responsible for over 50% of peak U.S. internet traffic. Given our encoding, virtualization, and codec expertise, we are excited to work with and contribute to the AV1 codec. As of today, we have doubled down on bringing AV1 to the market and enabling our customers. We have continued to improve our AV1 video encoding technology, and the performance has drastically improved in the last 5 years. In the following, we provide a high-level summary of the features.
The AV1 Video Codec
First things first, what is AV1 and where does it come from? In September 2015 the Alliance for Open Media (AOMedia) was founded by leading companies from various industries connected to media technology. Among them are browser vendors like Google, Mozilla, and Microsoft, hardware vendors like AMD, ARM, Intel, and NVIDIA, and content providers like Amazon and Netflix. The goal of AOMedia is to develop an open, royalty-free, next-generation video coding format that is:
- Interoperable and open
- Optimized for the Internet
- Scalable to any modern device at any bandwidth
- Designed with a low computational footprint and optimized for hardware
- Capable of consistent, highest-quality, real-time video delivery, and
- Flexible for both commercial and non-commercial content, including user-generated content.
The new video coding format AOMedia Video 1 (AV1) is meant to replace Google’s VP9 and compete with HEVC/H265 from MPEG. The Alliance is targeting an improvement of about 50% over VP9/HEVC with only reasonable increases in encoding and playback complexity.
When comparing AV1 with HEVC, probably the biggest competitive advantage of AV1 is that it is royalty-free, especially given the still very uncertain royalty situation with HEVC. Currently, there are two patent pools, MPEG LA and HEVC Advance, plus some unknown HEVC IP owners who have not joined a pool yet. As a result, nobody knows how much you will ultimately need to pay in royalties for HEVC. This situation is obviously not satisfactory for the industry, especially for encoding, distribution, content, and hardware companies.
Bitmovin and AV1 Video Encoding as of 2022
We have made improvements to the core AV1 encoder itself and have extensively benchmarked it against multiple practical use cases. The turnaround time and speed of encoding have improved by several orders of magnitude. In terms of quality, for encoder version v2.110.0 we found that AV1 can offer the same visual quality at 50% less bitrate than H.264/AVC and 30% less bitrate than H.265/HEVC. That is a pretty significant gain!
In addition to the improvements to the core encoder itself, we have integrated AV1 with all the popular features that our customers have come to love. Here is a quick rundown:
- Since encoder version 2.104.0, 3-pass encoding with AV1 is generally available. We have found that three-pass AV1 video encoding provides significantly better bitrate distribution compared to the regular 2-pass encoding.
- Since encoder version 2.109.0, Per-Title encoding with AV1 is available. Per-Title is one of our biggest competitive advantages, and we are proud to offer it for AV1 as well.
- Since encoder version 2.110.0, AV1 video encoding offers three smart presets. This allows customers to choose an optimal tradeoff between the quality and speed of the AV1 encodings.
Also, at Bitmovin we like to keep our promises 😉. We promised five years ago that we would not stop innovating around AV1 and that we would enable our customers in the best possible way with our AV1 solutions. We are excited to announce that we have kept our end of the bargain. We have developed two patent-pending technologies around AV1. We cannot delve into the details now, but as a teaser: they significantly improve the turnaround times for Per-Title and 3-pass encodings. Keep watching this space for more details soon!
And here is the cherry on top of all this. It’s easy to get all this awesome Per-Title ABR encoding together with the AV1 codec and DASH packaging in a SINGLE API call! Yes, it’s not a typo. We said SINGLE. Can you believe that 🤯🤯!? What are you waiting for? It’s easier than ever to get started with AV1. Try it and reach out to us if you have any questions! We are happy and excited to get you onboard with AV1.
How AV1 Video Encoding Development Works
The AV1 codec has its roots in the codebase of Google’s VP9/VP10 codec, to which 77 experimental coding tools have been added and are under consideration. Of those 77 experimental coding tools, only 8 are currently enabled by default (adapt_scan, ref_mv, filter_7bit, reference_buffer, delta_q, tile_groups, rect_tx, cdef), but the performance of the codec is already appealing. The final goal is to get as many promising coding tools as possible into the final version of the codec and then freeze the bitstream specification.
The following procedure explains the high-level process on how experiments can be added to the AV1 codec:
- Coding tools are added as experiments into the AV1 codebase. They are controlled at build time by flags (e.g., --enable-experimental --enable-<experiment-name>).
- The hardware team (a group of hardware members inside AOMedia) reviews the experiments to ensure they can be implemented in hardware.
- Each experiment needs to pass an IP review to ensure no intellectual property is infringed.
- Once the reviews are passed, the experiment can be enabled by default.
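As a small illustration of the build-time flags mentioned in the first step, the following sketch assembles the flag list for a set of experiments. The helper name `experiment_flags` is hypothetical, and the exact spelling that the build script accepts for each experiment name is an assumption; only the flag pattern itself comes from the text above.

```python
def experiment_flags(experiments):
    """Build the list of build-time flags that switch on the given experiments.

    Assumption: each experiment is enabled with --enable-<experiment-name>,
    using the experiment name exactly as written in the codebase.
    """
    flags = ["--enable-experimental"]  # experiments require the global switch
    for name in experiments:
        flags.append(f"--enable-{name}")
    return flags


if __name__ == "__main__":
    # Two of the experiments that are listed as enabled by default.
    print(" ".join(experiment_flags(["cdef", "ref_mv"])))
```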
As of today, it is not certain which experiments will make it into the final codec. However, we want to highlight a few that look promising today:
Directional Deringing Filter
This filter is an effective algorithm for removing ringing artifacts from a coded frame. It plugs in right at the end of the decoding process, so it is easy to integrate. Each block is searched for an overall direction, which is then taken into account when applying a conditional replacement filter (CRF), so that only obvious ringing patterns are filtered and the risk of blurring is reduced. It is currently enabled by default.
PVQ (Perceptual Vector Quantization)
This experiment was originally developed for the Daala codec and has the potential to bring a lot of gains. However, it is also quite difficult to integrate into AV1 because PVQ interacts with many other parts of a codec. Compared to the usual scalar quantization, PVQ offers a lot more flexibility to control quantization. It makes techniques like Chroma from Luma or Activity Masking easier. Activity Masking tries to provide better resolution in low-contrast areas. This can be achieved by varying the codebook, which is possible with PVQ.
Chroma from Luma (CfL)
CfL is based on a rather simple idea: Take advantage of the fact that edges in the chroma plane are usually well correlated with those in the luma plane. As CfL works entirely in the frequency domain, it can be easily implemented using PVQ. Using PVQ, the chroma coefficients can be predicted from injected luma coefficients. It is a very promising tool as it is quite simple to compute and provides nice benefits with much cleaner colors.
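The underlying idea can be sketched with a simple linear model. Note that this is a spatial-domain simplification for illustration only, not the PVQ-based frequency-domain variant described above: the function name, the least-squares fit, and the sample values are all assumptions. In a real codec the decoder does not have the original chroma, so the scaling factor would be signalled in the bitstream and applied to reconstructed luma.

```python
def predict_chroma_from_luma(luma, chroma):
    """Fit chroma ~ mean(chroma) + alpha * (luma - mean(luma)) for one block.

    Returns (alpha, predicted_chroma). Both inputs are flat lists of
    co-located sample values with the same length.
    """
    n = len(luma)
    mean_l = sum(luma) / n
    mean_c = sum(chroma) / n
    # Least-squares slope: covariance(luma, chroma) / variance(luma).
    var_l = sum((l - mean_l) ** 2 for l in luma)
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(luma, chroma))
    alpha = cov / var_l if var_l else 0.0  # flat luma: fall back to the mean
    predicted = [mean_c + alpha * (l - mean_l) for l in luma]
    return alpha, predicted


if __name__ == "__main__":
    # A block with one vertical edge that appears in both planes.
    luma = [10, 10, 200, 200]
    chroma = [100, 100, 140, 140]
    alpha, predicted = predict_chroma_from_luma(luma, chroma)
    print(round(alpha, 3), [round(p, 1) for p in predicted])
```

Because the edge sits in the same place in both planes, a single scaling factor is enough to reproduce the chroma block from the luma block in this toy case.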
Bitmovin AV1 VoD and Live Encoding
The Bitmovin cloud encoding service now supports AV1 video encoding for VoD and Live. Currently, AV1 video encoding with common encoding tools is a very time-consuming process, as can be seen in the screenshot below, taken on a Lenovo T540p notebook with an i7-4800MQ and 8GB RAM running Ubuntu 14.04. It would take 8 hours and 42 minutes to encode a 40-second 1080p@24fps sequence (Tears of Steel Teaser) with a target bitrate of 1.5Mbps.
The encoding runs with about 1.93 fpm (frames per minute) which would translate to 0.032 fps (frames per second). If you want to achieve real-time with 24 fps you would need at least 746 times the computing power on a single machine, which is not very practical in a real-world scenario. Clearly, we need another approach to encode with reasonable speeds, especially when it comes to live streaming.
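The arithmetic behind these numbers can be reproduced with a few lines. This is just a worked calculation using the figures quoted above; the function names are made up for the example.

```python
def fps_from_fpm(frames_per_minute):
    """Convert frames per minute to frames per second."""
    return frames_per_minute / 60


def speedup_needed(target_fps, achieved_fps):
    """How many times faster the encoder must run to reach target_fps."""
    return target_fps / achieved_fps


if __name__ == "__main__":
    achieved = fps_from_fpm(1.93)           # measured notebook speed
    print(round(achieved, 3))               # 0.032
    print(round(speedup_needed(24, achieved)))  # 746
```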
Thanks to our chunk-based encoding approach, which allows us to scale a single encoding across multiple instances, we can encode AV1 with reasonable turnaround times, and it is also possible to use AV1 for live streams. Our chunked encoding speeds up the encoding almost linearly with the number of instances added to the encoding cluster, and this approach works with our cloud encoding the same way it works with our on-premise setups that are based on Kubernetes and Docker. Consequently, we can reach the same encoding speeds for AV1 that our customers have come to expect for H264, VP9, and HEVC encoding, which makes the codec effectively usable for media companies and content providers throughout the industry.
We also encoded the ToS teaser with our AV1 encoder in the cloud with the default configuration, where we achieved 7 fps, which is about 219 times faster than what was achieved in the test with the Lenovo notebook. This is already pretty impressive. However, we were not satisfied with the speed, as it was still below real-time. So we tried an enterprise setup by simply adding more instances to the encoding process. The resulting encoding speed was 36 fps, which is about 1125 times faster than with the single Lenovo notebook.
In addition, we don’t have to compromise on quality for speed, because our encoder does not need to sacrifice quality to reach a certain speed on a single instance, as other encoding vendors typically do. With our approach we are not bound to the hardware restrictions of a single instance. Instead, we can add more instances to an encoding cluster to generate the quality that our customers have configured in a reasonable time, or in real-time for live streams. With our chunk-based implementation of the AV1 video codec, we can encode videos with AV1 even faster than real-time without compromising quality.
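To make the scaling argument concrete, here is a minimal sketch of the general idea of chunked encoding. It is not Bitmovin's scheduler: the function names, the choice of one GOP per chunk, the round-robin assignment, and the example numbers are assumptions made purely for illustration.

```python
import math


def split_into_chunks(total_frames, gop_size):
    """Split a video into (start, end) frame ranges, one GOP per chunk."""
    return [(start, min(start + gop_size, total_frames))
            for start in range(0, total_frames, gop_size)]


def estimated_wall_time(total_frames, gop_size, instances, fps_per_instance):
    """Upper bound on wall-clock seconds when chunks are encoded in parallel.

    Chunks are assumed to be independent and handed out round-robin, so the
    busiest instance encodes ceil(chunks / instances) chunks.
    """
    chunks = split_into_chunks(total_frames, gop_size)
    rounds = math.ceil(len(chunks) / instances)
    return rounds * gop_size / fps_per_instance


if __name__ == "__main__":
    # 40 seconds at 24 fps, 96-frame GOPs, 0.032 fps per instance.
    for n in (1, 5, 10):
        print(n, "instances:", round(estimated_wall_time(960, 96, n, 0.032)), "s")
```

In this simplified model, ten instances finish ten chunks in one tenth of the time a single instance needs, which is the near-linear speedup described above. A real system also has to account for splitting, transfer, and stitching overhead.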
How to implement an AV1 Livestream
In most cases, live stream encodings with traditional codecs like H264 need around 4 to 15 Mbps to deliver the same quality that AV1 delivers at 1.5 Mbps in the setup below. AV1 could therefore reduce your CDN and storage costs by up to 10x.
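The "up to 10x" figure follows from the ratio of the bitrates. The short calculation below assumes that CDN and storage costs scale linearly with bitrate and uses the 1.5 Mbps AV1 rendition from this article as the reference. The function name is hypothetical.

```python
def bitrate_reduction(reference_mbps, av1_mbps):
    """Factor by which the bitrate (and, by assumption, the cost) shrinks."""
    return reference_mbps / av1_mbps


if __name__ == "__main__":
    av1_mbps = 1.5
    low = bitrate_reduction(4, av1_mbps)    # lower end of the H264 range
    high = bitrate_reduction(15, av1_mbps)  # upper end of the H264 range
    print(f"{low:.1f}x to {high:.1f}x")
```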
The setup of our AV1 live workflow that we will showcase consists of the following components:
- OBS RTMP mezzanine stream, 12Mbps 1080p@30fps
- Bitmovin Distributed AV1 Cloud Encoder running in Google Cloud receives an RTMP ingest and transcodes to 1.5Mbps 1080p@30fps segmented WebM. Segments will be directly transferred to a Google Cloud Storage bucket.
- The Bitmovin Distributed AV1 Cloud Encoder also generates HLS and MPEG-DASH manifests that will be transferred to the Google Cloud Storage bucket. Enabled experiments of the AV1 codec are: adapt_scan, ref_mv, filter_7bit, reference_buffer, delta_q, tile_groups, rect_tx, cdef
- Native playback on a desktop with a Bitmovin Player based on aomdec and ffplay
Our AV1 encoder generates segmented WebM output that can be used with HLS or MPEG-DASH for VoD and Live. However, as AV1 is currently not supported by any browser, we had to write our own player that is able to play back our AV1 live stream. We updated the aomdec application to be able to download and decode the AV1 chunks, which can be seen in the left console window. Fortunately, decoding is not as resource-intensive as encoding, which allows you to decode the AV1 stream on normal hardware without special requirements. For example, the same Lenovo notebook (i7-4800MQ, 8GB RAM running Ubuntu 14.04) that could not encode this video anywhere near real-time could easily play back AV1 in software. After the decoding step, we pipe the decoded YUV frames to ffplay to display the stream in a window, as you can see in the screenshot above. We plan to contribute this functionality back to aomdec after a technical cleanup of the current implementation.
A Practical Quality Comparison
Although the bitstream from AV1 is not finalized yet and much work needs to be done to further improve the quality of the codec, we wanted to get a snapshot of the current state and compare its quality with AVC/H264, HEVC/H265, and VP9. For that purpose, we made two different quality comparisons, the first one with two objective metrics, PSNR and SSIM. PSNR does not always correlate well with perceived quality but is the de-facto standard for video quality comparisons. SSIM is a perception-based quality metric that should give better results in regard to perceived quality.
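For readers unfamiliar with the metric, PSNR is derived from the mean squared error between the original and the decoded samples. The sketch below shows the standard formula for 8-bit samples on flat lists of values. It is a textbook illustration, not the tooling used for the measurements in this article.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [100, 100, 100, 100]
    decoded = [101, 99, 101, 99]  # every sample is off by one
    print(round(psnr(original, decoded), 2))
```

Higher values mean the decoded frame is closer to the original. For video, the value is typically computed per frame and then averaged over the sequence.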
For the second comparison, we chose to make a side-by-side quality comparison between AV1 and the other codecs. This quality comparison targets a practical use case where the resulting content can be used for Adaptive Bitrate Streaming (ABR). Therefore we have used a fixed Group of Pictures (GOP) size for our experiments and also used Variable Bitrate (VBR) encodings with a target bitrate. This approach is established in the industry but results can vary from scientific evaluations that purely target abstract use cases and theoretical encoder performance through the HM (HEVC reference software) and JM (AVC reference software) reference software that has no practical relevance in the industry.
Let’s first start with the objective quality comparison with PSNR. We encoded the open-source movie Sintel from the Blender Foundation with VBR to the following target bitrates: 100Kbps, 250Kbps, 500Kbps, 1Mbps, 2Mbps, 4Mbps and calculated PSNR and SSIM for the bitrate that has actually been achieved by the individual codec (typical codecs in VBR mode do not hit the target bitrate exactly).
The following encoding settings for the different codecs were used in the Bitmovin Encoding Service:
- AVC/H264: GOP Size: 96 frames (4 seconds), Me_range: 16, Cabac: true, B-Adapt: 2, Me: UMH, Rc-Lookahead: 50, Subme: 8, Trellis: 1, Partitions: All, BFrames: 3, ReferenceFrames: 5, Profile: High, Direct-Pred: Auto
- HEVC/H265: GOP Size: 96 frames (4 seconds), Sao: 1, B-Adapt: 2, CTU: 64, Profile: Main, BFrames: 4, Rc-Lookahead: 25, WeightP: 1, MeRange: 57, Ref: 4, Subme: 3, Tu-Inter-Depth: 1, Me: 3, No-WeightB: 1, Tu-Intra-Depth: 1
- VP9: GOP Size: 96 frames (4 seconds), Cpu-used: 1, Tile-columns: 4, Arnr-Type: Centered, Threads: 4, Arnr-maxframes: 0, Quality: Good, Frame-Parallel: 0, AQ-Mode: none, Arnr-Strength: 3, Tile-Rows: 0
- AV1: Build f3477635d3d44a2448b5298255ee054fa71d7ad9, Enabled experiments by default: adapt_scan, ref_mv, filter_7bit, reference_buffer, delta_q, tile_groups, rect_tx, cdef, Passes: 1, Quality: Good, Threads: 1, Cpu-used: 1, KeyFrame-Mode: Auto, Lag-In-Frames: 25, End-Usage: VBR
The above diagram clearly shows that AV1 already outperforms all the other codecs for each bitrate setting. For bitrates of 1Mbps and higher, the quality difference is already pretty big (> 0.5 dB, which is usually clearly visible). VP9 and HEVC/H265 are very similar from a PSNR perspective. However, VP9 was the codec that overshot the target bitrate by far the most.
We also compared the four codecs with SSIM. The results can be seen in the above diagram and are quite similar to PSNR with some slight differences. AV1 is still the best performing codec over all bitrates, and AVC/H264 lags behind. However, interestingly AVC/H264 catches up with increased bitrate. An explanation for that could be that in the higher bitrates we can reach nearly the quality of the source material with all codecs, which results in only minor differences between the codecs.
Additionally, we created several side-by-side quality comparisons where we experimentally changed the target bitrate for each codec to reach an average of 500 Kbps. Below you can see the quality comparisons between the encodings comparing the quality of Bitmovin AV1 video encoding with AVC/H264, HEVC/H265, and VP9. We used the well-known Tears of Steel teaser that is 40 seconds long with a 1080p resolution for the comparison, selecting a complex scene that is hard to encode.
When comparing AV1 video encoding with AVC/H264 the quality difference is very obvious as expected. We can clearly see multiple encoding artifacts and blocking in the right part of the image that has been encoded with AVC/H264. In contrast, the left part with AV1 Video Encoding looks much cleaner without obvious encoding artifacts.
The quality difference between AV1 and VP9 is not as obvious as with AVC/H264, but it is still quite visible. The borders of the tiles of the sphere in particular show encoding artifacts, and the overall picture in VP9 seems to have quite some noise. We can also identify some blocking artifacts that are not visible in AV1.
HEVC/H265 visually looks a bit better than VP9. However, it still has visible encoding artifacts, especially in the lower part of the image and around the arm of the guy with the red coat. When we look closely at the arm, we can see that the color is not encoded as nicely as with AV1 and shows some noise.
Bitmovin’s culture and vision have always been to be a technology leader and our passion for video means we consistently tackle the most complex video problems. Why? Because it’s fun and challenging and our team loves a challenge!
Besides that, there are already use cases for an AV1 video encoding where you could use it as your mezzanine format to preserve a high-quality version of your video at a low bit rate that can be used to create your adaptive bitrate renditions or other formats. Using AV1 for that use case would decrease your storage footprint and speed up transfer times inside of your data center or for upload to the cloud.
Furthermore, with the companies behind AOMedia, like AMD, ARM, Intel, NVIDIA, Google, Microsoft, Mozilla, Netflix, and Amazon, it should not take too long to get broad support for AV1. AMD, Intel, and NVIDIA cover the desktop market quite nicely, and ARM and Intel the mobile market. Additionally, the major browser vendors, Google, Microsoft, and Mozilla will make sure that the codec finds its way into the browsers soon after the bitstream freeze. Google, Netflix, and Amazon will make sure that AV1 content will be available quickly and that will further drive adoption and hardware support.
AV1 is the next generation video codec and it’s on track to deliver a 30% improvement over VP9 & HEVC.