AV1 is a next-generation video codec, and Bitmovin is proud to deliver the world’s first AV1 live stream and to announce our membership in the Alliance for Open Media.
We are excited to announce support for AV1 for VoD and live streaming in our encoding service, which runs in the cloud (AWS, Google, Azure) or on-premise with Kubernetes and Docker. As part of this launch, we have joined the Alliance for Open Media, a non-profit organization working to define and develop media technologies that address the need for an open standard for video compression and delivery over the web. As the first company to support AV1 for both live streaming and VoD, we will showcase an AV1 livestream at our NAB booth (SU9007CM) in Las Vegas from April 24-27. The AV1 livestream will show 1080p live broadcast quality at just 1.5 Mbps. Currently you would need around 4 to 15 Mbps with traditional codecs like H264 to deliver the same quality. With AV1 you could reduce your CDN and storage cost by up to 10x.
Bitmovin has actively worked in video and streaming standardization and has consistently driven standards from inception to implementation. Our founders co-created the MPEG-DASH streaming standard used by Netflix, YouTube, and many others, which is responsible for over 50% of peak U.S. internet traffic. Given our encoding, virtualization and codec expertise, we are excited to work with and contribute to the AV1 codec.
The AV1 Video Codec
But first things first, what is AV1 and where does it come from? In September 2015 the Alliance for Open Media (AOMedia) was founded by leading companies from various industries with an association to media technology. Among them are browser vendors like Google, Mozilla and Microsoft, hardware vendors like AMD, ARM, Intel, and NVIDIA, and content providers like Amazon and Netflix. The goal of the AOMedia is to develop an open, royalty-free, next-generation video coding format that is:
- Interoperable and open
- Optimized for the Internet
- Scalable to any modern device at any bandwidth
- Designed with a low computational footprint and optimized for hardware
- Capable of consistent, highest-quality, real-time video delivery, and
- Flexible for both commercial and non-commercial content, including user-generated content.
The new video coding format AOMedia Video 1 (AV1) is meant to replace Google’s VP9 and compete with HEVC/H265 from MPEG. The Alliance is targeting an improvement of about 50% over VP9/HEVC with only reasonable increases in encoding and playback complexity.
When comparing AV1 with HEVC, probably the biggest competitive advantage of AV1 will be that it is royalty-free, especially given the still very uncertain royalty situation with HEVC. Currently there are two patent pools, MPEG LA and HEVC Advance, plus some unknown HEVC IP owners who have not joined a pool yet. As a result, nobody knows how much they will ultimately need to pay in royalties for HEVC. This situation is obviously not satisfactory for the industry, especially for encoding, distribution, content and hardware companies.
How the AV1 Development Works
The AV1 codec has its roots in the codebase of Google’s VP9/VP10 codec, with an additional 77 experimental coding tools that have been added and are under consideration. Of those 77 experimental coding tools, only 8 are currently enabled by default (adapt_scan, ref_mv, filter_7bit, reference_buffer, delta_q, tile_groups, rect_tx, cdef), but the performance of the codec is already appealing. The final goal is to get as many promising coding tools as possible into the final version of the codec and afterwards freeze the bitstream specification.
The following procedure outlines, at a high level, how experiments are added to the AV1 codec:
- Coding tools are added as experiments into the AV1 codebase. They are controlled at build time by flags (e.g., --enable-experimental --enable-<experiment-name>).
- The hardware team (a group of hardware members inside AOMedia) reviews each experiment to ensure it can be implemented in hardware.
- Each experiment needs to pass an IP review to ensure that no intellectual property is infringed.
- Once reviews are passed the experiment can be enabled by default.
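To make the build-time switches concrete, here is a minimal sketch that assembles such a command line from a list of experiment names. The `./configure` script path and the helper name `configure_command` are assumptions made for illustration; only the two flag patterns come from the description above.

```python
import shlex


def configure_command(experiments, configure="./configure"):
    """Build a configure command line that enables the given experiments.

    The script path is an assumption; the flag patterns follow the
    --enable-experimental / --enable-<experiment-name> scheme described above.
    """
    # --enable-experimental unlocks the experiment switches, then every
    # experiment gets its own --enable-<name> flag.
    args = [configure, "--enable-experimental"]
    args += [f"--enable-{name}" for name in experiments]
    return " ".join(shlex.quote(arg) for arg in args)


print(configure_command(["cdef", "rect_tx"]))
```

An experiment that is enabled by default would not need its flag, so a list like this would typically only name the tools that are still switched off.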
As of today, it is not certain which experiments will make it into the final codec. However, we want to highlight a few that look promising today:
Directional Deringing
The directional deringing filter is an effective algorithm for removing ringing artifacts from a coded frame. It plugs in right at the end of the decoding process, so it is easy to integrate. Each block is searched for an overall direction, which is taken into account when applying a conditional replacement filter (CRF), so that the risk of blurring is reduced and only obvious ringing patterns are filtered. It is currently enabled by default.
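The idea behind a conditional replacement filter can be sketched in one dimension. This is a simplified illustration, not the code used in AV1: the real filter works on 2-D blocks along the detected direction, and the tap weights and threshold below are made-up values. A neighbor only contributes to the average if it is close to the center sample; otherwise the center value is used in its place, which keeps real edges sharp.

```python
def conditional_replacement_filter(signal, taps, threshold):
    """Smooth a 1-D signal while leaving strong edges untouched.

    taps maps a sample offset to its weight (offset 0 is the center tap).
    """
    last = len(signal) - 1
    filtered = []
    for n, center in enumerate(signal):
        weighted_sum = 0.0
        weight_total = 0.0
        for offset, weight in taps.items():
            index = min(max(n + offset, 0), last)  # clamp at the borders
            neighbor = signal[index]
            # Neighbors that differ too much are replaced by the center
            # sample, so they cannot blur across an edge.
            value = neighbor if abs(neighbor - center) < threshold else center
            weighted_sum += weight * value
            weight_total += weight
        filtered.append(weighted_sum / weight_total)
    return filtered


# Small ripples on both sides of a strong edge between ~10 and ~100.
samples = [10, 10, 12, 10, 100, 100, 98, 100]
print(conditional_replacement_filter(samples, {-1: 1, 0: 2, 1: 1}, threshold=8))
```

In this example the ripples on each side are smoothed, while the jump from about 10 to about 100 stays where it is because the samples across the edge exceed the threshold.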
PVQ (Perceptual Vector Quantization)
This experiment was originally developed for the Daala codec and has the potential to bring a lot of gains. However, it is also quite difficult to integrate into AV1 because PVQ interacts with many other parts of a codec. Compared to the usual scalar quantization, PVQ offers a lot more flexibility to control quantization. It makes techniques like Chroma from Luma or Activity Masking easier. Activity Masking tries to provide better resolution in low-contrast areas. This can be achieved by varying the codebook, which is possible with PVQ.
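As a rough illustration of how PVQ differs from scalar quantization, the sketch below splits a coefficient vector into a gain (its length) and a shape, and quantizes the shape to an integer vector whose absolute values sum to a fixed number of pulses. The greedy pulse allocation is a simple approximation chosen for readability, not the search used in Daala or AV1, and the function names are hypothetical.

```python
import math


def pvq_quantize(vector, pulses):
    """Return (gain, y) where y is an integer vector with sum(|y|) == pulses."""
    gain = math.sqrt(sum(v * v for v in vector))
    l1_norm = sum(abs(v) for v in vector)
    if l1_norm == 0:
        return 0.0, [0] * len(vector)
    # Scale the magnitudes so they sum to `pulses`, then round toward zero.
    scaled = [abs(v) * pulses / l1_norm for v in vector]
    y = [int(s) for s in scaled]
    # Hand out the remaining pulses to the positions with the largest
    # rounding error (greedy approximation).
    while sum(y) < pulses:
        best = max(range(len(y)), key=lambda j: scaled[j] - y[j])
        y[best] += 1
    # Restore the signs of the original coefficients.
    return gain, [yi if v >= 0 else -yi for yi, v in zip(y, vector)]


def pvq_reconstruct(gain, y):
    """Rebuild the vector: normalized pulse vector scaled by the gain."""
    norm = math.sqrt(sum(v * v for v in y))
    if norm == 0:
        return [0.0] * len(y)
    return [gain * v / norm for v in y]


gain, shape = pvq_quantize([4.0, -2.5, 1.0, 0.5], pulses=4)
print(shape, pvq_reconstruct(gain, shape))
```

The number of pulses determines the size of the codebook, so raising or lowering it per block is one way the resolution could be varied, in the spirit of the codebook variation described above.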
Chroma from Luma (CfL)
CfL is based on a rather simple idea: Take advantage of the fact that edges in the chroma plane are usually well correlated with those in the luma plane. As CfL works entirely in the frequency domain, it can be easily implemented using PVQ. Using PVQ, the chroma coefficients can be predicted from injected luma coefficients. It is a very promising tool as it is quite simple to compute and provides nice benefits with much cleaner colors.
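The correlation argument can be illustrated with a tiny linear model. This is only a sketch of the principle, not the PVQ-based scheme described above: it assumes that the chroma AC coefficients of a block are roughly a scaled copy of the luma AC coefficients, fits the scale factor `alpha` by least squares, and leaves a small residual to be coded. In a real codec the decoder does not have the chroma data, so a value like `alpha` would have to be signaled or derived; all names here are hypothetical.

```python
def predict_chroma_from_luma(luma_ac, chroma_ac):
    """Fit chroma ~ alpha * luma and return (alpha, prediction, residual)."""
    energy = sum(l * l for l in luma_ac)
    if energy == 0:
        alpha = 0.0  # flat luma block: nothing to predict from
    else:
        alpha = sum(l * c for l, c in zip(luma_ac, chroma_ac)) / energy
    prediction = [alpha * l for l in luma_ac]
    residual = [c - p for c, p in zip(chroma_ac, prediction)]
    return alpha, prediction, residual


# Made-up coefficients: chroma follows luma at half the amplitude,
# except for one small deviation in the last coefficient.
alpha, prediction, residual = predict_chroma_from_luma(
    [8.0, -4.0, 2.0, 0.0], [4.0, -2.0, 1.0, 0.5]
)
print(alpha, residual)
```

When the edges line up, almost all of the chroma energy is captured by the prediction and only the small residual remains.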
Bitmovin AV1 VoD and Live Encoding
The Bitmovin encoding service now supports AV1 encoding for VoD and Live. It is possible to encode AV1 with our cloud encoding service and also with our managed on-premise offering through Kubernetes and Docker. Currently, AV1 encoding with common encoding tools is a very time-consuming process, as can be seen in the screenshot below, taken on a Lenovo T540p notebook with an i7-4800MQ and 8GB RAM running Ubuntu 14.04. It would take 8 hours and 42 minutes to encode a 40-second 1080p@24fps sequence (Tears of Steel Teaser) with a target bitrate of 1.5Mbps.
The encoding runs with about 1.93 fpm (frames per minute) which would translate to 0.032 fps (frames per second). If you want to achieve real-time with 24 fps you would need at least 746 times the computing power on a single machine, which is not very practical in a real world scenario. Clearly we need another approach to encode with reasonable speeds, especially when it comes to live streaming.
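The numbers above follow from a short calculation, reproduced here as a sketch. The variable names are ours; the inputs are the measured 1.93 frames per minute and the 24 fps frame rate of the test sequence.

```python
SOURCE_FPS = 24        # frame rate of the test sequence
measured_fpm = 1.93    # encoding speed on the notebook, frames per minute

measured_fps = measured_fpm / 60               # frames per second
realtime_factor = SOURCE_FPS / measured_fps    # speed-up needed for real time

print(f"encoding speed: {measured_fps:.3f} fps")
print(f"speed-up needed for real time: {realtime_factor:.0f}x")
```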
Thanks to our chunk-based encoding approach, which allows us to scale a single encoding across multiple instances, we can encode AV1 with reasonable turnaround times, and it is also possible to use AV1 for livestreams. Our chunked encoding speeds up the encoding almost linearly with the number of instances added to the encoding cluster. This approach works with our cloud encoding the same way it works with our on-premise setups that are based on Kubernetes and Docker. Consequently, we can reach the same encoding speeds for AV1 that our customers have come to expect for H264, VP9 and HEVC encoding, which makes the codec effectively usable for media companies and content providers throughout the industry.
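The general idea of chunk-based encoding can be sketched as follows. This is not Bitmovin's implementation: `encode_chunk` is a hypothetical stand-in for an encoder running on one instance, a thread pool stands in for the cluster, and the chunk length of 96 frames is an assumption chosen to match a 4-second GOP at 24 fps so that every chunk can start with a keyframe.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(total_frames, chunk_frames):
    """Return (start, end) frame ranges, end exclusive, covering the clip."""
    return [
        (start, min(start + chunk_frames, total_frames))
        for start in range(0, total_frames, chunk_frames)
    ]


def encode_chunk(frame_range):
    """Hypothetical placeholder for encoding one chunk on one instance."""
    start, end = frame_range
    return f"segment[{start}:{end}]"


def encode_distributed(total_frames, chunk_frames, workers):
    """Encode all chunks in parallel and return the segments in order."""
    chunks = split_into_chunks(total_frames, chunk_frames)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the segments stay in
        # sequence even if chunks finish at different times.
        return list(pool.map(encode_chunk, chunks))


# A 40-second clip at 24 fps (960 frames) split into 96-frame chunks.
print(encode_distributed(total_frames=960, chunk_frames=96, workers=4))
```

Because the chunks are independent of each other, adding workers shortens the wall-clock time roughly in proportion, until there are more workers than chunks.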
We also encoded the ToS teaser with our AV1 encoder in the cloud with the default configuration, where we achieved 7 fps, which is about 219 times faster than what was achieved in the test with the Lenovo notebook. This is already pretty impressive; however, we were not satisfied with the speed, as it was still below realtime. So we tried an enterprise setup by simply adding more instances to the encoding process. The resulting encoding speed was 36 fps, which is about 1125 times faster than with the single Lenovo notebook.
In addition, we do not have to trade quality for speed, because our encoder does not need to lower quality to reach a certain speed on a single instance, as other encoding vendors typically do. With our approach we are not bound by the hardware restrictions of a single instance: we can add more instances to an encoding cluster to generate the quality that our customers have configured in a reasonable time, or in realtime for live streams. With our chunk-based implementation of the AV1 video codec we can encode videos with AV1 even faster than realtime without compromising quality.
World First AV1 Livestream Demonstration
We will also showcase the first ever AV1 livestream, delivering 1080p playback at 1.5Mbps in broadcast quality at our booth at the NAB Show in Las Vegas (SU9007CM) from April 24-27. Currently you would need around 4 to 15 Mbps with traditional codecs like H264 to deliver the same quality. So AV1 could reduce your CDN and storage cost by up to 10x.
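The "up to 10x" figure follows directly from the bitrates quoted above. The sketch below works it through, under the assumption that CDN and storage cost scale linearly with the amount of data; the function names are ours.

```python
def gigabytes_per_hour(bitrate_mbps):
    """Data volume of one hour of video at a constant average bitrate."""
    return bitrate_mbps * 3600 / 8 / 1000  # Mbit/s -> decimal GB per hour


def savings_factor(reference_mbps, av1_mbps=1.5):
    """How many times more data the reference stream needs than AV1."""
    return reference_mbps / av1_mbps


print(f"AV1 at 1.5 Mbps: {gigabytes_per_hour(1.5):.3f} GB per hour")
for h264_mbps in (4, 15):
    print(
        f"H264 at {h264_mbps} Mbps: {gigabytes_per_hour(h264_mbps):.2f} GB per hour, "
        f"{savings_factor(h264_mbps):.1f}x the AV1 stream"
    )
```

The low end of the H264 range gives a factor of about 2.7, and the high end gives the factor of 10.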
The setup of our AV1 live workflow that we will showcase consists of the following components:
- OBS RTMP mezzanine stream, 12Mbps 1080p@30fps
- Bitmovin Distributed AV1 Cloud Encoder running in Google Cloud receives an RTMP ingest and transcodes to 1.5Mbps 1080p@30fps segmented WebM. Segments will be directly transferred to a Google Cloud Storage bucket.
- The Bitmovin Distributed AV1 Cloud Encoder also generates HLS and MPEG-DASH manifests that will be transferred to the Google Cloud Storage bucket. Enabled experiments of the AV1 codec are: adapt_scan, ref_mv, filter_7bit, reference_buffer, delta_q, tile_groups, rect_tx, cdef
- Native playback on a desktop with a Bitmovin Player based on aomdec and ffplay
Our AV1 encoder generates segmented WebM output that can be used with HLS or MPEG-DASH for VoD and Live. However, as AV1 is currently not supported by any browser, we had to write our own player that is able to play back our AV1 livestream. We updated the aomdec application so that it can download and decode the AV1 chunks, which can be seen in the left console window. Fortunately, decoding is not as resource intensive as encoding, so the AV1 stream can be decoded on normal hardware without special requirements. For example, the same Lenovo notebook (i7-4800MQ, 8GB RAM, running Ubuntu 14.04) that could not encode this video anywhere near realtime could easily play back AV1 in software. After the decoding step, we pipe the decoded YUV frames to ffplay to display the stream in a window, as you can see in the screenshot above. We plan to contribute this functionality back to aomdec after a technical cleanup of the current implementation.
A Practical Quality Comparison
Although the AV1 bitstream is not finalized yet and much work remains to further improve the quality of the codec, we wanted to get a snapshot of the current state and compare its quality with AVC/H264, HEVC/H265, and VP9. For that purpose we made two different quality comparisons. The first uses two objective metrics, PSNR and SSIM. PSNR does not always correlate well with perceived quality, but is the de facto standard for video quality comparisons. SSIM is a perception-based quality metric that should better reflect perceived quality.
For the second comparison we chose to make a side-by-side quality comparison between AV1 and the other codecs. This quality comparison targets a practical use case where the resulting content can be used for Adaptive Bitrate Streaming (ABR). Therefore we used a fixed Group of Pictures (GOP) size for our experiments and Variable Bitrate (VBR) encodings with a target bitrate. This approach is established in the industry, but results can differ from scientific evaluations that target abstract use cases and theoretical encoder performance using the HM (HEVC reference software) and JM (AVC reference software) reference encoders, which have no practical relevance in the industry.
Let’s first start with the objective quality comparison with PSNR. We encoded the open-source movie Sintel from the Blender Foundation with VBR to the following target bitrates: 100Kbps, 250Kbps, 500Kbps, 1Mbps, 2Mbps, 4Mbps and calculated PSNR and SSIM for the bitrate that has actually been achieved by the individual codec (typically codecs in VBR mode do not hit the target bitrate exactly).
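For readers unfamiliar with the metric, PSNR is derived from the mean squared error between the source and the decoded picture. The sketch below computes it for two short lists of 8-bit sample values; the sample data is made up, and a real measurement would run over every pixel of every frame.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


source = [52, 55, 61, 59]
decoded = [54, 55, 60, 58]
print(f"{psnr(source, decoded):.2f} dB")
```

Because the scale is logarithmic, small differences in dB correspond to noticeable changes in the error energy, which is why a gap of a few tenths of a dB between codecs at the same bitrate is already meaningful.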
The following encoding settings were used for the different codecs in the Bitmovin Encoding Service:
- AVC/H264: GOP Size: 96 frames (4 seconds), Me_range: 16, Cabac: true, B-Adapt: 2, Me: UMH, Rc-Lookahead: 50, Subme: 8, Trellis: 1, Partitions: All, BFrames: 3, ReferenceFrames: 5, Profile: High, Direct-Pred: Auto
- HEVC/H265: GOP Size: 96 frames (4 seconds), Sao: 1, B-Adapt: 2, CTU: 64, Profile: Main, BFrames: 4, Rc-Lookahead: 25, WeightP: 1, MeRange: 57, Ref: 4, Subme: 3, Tu-Inter-Depth: 1, Me: 3, No-WeightB: 1, Tu-Intra-Depth: 1
- VP9: GOP Size: 96 frames (4 seconds), Cpu-used: 1, Tile-columns: 4, Arnr-Type: Centered, Threads: 4, Arnr-maxframes: 0, Quality: Good, Frame-Parallel: 0, AQ-Mode: none, Arnr-Strength: 3, Tile-Rows: 0
- AV1: Build f3477635d3d44a2448b5298255ee054fa71d7ad9, experiments enabled by default: adapt_scan, ref_mv, filter_7bit, reference_buffer, delta_q, tile_groups, rect_tx, cdef; Passes: 1, Quality: Good, Threads: 1, Cpu-used: 1, KeyFrame-Mode: Auto, Lag-In-Frames: 25, End-Usage: VBR
The above diagram clearly shows that AV1 already outperforms all the other codecs at each bitrate setting. For bitrates of 1Mbps and higher the quality difference is already pretty big (> 0.5 dB, which is usually clearly visible). VP9 and HEVC/H265 are very similar from a PSNR perspective; however, VP9 was the codec that overshot the target bitrate by far the most.
We also compared the four codecs with SSIM. The results can be seen in the above diagram and are quite similar to PSNR with some slight differences. AV1 is still the best performing codec over all bitrates, and AVC/H264 lags behind. However, interestingly AVC/H264 catches up with increased bitrate. An explanation for that could be that in the higher bitrates we can reach nearly the quality of the source material with all codecs, which results in only minor differences between the codecs.
Additionally we created several side-by-side quality comparisons where we experimentally changed the target bitrate for each codec to reach an average of 500 Kbps. Below you can see the quality comparisons between the encodings comparing the quality of Bitmovin AV1 with AVC/H264, HEVC/H265, and VP9. We used the well known Tears of Steel teaser that is 40 seconds long with a 1080p resolution for the comparison, selecting a complex scene that is hard to encode.
When comparing AV1 with AVC/H264 the quality difference is very obvious as expected. We can clearly see multiple encoding artifacts and blocking in the right part of the image that has been encoded with AVC/H264. In contrast, the left part encoded with AV1 looks much cleaner without obvious encoding artifacts.
Looking at the quality difference between AV1 and VP9 it is not as obvious as with AVC/H264, but still quite visible. Especially the borders of the tiles of the sphere show encoding artifacts and the overall picture in VP9 seems to have quite some noise. We can also identify some blocking artifacts that are not visible in AV1.
HEVC/H265 visually looks a bit better than VP9; however, it still has visible encoding artifacts, especially in the lower part of the image and around the arm of the guy with the red coat. When we look closely at the arm, we can see that the color is not encoded as nicely as with AV1 and shows some noise.
You are probably asking yourself why we already support AV1 when there are currently so few use cases. Bitmovin’s culture and vision have always been to be a technology leader, and our passion for video means we consistently tackle the most complex video problems. Why? Because it’s fun and challenging and our team loves a challenge! 😉
Besides that, there are already use cases for AV1: you could use it as your mezzanine format to preserve a high-quality version of your video at a low bitrate, which can later be used to create your adaptive bitrate renditions or other formats. Using AV1 this way would decrease your storage footprint and speed up transfer times inside your datacenter or for upload to the cloud.
Furthermore, with the companies behind AOMedia, like AMD, ARM, Intel, NVIDIA, Google, Microsoft, Mozilla, Netflix and Amazon, it should not take too long to get broad support for AV1. AMD, Intel and NVIDIA cover the desktop market quite nicely and ARM and Intel the mobile market. Additionally, the major browser vendors, Google, Microsoft and Mozilla will make sure that the codec finds its way into the browsers soon after the bitstream freeze. Google, Netflix and Amazon will make sure that AV1 content will be available quickly and that will further drive adoption and hardware support. The AV1 bitstream freeze should be in Q3 2017 and from there everything should move quickly. With Bitmovin’s solutions you are prepared for that move and we will not stop innovating 😉