By encoding video at bitrates appropriate to the content of the video file, content providers can achieve significant bandwidth savings as well as quality improvements.
Per-Title Encoding is not a new concept. In fact, you can find research online that dates back several years, including the 2012 presentation Choosing the Segment Length for Adaptive Bitrate Streaming and the 2011 paper Dynamic Adaptive Streaming over HTTP Dataset, both from our own co-founders. Most of the early research concluded that Per-Title Encoding was effective in test environments but not suitable for commercial application, because it could not work with a fixed bitrate ladder: every piece of content is different, so each video would require individual analysis as a first step.
In 2015 Netflix managed to mitigate the overhead of the extra analysis step and implement Per-Title Encoding at scale. As a result, they increased the quality of experience and achieved significant bandwidth savings. These optimizations are achieved by increasing or decreasing the bitrate of each bitrate ladder entry based on a complexity measurement for each input file. It sounds simple enough, but believe me, there is a good reason that Netflix took years to make Per-Title Encoding a viable part of their video delivery workflow.
But to fully understand the complexity of the challenge, it’s best to start at the beginning:
What is Per-Title Encoding?
Put simply, it’s a form of encoding optimization that customizes the bitrate ladder of each video based on the complexity of the video file itself. The ultimate goal is to select a bitrate that gives the codec enough room to encapsulate all the information needed for a perfect viewing experience, but no more. Another way of thinking about it is that the optimized adaptive package is reduced to just the information that viewers can actually enjoy. Anything beyond the human eye’s ability to perceive is stripped out.
This “optimal” bitrate varies for each type of content, and even from title to title. Action scenes or sports scenes typically require a higher bitrate to store the information, as they contain a lot of motion and fewer redundancies, making each scene more complex. Therefore they also contain fewer opportunities to compress data without impacting the perceived quality of the content. On the other hand, documentaries typically have far less motion during any given scene, which leaves a codec more possibilities to compress the given information effectively without losing perceptual quality. If you take those characteristics and adjust the encoding profile accordingly, you are able to lower the bitrate but still maintain a very good perceived quality for your content.
In order to decide which bitrate fits best for each specific piece of content, you need a good quality metric to measure against. Gathering this information typically requires several encodings: different types of content all need to be encoded with a variety of bitrate settings. Once that is complete, a PSNR (Peak Signal-to-Noise Ratio) analysis of each encoding needs to be performed to form an objective impression of the effectiveness of these encoding parameters.
Quick Fact #1: If you compare an encoding with a PSNR of 45 dB or higher with its source video, you won’t notice any difference, although less information is used to render the content. On the other hand, a PSNR of 35 dB or lower would definitely show noticeable differences between the encoding and its source file.
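To make the metric concrete, here is a minimal sketch of the standard PSNR formula applied to a pair of frames with NumPy. The frame values here are made up for illustration; real analysis would compare decoded frames of the encoding against the source.

```python
import numpy as np

def psnr(reference: np.ndarray, encoded: np.ndarray, max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between a reference and an encoded frame."""
    mse = np.mean((reference.astype(np.float64) - encoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10((max_value ** 2) / mse)

# Toy frames: a tiny uniform difference scores well above 45 dB,
# while a heavy distortion falls well below 35 dB.
source = np.full((4, 4), 128, dtype=np.uint8)
light = source + 1   # barely perceptible difference
heavy = source + 40  # clearly visible difference
print(round(psnr(source, light), 2))  # 48.13
print(round(psnr(source, heavy), 2))  # 16.09
```

The two results land on either side of the 45 dB / 35 dB thresholds from Quick Fact #1.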
Based on the results of this analysis, you can derive a custom bitrate ladder to encode each content file accordingly. This approach works, and will result in an improved quality of experience and in the vast majority of cases, a reduction in bandwidth usage as well. This is of course of paramount importance to most online content providers, VoD and OTT platforms in particular.
This is OK for a start, but as you apply this optimization to a large number of titles, you will begin to see limitations in the PSNR metric. An improved method for analysing the visual quality of an image is the Structural SIMilarity (SSIM) index, a method for measuring the similarity between two images. Essentially, one image is taken as a control and considered “perfect quality”, and the second image is compared against it, which makes SSIM a useful way to measure the results of your optimization. SSIM is a perception-based model focused on the changes (structure, luminance, contrast) that impact perceived quality. It provides a better impression of the quality of our content than PSNR, but is also a bit more compute intensive.
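To illustrate the idea, here is a deliberately simplified SSIM sketch: the real index is computed over local sliding windows (typically 8x8 or 11x11), while this version collapses the luminance, contrast and structure terms into one global comparison. The constants follow the common SSIM defaults; the test images are synthetic.

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray, max_value: float = 255.0) -> float:
    """Simplified single-window SSIM: compares luminance, contrast and
    structure over the whole image instead of local windows."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * max_value) ** 2  # stabilizes the luminance term
    c2 = (0.03 * max_value) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
noisy = np.clip(ref + rng.normal(0, 20, ref.shape), 0, 255)

print(global_ssim(ref, ref))                # identical images score exactly 1.0
print(round(global_ssim(ref, noisy), 3))    # distortion pushes the score below 1.0
```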
How can you do it with Bitmovin?
Let’s quickly recap what we have learnt until now. While a fixed bitrate ladder is not ideal for every type of content, being able to create an optimal bitrate ladder for each and every encoding is very time consuming and expensive. So we need a way to efficiently adapt a given bitrate ladder to the complexity needs of the content.
As shown in the figure above, the first step of a Bitmovin Per-Title encoding is to compute a “complexity factor” for a given input. With our API, this is done during the “complexity analysis” of your input file. H.264 encoders provide an option called CRF (Constant Rate Factor). This factor specifies a “quality level”, which is achieved by varying the bitrate based on the amount of motion detected in the video content. The average bitrate of a CRF encoding allows us to get an impression of the overall complexity of the video asset and to derive a “complexity factor” from it.
Quick Fact #2: The complexity factor has a value range of 0.5 to 1.5. Content which has a complexity factor between 0.5 and 1 is considered to be less complex, while a complexity factor between 1 and 1.5 relates to content with a higher complexity.
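The exact mapping Bitmovin uses is not spelled out here, but a plausible sketch is to take the ratio of the measured CRF bitrate to a reference bitrate for “average” content and clamp it to the documented 0.5 to 1.5 range. The `reference_bitrate` and the sample bitrates below are assumptions for illustration only.

```python
def complexity_factor(avg_crf_bitrate: float, reference_bitrate: float) -> float:
    """Hypothetical mapping: ratio of the measured CRF encoding bitrate to a
    reference bitrate, clamped to the 0.5-1.5 range from Quick Fact #2."""
    ratio = avg_crf_bitrate / reference_bitrate
    return max(0.5, min(1.5, ratio))

# A low-motion documentary comes out below 1.0 (less complex) ...
print(complexity_factor(1_800_000, 3_000_000))  # 0.6
# ... while a sports clip lands above 1.0 (more complex).
print(complexity_factor(4_200_000, 3_000_000))  # 1.4
```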
Adjusted Encoding Profile
With this “complexity factor” we can adjust the given bitrate ladder. Keep in mind that we do not just consider the “complexity factor” but also the resolution of the bitrate ladder entry we want to adjust.
For low complexity content we can reduce the higher bitrate levels to a greater degree without losing visual quality, and this is also where we gain the most from the bitrate savings. The lower bitrate levels are adjusted as well, but not as significantly as the higher bitrate levels, to avoid degrading visual quality.
For high complexity content it basically works the other way around. We do not adjust the high bitrate levels much, as a significant increase in bitrate would usually not gain much in visual quality. However, low bitrate levels are increased to a greater degree, because there the added bitrate significantly increases visual quality.
Why is that so? Modern codecs work more efficiently at larger resolutions, because larger frames often contain bigger uniform areas, which can be compressed more effectively. Because of that, fewer bits per pixel are needed to achieve similar quality at larger resolutions compared to smaller ones.
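The asymmetric adjustment described above can be sketched as follows. This is not Bitmovin’s actual formula: it simply raises the complexity factor to a per-rung exponent so that, for low complexity content, the top rungs move the most, while for high complexity content the bottom rungs move the most. The ladder and weights are illustrative assumptions.

```python
LADDER = [  # (width, height, base bitrate in kbit/s) -- example values
    (640, 360, 800),
    (1280, 720, 2_400),
    (1920, 1080, 4_800),
]

def adjust_ladder(ladder, complexity):
    """Scale each rung by complexity**weight. The weight ranges from 0.5 to 1.0
    and is assigned so the rungs the text says should move most get the
    largest exponent."""
    adjusted = []
    top = len(ladder) - 1
    for i, (w, h, bitrate) in enumerate(ladder):
        position = i / top  # 0.0 at the lowest rung, 1.0 at the top rung
        # Low-complexity content (< 1): reduce the top rungs hardest.
        # High-complexity content (> 1): boost the bottom rungs hardest.
        weight = 0.5 + 0.5 * (position if complexity < 1 else 1 - position)
        adjusted.append((w, h, round(bitrate * complexity ** weight)))
    return adjusted

print(adjust_ladder(LADDER, 0.7))  # top rung cut by 30%, bottom rung by ~16%
print(adjust_ladder(LADDER, 1.3))  # bottom rung raised 30%, top rung ~14%
```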
ABR encoded content
The following results show how this approach works in practice. Less complex input files receive bigger adjustments of their upper bitrate ladder entries and smaller ones for the lower values. This can be seen with the cartoon “Glass Half”: although the bitrate was reduced by 30%, the PSNR and SSIM stayed almost the same, so your customers would experience the same quality using less bandwidth, and your distribution costs would be reduced. Its PSNR value dropped from 51.51 dB to 47.78 dB, but that is still very good quality, and your customers won’t notice because the differences are too small for the human eye to perceive (remember Quick Fact #1).
Highly complex videos on the other hand also show the expected behavior. While upper bitrate ladder entries are adjusted less, the lower bitrate ladder entries were increased accordingly in order to still achieve an improved level of quality, which wouldn’t be the case if a fixed bitrate ladder had been used.
All of these example encodings with playable comparisons are available to view on our demonstration page.
Can this workflow be optimized further?
Of course, there is always room for further improvements and optimizations, but there are tradeoffs between efficiency and cost. Given a defined set of resolutions, the best possible bitrate ladder can be found by encoding the input file with a different set of bitrates for each resolution and performing a PSNR analysis for each of those encodings. The results show which bitrate (x axis) provides the best possible quality (y axis) for a specific resolution; this bitrate is at the apex of each blue line shown in the graph below. From these apexes you get a convex hull, which is then used to select the resolution/bitrate pairs that fit best for your encoding and its bitrate ladder.
Although this allows us to define an “optimal” bitrate ladder for a particular video, it requires several encodings (e.g. 5 encodings with different bitrates per rendition, across 5 renditions = 25 encodings to determine the final bitrate ladder). Even then, it isn’t guaranteed that those encodings are sufficient, because you can’t tell beforehand whether your range of bitrates covers the quality behavior of each resolution. This makes the approach more expensive and increases the time it takes to determine the bitrate ladder.
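The convex-hull selection step can be sketched like this. The PSNR numbers below are invented for illustration; in practice each cell would come from one of the trial encodings described above. Picking, at each bitrate, the resolution with the highest measured quality traces the upper envelope of the per-resolution curves.

```python
# Hypothetical PSNR measurements (dB): one row per resolution,
# one column per test bitrate in kbit/s.
BITRATES = [400, 1_200, 3_600]
MEASUREMENTS = {
    360:  [34.0, 38.5, 39.0],   # quality saturates early at low resolution
    720:  [31.0, 39.5, 42.0],
    1080: [27.0, 37.0, 44.5],   # needs bitrate, but wins at the top end
}

def convex_hull_ladder(bitrates, measurements):
    """At each test bitrate, keep the resolution that scores highest -- the
    upper envelope (convex hull) of the per-resolution quality curves."""
    ladder = []
    for i, bitrate in enumerate(bitrates):
        best = max(measurements, key=lambda res: measurements[res][i])
        ladder.append((bitrate, best))
    return ladder

print(convex_hull_ladder(BITRATES, MEASUREMENTS))
# -> [(400, 360), (1200, 720), (3600, 1080)]
```

Each resolution dominates in a different bitrate band, which is exactly why a single fixed resolution/bitrate pairing is suboptimal.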
In the workflow described above, we use one CRF encoding at a low resolution to evaluate the complexity factor and adjust the bitrate of all entries in our bitrate ladder. Creating a CRF encoding for each resolution would allow us to adjust the bitrates more specifically and would lead to further quality improvements and/or bandwidth savings.
Another optimization would be to specifically analyze your set of input files to evaluate their characteristics, and calculate an optimized complexity factor, which represents the complexity of your input files more precisely. These results can also be used with a machine learning approach.
Another simple optimization is to take the specific details of the input file as a reference and adjust the bitrate ladder accordingly. This can already be done using Stream Conditions, so codec configurations can be skipped if certain conditions aren’t met, e.g. requiring the input file resolution to be greater than or equal to the width/height configured in the codec configuration. In this example the conditions avoid any possibility of upscaling, which goes hand in hand with losing quality.
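The effect of such a condition can be shown with a small generic sketch (this is plain Python, not the Bitmovin Stream Conditions API): rungs that would require upscaling the source are simply dropped. The ladder and source dimensions are example values.

```python
def applicable_rungs(ladder, source_width, source_height):
    """Keep only ladder entries that don't exceed the source resolution,
    i.e. skip any rung that would force upscaling."""
    return [(w, h, bitrate) for (w, h, bitrate) in ladder
            if w <= source_width and h <= source_height]

LADDER = [(640, 360, 800), (1280, 720, 2_400), (1920, 1080, 4_800)]

# A 720p source keeps the 360p and 720p rungs and drops the 1080p rung.
print(applicable_rungs(LADDER, 1280, 720))
# -> [(640, 360, 800), (1280, 720, 2400)]
```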
Even though optimizing your content requires a little extra processing in the form of trial encodings, it is definitely worth it. The relatively small increase in encoding cost is easily outweighed by the bandwidth savings and the overall improvement in customer experience.
For more information, reach out to our solutions team for a demonstration or find out more about how Bitmovin can help you to solve complex problems in your video workflow.