A.I. Encoding Uses Machine Learning to Speed Up Processing and Improve Quality

Martin Smole
AI 3-pass encoding: a new level of video quality

Machine learning in video encoding

An A.I. Encoding workflow, running on a containerized, “chunk-based” infrastructure with a machine learning model, delivers the industry's highest-quality video encoding.

At Bitmovin, we are known for pushing the boundaries of quality in video delivery. From our efforts as an early mover in implementing and developing AV1 to our cloud-native video encoding solution, we have been researching every part of the workflow so that we can provide great-quality streaming while still reducing bandwidth consumption and file sizes.
A.I. Encoding is another major step in our video technology research and development efforts. With the introduction of machine learning, the encoder can make smart decisions about the compression settings and visual parameters of each frame, speeding up processing and improving encoding efficiency. The encoder performs a detailed video analysis, and its machine learning algorithm improves over time, continuously optimizing the encoding parameters. First, let’s talk about the mechanics of A.I. in video encoding.

The 3 passes

AI chunk-based 3-pass encoding workflow
During the first pass, the entire video file is scanned superficially: property information that does not require in-depth analysis (e.g. motion prediction) is extracted and collected. This data is fed into an encoding engine that uses artificial intelligence to produce optimized encoding parameters. The settings are tuned to content information such as a broad estimate of content complexity, which is cheap to obtain and provides an initial level of optimization. Because of the AI component, the system improves progressively as it gathers more information on which settings deliver high-quality results: during encoding, the output is checked against objective quality metrics and the scores are fed back into the AI. As the AI’s database of encoding settings and their results grows, so does the accuracy with which it matches encoding parameters to file attributes.
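The post does not say which properties the first pass collects. As a rough illustration of a cheap complexity estimate that needs no motion search, the Python sketch below computes a spatial score (mean absolute horizontal gradient) and a temporal score (mean absolute difference between co-located pixels of consecutive frames). The frame format (a 2D list of 8-bit luma values) and the feature names are assumptions made for this example, not Bitmovin's actual feature set.

```python
from statistics import mean, pvariance

# Hypothetical input format: a frame is a 2D list of 8-bit luma samples.
Frame = list[list[int]]


def spatial_complexity(frame: Frame) -> float:
    """Mean absolute horizontal gradient: a cheap proxy for detail/texture."""
    diffs = [abs(row[x + 1] - row[x]) for row in frame for x in range(len(row) - 1)]
    return float(mean(diffs)) if diffs else 0.0


def temporal_complexity(prev: Frame, cur: Frame) -> float:
    """Mean absolute difference of co-located pixels; no motion estimation."""
    diffs = [abs(a - b) for row_a, row_b in zip(prev, cur) for a, b in zip(row_a, row_b)]
    return float(mean(diffs)) if diffs else 0.0


def first_pass_features(frames: list[Frame]) -> dict[str, float]:
    """Summarize a clip with a few inexpensive (hypothetical) complexity features."""
    if not frames:
        raise ValueError("need at least one frame")
    spatial = [spatial_complexity(f) for f in frames]
    temporal = [temporal_complexity(p, c) for p, c in zip(frames, frames[1:])]
    return {
        "spatial_mean": float(mean(spatial)),
        "spatial_var": float(pvariance(spatial)),
        "temporal_mean": float(mean(temporal)) if temporal else 0.0,
    }
```

A flat black frame followed by a flat gray frame scores zero spatially but high temporally, which is the kind of coarse signal a first pass can gather without expensive analysis.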
By the second pass, the encoding parameters for a chunk have been set, and the next step is to distribute each chunk to a specific processing instance based on factors such as its complexity. The idea is to gather precise data on each chunk so that resources can be allocated according to its level of complexity. Once the second pass is complete, the results of both passes are merged to give the encoder the information it needs to achieve the best possible result.
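The post does not specify how chunks are assigned to instances. One plausible reading is load balancing by complexity; the sketch below uses the classic longest-processing-time-first greedy heuristic, which always hands the next most complex chunk to the currently least-loaded worker. The complexity scores and the `assign_chunks` helper are hypothetical.

```python
import heapq


def assign_chunks(complexities: list[float], num_workers: int) -> list[list[int]]:
    """Greedy LPT assignment: returns, per worker, the indices of its chunks."""
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    # Min-heap of (accumulated complexity, worker id): the top is the least-loaded worker.
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment: list[list[int]] = [[] for _ in range(num_workers)]
    # Place the most complex chunks first (sort is stable for equal scores).
    order = sorted(range(len(complexities)), key=lambda i: complexities[i], reverse=True)
    for i in order:
        load, w = heapq.heappop(heap)
        assignment[w].append(i)
        heapq.heappush(heap, (load + complexities[i], w))
    return assignment
```

For complexities `[5, 4, 3, 3, 3]` on two workers this yields `[[0, 3], [1, 2, 4]]`, i.e. loads of 8 and 10. Sorting by complexity first keeps a single very complex chunk from landing late on an already busy worker.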
The third pass is the actual encoding process. A complex algorithm uses the data gained from the analyses in the first two passes to make a variety of encoding decisions, ultimately delivering optimal quality at maximum bandwidth efficiency.
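Which decisions the third pass makes is not disclosed. A typical multi-pass decision is how to split a fixed bit budget across chunks, so the sketch below, offered purely as an assumption, gives each chunk a bitrate proportional to its complexity score while keeping the duration-weighted average at a target. Real encoders use far more elaborate rate models; the linear rule and the `allocate_bitrates` name are illustrative only.

```python
def allocate_bitrates(
    complexities: list[float], durations_s: list[float], target_avg_kbps: float
) -> list[float]:
    """Per-chunk bitrates (kbps) proportional to complexity, averaging to the target."""
    if len(complexities) != len(durations_s) or not complexities:
        raise ValueError("need matching, non-empty complexity and duration lists")
    if any(d <= 0 for d in durations_s):
        raise ValueError("durations must be positive")
    total_kbit = target_avg_kbps * sum(durations_s)  # whole bit budget
    # Each chunk's share of the budget is weighted by complexity * duration.
    weights = [c * d for c, d in zip(complexities, durations_s)]
    total_weight = sum(weights)
    if total_weight == 0:
        return [target_avg_kbps] * len(complexities)  # no signal: spread evenly
    # Chunk i receives total_kbit * w_i / W kilobits over d_i seconds.
    return [total_kbit * w / total_weight / d for w, d in zip(weights, durations_s)]
```

With complexities `[1, 3]`, two 4-second chunks and a 2000 kbps target, the chunks receive 1000 and 3000 kbps, so the overall average stays at 2000 kbps.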

What exactly is a chunk?

The machine learning aspect is an essential part of the procedure, but the “chunk” in our chunk-based 3-pass encoding routine is just as important. In most encoding solutions, “chunking” means breaking the video content into segments based purely on time intervals. After each chunk (e.g. 4 seconds of video), the encoding quality can be adjusted for the next segment. We’ve developed our own encoding logic, which creates more coding-efficient chunks and therefore allows more effective quality adjustments during the process. This results in drastic improvements in perceived quality at the same or even lower bitrates.
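Bitmovin's chunking logic is proprietary and not described here. A generic way to make chunks more coding-efficient than fixed time slices is to cut at scene changes, where a new keyframe is needed anyway, while bounding the chunk length. The sketch below assumes per-frame scene-change scores (for example, the temporal difference from a first-pass analysis) and hypothetical `threshold`, `min_len` and `max_len` parameters measured in frames.

```python
def content_aware_chunks(
    scores: list[float], threshold: float, min_len: int, max_len: int
) -> list[tuple[int, int]]:
    """Split frames into [start, end) chunks, preferring cuts at scene changes.

    scores[i] is a scene-change score for the transition into frame i.
    """
    if not 0 < min_len <= max_len:
        raise ValueError("require 0 < min_len <= max_len")
    chunks: list[tuple[int, int]] = []
    start = 0
    for i in range(1, len(scores)):
        length = i - start
        # Cut at a scene change once the chunk is long enough...
        at_scene_cut = scores[i] >= threshold and length >= min_len
        # ...or force a cut so no chunk exceeds max_len frames.
        if at_scene_cut or length >= max_len:
            chunks.append((start, i))
            start = i
    if scores:
        chunks.append((start, len(scores)))
    return chunks
```

For scores `[0, 0, 0, 9, 0, 0, 0, 0, 0, 0]` with threshold 5, `min_len=2` and `max_len=4`, the chunks are `[(0, 3), (3, 7), (7, 10)]`: the first boundary follows the scene change at frame 3 instead of splitting that scene at a fixed frame 4.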

A tried and tested algorithm that also learns in the process

Our system grows smarter with each encoding sequence. The AI was initially trained on a large test library of encodes and their associated objective quality metrics. Based on the results stored in the database, the AI engine calculates the ideal encoding settings for each individual video by matching it with similar content and the corresponding objective quality scores.
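The post does not reveal the model behind this matching step. A nearest-neighbour lookup is the simplest technique that fits the description: find past encodes whose content features are closest to the new video and reuse the settings that scored best on an objective quality metric. In the sketch below, which is a stand-in rather than Bitmovin's engine, the record fields, the `crf` parameter and the `ParameterDatabase` class are all hypothetical; `add` shows how each finished encode could grow the database.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EncodeRecord:
    features: tuple[float, ...]  # first-pass content features (hypothetical)
    params: dict[str, float]     # encoder settings used, e.g. {"crf": 23}
    quality: float               # objective quality score the encode achieved


@dataclass
class ParameterDatabase:
    records: list[EncodeRecord] = field(default_factory=list)

    def add(self, record: EncodeRecord) -> None:
        """Every finished encode becomes new reference data."""
        self.records.append(record)

    def suggest(self, features: tuple[float, ...], k: int = 3) -> dict[str, float]:
        """Among the k most similar past encodes, return the best-scoring settings."""
        if not self.records:
            raise LookupError("database is empty")
        if k < 1:
            raise ValueError("k must be >= 1")
        nearest = sorted(self.records, key=lambda r: math.dist(r.features, features))[:k]
        best = max(nearest, key=lambda r: r.quality)
        return dict(best.params)  # copy so callers cannot mutate stored records
```

Because `suggest` only ever looks at stored records, the suggestions improve simply by calling `add` after each encode, which mirrors the "grows smarter with each encoding sequence" idea in the simplest possible form.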
Various objective quality metrics are used in the process. Each works somewhat differently and weighs different aspects, but they share one common premise: they try to emulate the way the human eye perceives quality, producing quantifiable results that correspond directly to the human viewing experience. Although this may sound hard to believe to someone new to the subject, these algorithms have become remarkably accurate over the past decade and deliver convincing analyses. The benefit lies in amassing directly comparable data, which in turn translates into a highly efficient encoding process.
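The post does not name the metrics it uses. For a sense of what an objective, quantifiable score looks like, here is PSNR, the simplest widely used one. Note that PSNR compares pixels directly and correlates less well with perceived quality than perceptual metrics such as SSIM or VMAF, which fit the description above more closely; it is shown only because it fits in a few lines.

```python
import math


def psnr(reference: list[int], distorted: list[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between corresponding samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)
```

Two samples each off by one level give an MSE of 1 and a PSNR of about 48.13 dB for 8-bit content; higher is better.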
Another key advantage of applying machine learning to encoding is the ability to adapt to broader-scale changes in content technology. As 4K and even 8K resolutions, HDR and wide color gamut become more common, the machine learning engine receives more input of this kind, which eventually allows it to adapt to these technologies. And because the engine can draw new input from every single encode, it should be able to do so much faster than any human-controlled testing of encoding settings could.

Substantially better quality at lower bitrates – a competitive edge for content providers

With the introduction of our chunk-based 3-pass encoding scheme, we can confidently claim to outperform most other encoding providers at delivering high-quality video at low bitrates. Our 3-pass encoding process delivers unparalleled results, performing well both with very complex high-resolution content and with highly compressed content at low bitrates.
Raising the bar on video quality is a key priority for the future of video content delivery. As bandwidth consumption keeps growing alongside consumer demand for high-quality streaming content, the need for big leaps in encoding technology will soon become pressing. With our portfolio of solutions, which targets all key points in the video delivery cycle (encoding, storage, CDN, player and analytics), we are perfectly equipped to rise to the coming challenges.
Talk to one of our experts today and learn what chunk-based 3-pass encoding can do for your content delivery network!
Have you already heard of per-title encoding? By encoding video at bitrates appropriate to the content of each file, content providers can achieve significant bandwidth savings as well as quality improvements.


Martin Smole

Senior Engineering Director

Martin is responsible for the development and operation of the Bitmovin Encoding products for VOD and Live. His teams enable complex video encoding workflows for global premium media and technology companies such as Red Bull Media House and the New York Times. He also manages collaborations with research partners like ATHENA to continually bring new innovations into Bitmovin's products.
