Welcome back to our Encoding Excellence series! Our first post covered why Quality of Experience (QoE) in content matters: why it's important to monitor and maintain QoE to avoid revenue loss, and why video encoders are responsible for providing the clearest possible images at the most efficient bitrates.
This second post in our Encoding Excellence series is a deeper dive into QoE: here we'll define the objective methods by which you can measure and assess quality. This includes the quality scoring metrics used by industry experts, the types of encoding solutions on the market, and, last but not least, Jan Ozer's whitepaper on how to choose the Best Per-Title Encoding Technology.
State-of-the-Art Quality Assessment Scoring Methods
A viewer’s rating of quality is important for experience metrics; however, it’s not a reliable measurement of quality for production and distribution stakeholders such as service providers or network operators. A standard consumer can provide valuable insight into subjective quality assessment, but subjective measurements often lack scientific objectivity and scalability. For this reason, we focus on repeatable and objective quality measurement methods. Although many measurements exist, Bitmovin has identified three primary methods, plus one extension, that are ideal for objective quality assessment:
- Video Multi-Method Assessment Fusion (VMAF): One of the latest metrics adopted by the streaming community is Netflix’s Video Multi-method Assessment Fusion (VMAF). It predicts subjective video quality by fusing several elementary quality measures with a machine-learning model trained on subjective scores, comparing a reference video sequence against a distorted one. The metric can be used to evaluate the quality of different video codecs, encoders, encoding settings, or transmission variants.
- Structural Similarity (SSIM): The structural similarity index is a method for predicting the perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos. As its name indicates, SSIM measures the similarity between two images. The SSIM index is a full-reference metric; in other words, the measurement or prediction of image quality is based on an initial uncompressed or distortion-free image as a reference. SSIM is designed to improve on traditional methods such as peak signal-to-noise ratio (PSNR) and mean squared error (MSE).
- SSIMPLUS: The SSIMPLUS index is a commercially available extension of SSIM. It performs a structural comparison tailored to target video applications and assigns scores between 0–100, linearly matched to human subjective ratings. SSIMPLUS also adapts its scores to the intended viewing device, enabling comparison of video across different resolutions and content. According to its authors, SSIMPLUS achieves higher accuracy and higher speed than other image and video quality metrics. However, no independent evaluation of SSIMPLUS has been performed, as the algorithm is not publicly available.
- Peak Signal-to-Noise Ratio (PSNR): The Peak Signal-to-Noise Ratio is the ratio between the maximum possible power of a signal and the power of the distortion introduced by compression; it is derived from the mean squared error between the reference and the reconstructed image and usually expressed in decibels. PSNR is most commonly used to measure the quality of reconstruction of lossy compression codecs (e.g., for image compression). When comparing codecs, PSNR is an approximation of the human perception of reconstruction quality: generally speaking, a higher PSNR indicates a higher-quality reconstruction, though in some cases it may not. Where the codecs and/or content differ, the validity of this metric can vary greatly; therefore, you should be extremely careful when comparing results.
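To make the PSNR definition above concrete, here is a minimal sketch in plain Python (illustrative only, not any particular encoder's implementation): it derives PSNR in decibels from the mean squared error between a reference and a distorted sequence of 8-bit pixel values.

```python
import math

def mse(reference, distorted):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)

def psnr(reference, distorted, max_val=255.0):
    """Peak Signal-to-Noise Ratio in decibels for 8-bit pixel data."""
    err = mse(reference, distorted)
    if err == 0:
        return float("inf")  # identical images: no noise at all
    return 10 * math.log10(max_val ** 2 / err)

reference = [100, 120, 130, 140]
distorted = [101, 119, 131, 139]  # every pixel off by 1, so MSE = 1
print(round(psnr(reference, distorted), 2))  # 48.13 dB for MSE = 1 at 8 bits
```

This simplicity, a handful of arithmetic operations per pixel, is exactly why PSNR scales so well, and also why it cannot model how humans actually perceive structural distortion.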
PSNR is a well-established method, especially in research, and is also used within the industry when measuring at scale, mainly because of its simplicity. However, it is known to correlate poorly with human perception. Other metrics, such as SSIM or VMAF, correlate better with subjective scores but can be quite expensive and time-consuming to compute. In the end, it is up to the individual encoding professional to decide which method to use when testing video quality. There are additional quality metric scores, such as MOS and DMOS, and the industry is still in the process of determining the gold standard of objective quality measurement.
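As a rough illustration of the structural comparison SSIM performs, the sketch below computes a single-window SSIM score in plain Python, using the standard stabilizing constants. Real implementations slide a local window across the image and average the local scores; treating the whole signal as one window, as done here, is a simplification for illustration only.

```python
def ssim_global(x, y, max_val=255.0):
    """Simplified single-window SSIM: means, variances, and covariance
    are taken over the whole signal instead of a sliding local window."""
    c1 = (0.01 * max_val) ** 2  # stabilizing constants from the SSIM formula
    c2 = (0.03 * max_val) ** 2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

reference = [100, 120, 130, 140]
print(ssim_global(reference, reference))                   # identical signals score 1.0
print(ssim_global(reference, [90, 140, 110, 160]) < 1.0)   # distortion lowers the score
```

Even this toy version shows why SSIM costs more than PSNR: it needs local statistics (means, variances, covariance) rather than a single per-pixel error sum.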
With objective quality assessment tools in hand, it is now easier for you and other encoding professionals to evaluate and select the encoding technology that best suits your content delivery (and quality) needs. So which technologies currently exist on the market, and what are the ways to encode (especially for quality)?
Codec Support Selection
- H.264/AVC: The industry standard for video compression, designed and maintained by the Moving Picture Experts Group (MPEG) since 2003. According to our 2019 Video Developer Report, it is currently used by over 90% of video developers.
- H.265/HEVC: MPEG’s most recent video compression standard, designed and maintained since 2013, offering roughly 50% higher compression efficiency than its predecessor.
- VP9: Google’s royalty-free video compression standard, designed and maintained since 2013. Mostly used on YouTube, but otherwise does not offer full device reach. Great for saving on CDN/bandwidth costs!
- AV1: Another open-source codec, designed by the Alliance for Open Media (AOMedia), a consortium of video tech giants such as Google, Facebook, Netflix, Amazon, Microsoft, and Bitmovin. AV1 offers up to 70% better compression rates than H.264 but currently has limited support across browsers and devices. However, according to the 2019 Video Developer Report, this is slated to change within the coming two years.
- Next-generation codecs: Keep an eye out for the next round of MPEG codecs, VVC, EVC, and LCEVC, which are scheduled for release in 2020.
Benefits of single codec usage vs Multi-codec support
Even though H.264 is ubiquitous and widely supported at the hardware level, it is much less efficient than next-generation codecs in terms of compression rate. By encoding your videos with a multi-codec approach, you can aim for significantly higher quality while still reducing bandwidth consumption, all without compromising device reach: serve a next-generation codec to clients that support it and fall back to H.264 everywhere else.
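One way to reason about a multi-codec strategy is as a simple fallback chain: deliver the most efficient codec a client supports, and fall back to H.264 otherwise. The sketch below is purely illustrative; the codec identifiers, the efficiency ordering, and the `supported` set are assumptions for this example, not any player's actual API.

```python
# Preference order from most to least compression-efficient,
# following the rough efficiency figures quoted in this post (illustrative only).
CODEC_PREFERENCE = ["av1", "hevc", "vp9", "h264"]

def pick_codec(supported):
    """Return the most efficient codec the client supports,
    falling back to the near-universally supported H.264."""
    for codec in CODEC_PREFERENCE:
        if codec in supported:
            return codec
    return "h264"  # safe default: widest hardware decoding reach

print(pick_codec({"h264", "vp9"}))   # vp9
print(pick_codec({"h264", "hevc"}))  # hevc
print(pick_codec({"h264"}))          # h264
```

In practice this selection happens via manifest/packaging negotiation and client capability detection rather than a hand-rolled function, but the ordering logic is the same.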
Ways to perform an encode
Now that you’ve selected which codecs you’ll be using, the next step is to determine how you’ll encode your video content. To retain the best quality during conversion, we’ve determined that multi-pass encoding is the best option; below you’ll find the types of multi-pass encodes:
2-Pass Encoding: The file is analyzed thoroughly in the first pass and an intermediate file is created. In the second pass, the encoder looks up the intermediate file and allocates bits appropriately; the actual encoding therefore takes place during the second pass.
3-Pass Encoding: Similar to 2-Pass encoding, 3-Pass encoding analyzes the video three times from beginning to end before the encoding process begins. While scanning the file, the encoder writes information about the original video to its own log file and uses that log to determine the best possible way to fit the video within the bitrate limits the user has set for the encoding process.
Per-Title Encoding: A form of encoding optimization that customizes the bitrate ladder of each video based on the complexity of the video file. The ultimate goal is to optimize towards a bitrate that provides just enough room for the codec to encapsulate the information needed for a perfect viewing experience. Another way to consider it is that the optimized adaptive package is reduced to contain exactly the information required for optimal viewing quality; anything beyond the human eye’s ability to perceive is stripped out. (Test your content and see a comparison of your existing bitrate ladder against the optimized Per-Title ladder.)
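A hypothetical per-title adjustment can be sketched as scaling a fixed reference ladder by a per-asset complexity factor. The ladder rungs, bitrates, and complexity scores below are invented for illustration; this is not Bitmovin's actual per-title algorithm, which derives its ladder from analysis of the source content.

```python
# A fixed reference ladder: (height, bitrate in kbps) for a "typical" title.
REFERENCE_LADDER = [(360, 800), (540, 1600), (720, 2800), (1080, 5000)]

def per_title_ladder(complexity, reference=REFERENCE_LADDER):
    """Scale ladder bitrates by a per-asset complexity score:
    e.g. ~0.6 for easy content (slides, cartoons), ~1.5 for high-motion sport."""
    return [(height, round(bitrate * complexity)) for height, bitrate in reference]

# A low-complexity title needs far fewer bits for the same perceived quality:
print(per_title_ladder(0.6))  # [(360, 480), (540, 960), (720, 1680), (1080, 3000)]
```

The point of the sketch is the principle: the same resolution rung can carry very different bitrates from title to title, which is where the bandwidth savings come from.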
Some of the true magic behind encoders is their ability to choose how to implement or tune a given codec. In some cases, encoders allow users to configure and optimize codec compression settings, such as motion estimation or GOP size and structure. It goes without saying that the best way to ensure top quality through an encode is to supply high-quality sources: start with pristine video and follow best practices for signal acquisition/contribution.
Choosing the Best Per-Title Encoding Technology – Jan Ozer
For a deeper dive into per-title encoding and to see how Jan Ozer, a leading expert on encoding for live and on-demand production, rates the different per-title technologies, download his comprehensive analysis here.
The question of how “objective” these scores really are is an entirely different topic, covered in the next Encoding Excellence blog, Objectionable Uses of Objective Quality Metrics, a topic Bitmovin’s solutions architect and encoding guru Richard Fliam tackled at Demuxed 2019.
Sign up for our live webinar, Objective Video Quality: Measurements, Methods, and Best Practices, ft. Jan Ozer & SSIMWave Solutions Architect Carlos Bacquet, on November 13th at 8am PST (17:00 CET).