This is the second part in a three-part series on high-quality video streaming, evaluating the performance of MPEG-DASH against other adaptive streaming technologies.
In the first part of this series we provided the basics about Quality of Experience (QoE) with MPEG-DASH. In this second part we provide details about the actual evaluation. In the meantime, real-time entertainment traffic (i.e., streaming of audio and video) has grown to account for more than 70% of network traffic at peak periods.
Other posts in this series:
- Part 1. The first part of this series explains the term Quality of Experience (QoE) and how it is measured.
- Part 3. The third part describes the results and conclusions that can be derived from our findings.
The test sequence is based on the DASH dataset, from which we adopt the Big Buck Bunny sequence. We encoded it with Bitmovin’s Cloud Encoding Service to obtain representations at bitrates of 100, 150, 200, 350, 500, 700, 900, 1100, 1300, 1600, 1900, 2300, 2800, 3400, and 4500 kbps, with resolutions ranging from 192×108 to 1920×1080. This configuration provides a good mix of resolutions and bitrates for both fixed and mobile network environments. In fact, we provide two versions, one with a segment length of 2s and the other with 10s, which are the most common segment sizes adopted by actual deployments (e.g., Apple HLS uses 10s whereas others like Microsoft and Adobe use 2s).
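The bitrate ladder above can be sketched as a simple data structure; this is an illustrative representation only, and the field names are hypothetical rather than taken from the actual encoding configuration:

```python
# Bitrate ladder from the evaluation dataset (kbps).
BITRATES_KBPS = [100, 150, 200, 350, 500, 700, 900, 1100, 1300,
                 1600, 1900, 2300, 2800, 3400, 4500]

def ladder(segment_length_s):
    """Build one representation entry per bitrate for a given segment length."""
    return [{"bitrate_kbps": b, "segment_length_s": segment_length_s}
            for b in BITRATES_KBPS]

# The dataset ships two versions: 2 s and 10 s segments.
reps_2s = ladder(2)
reps_10s = ladder(10)
```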
For the MPEG-DASH client under test we adopted Bitmovin’s Adaptive Streaming framework and compare it against ten different adaptation algorithms reported in the research literature.
For the objective evaluation we adopt a standard setup where the bandwidth and delay between server and client are shaped using a shell script that invokes the Linux traffic control tool tc with netem and a token bucket filter. In particular, the delay was set to 80ms and the bandwidth follows a predefined trajectory as depicted in Figure 2. The delay corresponds to what can be observed on long-distance fixed-line connections or reasonable mobile networks and is thus representative of a broad range of application scenarios. The bandwidth trajectory contains both abrupt and step-wise changes in the available bandwidth to properly test all adaptation logics under different conditions.
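A minimal sketch of such a shaping step, assuming a Linux host, an interface named eth0, and tbf/netem parameters (burst, queue latency) that are illustrative rather than taken from the actual script. It builds the tc commands (a tbf root qdisc for the bandwidth limit, with a netem child adding the 80 ms delay) and can then apply them:

```python
import subprocess

IFACE = "eth0"  # assumed interface name

def shaping_commands(rate_kbit, delay_ms=80):
    """tc commands: token bucket filter for bandwidth, netem child for delay."""
    return [
        f"tc qdisc replace dev {IFACE} root handle 1: "
        f"tbf rate {rate_kbit}kbit burst 32kbit latency 400ms",
        f"tc qdisc replace dev {IFACE} parent 1:1 handle 10: "
        f"netem delay {delay_ms}ms",
    ]

def apply_shaping(rate_kbit):
    """Apply the commands (requires root privileges)."""
    for cmd in shaping_commands(rate_kbit):
        subprocess.run(cmd.split(), check=True)
```

Stepping through the predefined bandwidth trajectory then amounts to calling `apply_shaping` with a new rate at each scheduled change point.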
The goal of this evaluation setup is to provide objective metrics which are collected at the client to be analyzed during the evaluation. These metrics include the observed bitrate, selected quality representation, buffer level, start-up delay, and stalls (re-buffering due to underruns).
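The client-side metric collection could be structured along the following lines; this is a hypothetical sketch (field and method names are our own), not the actual instrumentation of the evaluated clients:

```python
from dataclasses import dataclass, field

@dataclass
class ClientMetrics:
    """Objective metrics collected at the client during a streaming session."""
    startup_delay_s: float = 0.0
    stalls: int = 0  # re-buffering events caused by buffer underruns
    samples: list = field(default_factory=list)  # (time_s, bitrate_kbps, buffer_s)

    def record(self, time_s, bitrate_kbps, buffer_level_s):
        """Log one observation of selected bitrate and buffer level."""
        if buffer_level_s <= 0.0 and self.samples:
            self.stalls += 1  # playback stalled on an empty buffer
        self.samples.append((time_s, bitrate_kbps, buffer_level_s))

    def average_bitrate(self):
        return sum(s[1] for s in self.samples) / len(self.samples)
```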
Evaluation of the Adaptive Streaming Performance
For the subjective evaluation we adopt a crowdsourcing approach that uses the Microworkers platform – https://microworkers.com/ – to run such campaigns and to recruit participants, who are referred to as microworkers. The content server is located in Europe and, thus, we limit participants to Europe in order to reduce network effects due to proxies, caches, or content distribution networks (CDNs) that we cannot control.
At the end of the subjective evaluation, each microworker needs to provide proof of successful participation, which is implemented using a unique identification number. We set the compensation to US$0.40, the minimum compensation for this type of campaign at the time of writing.
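Such a proof of participation can be as simple as a random completion code shown to the participant after the session; the sketch below is one possible implementation, not necessarily the one used in the study:

```python
import uuid

def completion_code():
    """Generate a unique identification number shown to the participant
    at the end of the session and pasted back on the crowdsourcing
    platform as proof of participation."""
    return uuid.uuid4().hex  # 32 hex characters, effectively collision-free
```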
The stimulus is the same as for the objective evaluation, but we added another sequence, an excerpt from Tears of Steel, in order to mitigate any bias that may be introduced when using only one type of content. The content configuration is the same as for Big Buck Bunny, but we used only a single segment length of 2s.
In addition to the QoE rating, we gather various objective metrics such as number of stalls (i.e., buffer underruns) and the average media throughput of the client to gain further insights into the performance of each adaptive streaming client.
This methodology enables a subjective evaluation of different DASH adaptation logics within real-world environments, as opposed to controlled environments, and thus provides a more realistic evaluation of adaptive HTTP streaming systems. However, using crowdsourcing requires a more careful evaluation of the participants’ feedback. Therefore, we filtered participants using browser fingerprinting, stimulus presentation time, the actual QoE rating, and feedback from the pre-questionnaire.
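The filtering step can be sketched as follows; the thresholds and submission fields are illustrative assumptions, not the exact criteria used in the study:

```python
def filter_submissions(submissions, min_view_s=30.0):
    """Drop unreliable crowdsourcing submissions.

    Each submission is a dict with 'fingerprint' (browser fingerprint),
    'view_time_s' (stimulus presentation time), and 'rating' (QoE score,
    assumed here to lie on a 1..5 scale).
    """
    seen = set()
    kept = []
    for s in submissions:
        if s["fingerprint"] in seen:       # same browser participated twice
            continue
        if s["view_time_s"] < min_view_s:  # did not actually watch the stimulus
            continue
        if not 1 <= s["rating"] <= 5:      # rating outside the valid scale
            continue
        seen.add(s["fingerprint"])
        kept.append(s)
    return kept
```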