One of the first questions when starting with adaptive streaming formats such as MPEG-DASH or HLS is how long the media segments of your content should be. Segmenting the content is necessary because it enables switching between the different video/audio qualities during a streaming session. The following figure gives a short overview of that process, in which multiple qualities of a video are encoded, chunked into segments, and requested by the streaming client/player.
However, the question of the optimal segment length is not easy to answer. It depends on the environment (fixed access vs. mobile users), the content (premium vs. non-premium/UGC), and, last but not least, the web server/CDN configuration, such as whether HTTP/1.1 persistent connections are enabled. For example, short segments are good for adapting quickly to bandwidth changes and preventing stalls, while longer segments may achieve better encoding efficiency and quality.
So, let’s have a look at this topic in more detail. We analyzed it based on different evaluations and datasets. The results help you understand the factors that influence the segment length decision and give you an indication of the optimal segment length for your content and use case.
Typical DASH and HLS Chunk Sizes
For the following evaluation of segment sizes, we created a dataset that is encoded and multiplexed using different segment sizes. The range covers 2 seconds (as used by Microsoft Smooth Streaming) up to 10 seconds per segment (as recommended by Apple for HTTP Live Streaming), with some steps in between and one additional size each at the lower and higher end. This results in segment sizes of 1, 2, 4, 6, 10, and 15 seconds, which we took as the basis for the following evaluations.
Segment Length Decision: Encoding Efficiency and Quality?
To enable seamless switching between the different quality representations of adaptive streaming formats such as HLS or DASH, it is required to maintain fixed I-frame positions in the video. For example, in a 24 frames-per-second (FPS) video with a segment length of two seconds, an I-frame has to be set every 48 frames. This guarantees an I-frame at the beginning of each segment, which is needed to switch representations at segment boundaries. Because a new segment starts with an I-frame, the decoder does not need any references to previous frames or segments, and therefore the new segment can have frames in a different resolution, bitrate, or framerate. Fixed I-frame positions can be achieved by restricting the group-of-pictures (GOP) size of the encoder to the desired segment size of the content. As a consequence, from the encoding point of view, smaller segment sizes have a disadvantage: the final encoding contains more segments, and therefore more I-frames are needed to guarantee representation switching at the segment boundaries. This leads to a lower encoding efficiency because I-frames, which cannot leverage temporal prediction, need more bits for encoding than predicted (P-) frames. The overall quality of the content therefore gets worse at the same bitrate in comparison to conventional encoding, such as that used for HTTP progressive download, or to segments with longer segment sizes. This problem is well-known and needs to be considered in content generation for adaptive HTTP streaming.
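The relationship between segment length, frame rate, and GOP size described above can be sketched in a few lines of Python. This is only an illustration: the function names `gop_size` and `forced_iframes` are hypothetical, and the 60-second clip duration is an assumed example value, not a figure from the evaluation.

```python
def gop_size(fps: int, segment_seconds: int) -> int:
    """Frames per GOP so that every segment starts with an I-frame."""
    return fps * segment_seconds


def forced_iframes(duration_seconds: int, segment_seconds: int) -> int:
    """Number of segments, and thus forced I-frames, rounded up."""
    return -(-duration_seconds // segment_seconds)  # ceiling division


if __name__ == "__main__":
    fps = 24        # frame rate used in the example above
    duration = 60   # assumed clip length in seconds (illustrative only)
    for seconds in (1, 2, 4, 6, 10, 15):
        print(seconds, gop_size(fps, seconds), forced_iframes(duration, seconds))
```

Halving the segment length doubles the number of forced I-frames for the same content, which is where the efficiency loss for short segments comes from.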
As a consequence of this lower encoding performance introduced by the fixed GOP sizes, the following evaluation demonstrates the effect of the different segment sizes on the encoding quality in terms of PSNR. The following table shows the PSNR values for the different segment sizes (GOP size in frames given in parentheses) and provides evidence that this effect needs to be considered when choosing segment sizes for adaptive HTTP streaming systems.
| 1 sec. (24) | 2 sec. (48) | 4 sec. (96) | 6 sec. (144) | 10 sec. (240) | 15 sec. (360) |
As shown, small segment sizes can reduce the overall quality of the content by up to 1.5 dB in PSNR. However, this effect diminishes significantly as the segment size increases. As shown in the following figure, segment lengths shorter than two seconds perform very poorly. In combination with other factors, such as the network characteristics shown in the following evaluation, such small segments (e.g. 1-second segment length) should generally be avoided.
Segment Length Decision: Avoiding Stalls, Streaming Performance and Web Server/CDN Configuration
From a network/internet perspective, there are also a lot of influencing factors that have to be considered. For example, longer segment lengths may cause stalls on wireless internet connections with strongly varying bandwidth, but short segment lengths may result in poor streaming performance due to the overhead produced by requests and the influence of the network delay. To investigate this, we built an evaluation environment that emulates standard Internet connections, in order to show the impact of the segment size of adaptive streaming content, as well as other factors such as the HTTP server configuration (e.g. allowing persistent connections). For this purpose, a standard HTTP web server was used, configured to serve both persistent HTTP/1.1-compliant connections and non-persistent HTTP/1.0-compliant connections. We also emulated the network characteristics of a last-mile (e.g., ADSL) Internet connection and added a network delay of 150 ms for this evaluation.
The optimal segment size for the given network configuration was evaluated for both cases, with and without HTTP/1.1 persistent connections. For this purpose, the performance results of the 1, 2, 4, 6, 10, and 15-second segment length versions of Big Buck Bunny from the dataset were analyzed and interpolated to a graph showing the performance of the segment sizes in terms of effective media throughput. As shown in the following figure, the optimal segment size for this network setting would be between 2 and 3 seconds if one uses web servers/CDNs with HTTP/1.1 persistent connections, and between 5 and 8 seconds without them (e.g. using HTTP/1.0). The effective media throughput of the optimal segment lengths of both configurations differs only by about 50 kbit/s. The reason why the effective media throughput does not improve when increasing the segment size further is that the available bandwidth in the evaluation setting changes over time. When longer segments are used, the client cannot adjust as flexibly and quickly as it could with shorter segments, and therefore the overall bitrate deteriorates for longer segment lengths. On the other hand, the influence of the network delay increases when using smaller segment lengths. This especially affects the non-persistent/HTTP/1.0 results, because in this case one round-trip time (RTT) is required for establishing the TCP connection to the server for each segment. But the persistent/HTTP/1.1 results also suffer from the influence of the delay when using very small segments, which is visible in the version with a segment length of one second in the following figure. In this case, half of the RTT necessary for requesting the segment becomes significant and the average throughput decreases.
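The request-overhead side of this trade-off can be illustrated with a very rough model. The sketch below is not the model behind the evaluation. It assumes a constant link rate, one RTT of request latency per segment, and one additional RTT for the TCP handshake when connections are not persistent. The function name and the bitrate and link rate values are hypothetical; only the 150 ms delay is taken from the setup above.

```python
def effective_throughput_kbps(segment_seconds: float,
                              media_kbps: float,
                              link_kbps: float,
                              rtt_seconds: float,
                              persistent: bool) -> float:
    """Media bits delivered per second of wall-clock time for one segment."""
    segment_kbit = media_kbps * segment_seconds
    transfer_time = segment_kbit / link_kbps
    # Assumption: 1 RTT for the request, plus 1 RTT for a new TCP connection.
    overhead = rtt_seconds if persistent else 2 * rtt_seconds
    return segment_kbit / (transfer_time + overhead)


if __name__ == "__main__":
    rtt = 0.150                # network delay from the evaluation setup
    media, link = 1000, 2000   # assumed bitrates in kbit/s (illustrative only)
    for seconds in (1, 2, 4, 6, 10, 15):
        p = effective_throughput_kbps(seconds, media, link, rtt, persistent=True)
        n = effective_throughput_kbps(seconds, media, link, rtt, persistent=False)
        print(f"{seconds:>2} s  persistent: {p:7.1f}  non-persistent: {n:7.1f}")
```

Because this sketch ignores bandwidth fluctuations, its throughput keeps rising with segment length. The measured curve instead peaks and then falls, since long segments slow down adaptation.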
Based on the results of these evaluations, as well as our experience from customer deployments, Bitmovin recommends using DASH or HLS chunk sizes of around 2 to 4 seconds, which is a good compromise between encoding efficiency and flexibility for stream adaptation to bandwidth changes. It is also recommended to use web servers and CDNs that enable persistent HTTP connections, as this is an easy and cost-effective way to increase streaming performance. In doing so, the effective media throughput and QoS can be increased without any changes to the client’s implementation, simply by choosing the right segment length.
We hope this blog post helps you when creating your content with the optimal segment length for your use case. If you have further questions on this, please do not hesitate to contact us. You can also have a look at our support section including tips on encoding, the Bitmovin Player in general and analytics.
Stefan from the Bitmovin Team!