This blog post is a continuation of an ongoing blog and webinar technical deep-dive series. You can find the first blog post here and the associated webinar recording here. The first post covered the fundamentals of live low latency and defined chunked delivery methods with CMAF.
This blog post expands on chunked CMAF delivery by explaining its application with MPEG-DASH to achieve low latency. We'll lay some foundations and cover the basic approaches behind low-latency DASH, then look at the developments expected in the near future, as low-latency streaming is a heavily researched subject and is quickly becoming a media industry standard.
Basics of MPEG-DASH Live Streaming
Before diving into how low-latency streaming works in MPEG-DASH, we first need to understand some basic mechanics of DASH live streams, most importantly the concept of segment availability.
The DASH Media Presentation Description (MPD) is an XML document containing essential metadata of a DASH stream. Among many other things, it describes which segments a stream consists of and how a playback client can obtain them. The main difference between on-demand and live streams in DASH is that all segments of an on-demand stream are available at all times, whereas the segments of a live stream are produced continuously, one after another, as time progresses. Every time a new segment is produced, its availability is signaled to playback clients through the MPD. It is important to note that a segment is only made available once it is fully encoded and written to the origin.
The MPD specifies the start of the stream's availability (i.e. the Availability Start Time, AST) and a constant segment duration, e.g. 2 seconds. Using these values, the player can calculate how many segments are currently in the availability window as well as their individual availability start times. Because a segment only becomes available once it is complete, the availability start time of the second segment would be AST + segment_duration * 2.
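The availability calculation described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are hypothetical, the period is assumed to start at the AST, and limits such as the time-shift buffer depth are ignored.

```python
from datetime import datetime, timedelta, timezone


def segment_availability_start(ast, segment_duration_s, segment_number, start_number=1):
    """Wall-clock time at which a fully encoded segment becomes available."""
    # A segment is complete only at its end, so the k-th segment of the
    # stream becomes available k segment durations after the AST.
    completed_segments = segment_number - start_number + 1
    return ast + timedelta(seconds=segment_duration_s * completed_segments)


def latest_available_segment(ast, segment_duration_s, now, start_number=1):
    """Number of the newest fully available segment, or None if there is none yet."""
    elapsed_s = (now - ast).total_seconds()
    completed_segments = int(elapsed_s // segment_duration_s)
    if completed_segments < 1:
        return None
    return start_number + completed_segments - 1


# Example values: an AST and a 2-second segment duration.
AST = datetime(2019, 8, 20, 5, 0, 3, tzinfo=timezone.utc)
second_segment_available = segment_availability_start(AST, 2.0, 2)
newest_after_5s = latest_available_segment(AST, 2.0, AST + timedelta(seconds=5))
```

With a 2-second segment duration, the second segment becomes available 4 seconds after the AST, and 5 seconds into the stream the newest complete segment is still segment 2.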
Low Latency Streaming with MPEG-DASH
In the first part of this blog post series, we described how chunked encoding and transfer enable partial loading and consumption of segments that are still being encoded. To make a player aware of this, the segment availability in the MPD is adjusted to signal an earlier availability, i.e. the moment the first chunk is complete. This is done using the availabilityTimeOffset attribute in the MPD. As a result, the player will not wait for a segment to be fully available and will load and consume it earlier.
Consider the example of Fig. 1 with a segment duration of 2 seconds and a chunk duration of 0.033 seconds (i.e. the duration of one video frame at 29.97 fps). To signal the segment's availability as soon as the first chunk is completed, we would set the availabilityTimeOffset to 1.967 seconds (segment_duration - chunk_duration). This would signal that the greyed-out segment in Fig. 1 is partially available.
The MPD below represents this example:
<?xml version="1.0" encoding="utf-8"?>
<MPD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns="urn:mpeg:dash:schema:mpd:2011"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xsi:schemaLocation="urn:mpeg:DASH:schema:MPD:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd"
     profiles="urn:mpeg:dash:profile:isoff-live:2011"
     type="dynamic"
     minimumUpdatePeriod="PT500S"
     suggestedPresentationDelay="PT2S"
     availabilityStartTime="2019-08-20T05:00:03Z"
     publishTime="2019-08-20T12:42:07Z"
     minBufferTime="PT2.0S">
  <Period start="PT0.0S">
    <AdaptationSet contentType="video" segmentAlignment="true" bitstreamSwitching="true" frameRate="30000/1001">
      <Representation id="0" mimeType="video/mp4" codecs="avc1.64001f" bandwidth="2000000" width="1280" height="720">
        <SegmentTemplate timescale="1000000" duration="2000000" availabilityTimeOffset="1.967"
                         initialization="1566277203/init-stream$RepresentationID$.m4s"
                         media="1566277203/chunk-stream_t_$RepresentationID$-$Number%05d$.m4s"
                         startNumber="1">
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
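To make the effect of availabilityTimeOffset concrete, the following Python sketch recomputes the numbers from the example above. The function names are hypothetical, and the arithmetic is a simplification that assumes the period starts at the availability start time.

```python
from datetime import datetime, timedelta, timezone


def availability_time_offset(segment_duration_s, chunk_duration_s):
    """Offset that makes a segment available once its first chunk is complete."""
    return round(segment_duration_s - chunk_duration_s, 3)


def early_availability_start(ast, segment_duration_s, segment_number, ato_s, start_number=1):
    """Availability start time of a segment, moved earlier by the offset."""
    completed_segments = segment_number - start_number + 1
    regular_start = ast + timedelta(seconds=segment_duration_s * completed_segments)
    return regular_start - timedelta(seconds=ato_s)


# Values taken from the example MPD: 2 s segments, 0.033 s chunks.
AST = datetime(2019, 8, 20, 5, 0, 3, tzinfo=timezone.utc)
ato = availability_time_offset(2.0, 0.033)
first_segment_early = early_availability_start(AST, 2.0, 1, ato)
```

Without the offset, the first segment would become available 2 seconds after the AST. With an offset of 1.967 seconds it is signaled as available after only 0.033 seconds, i.e. once its first chunk exists.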
To recap, for low-latency DASH we are mainly doing two things:
- Chunked encoding and transfer (i.e. chunked CMAF)
- Signaling early availability of in-progress segments
While the approach described so far enables a basic low-latency DASH setup, there are additional considerations for further optimizing and stabilizing the streaming experience. The DASH Industry Forum is working on guidelines for low-latency DASH to be released in the next version of the DASH-IF Interoperability Points (DASH-IF IOP), expected in early July 2020. The change request for that can be found here. The following sections explain key parts of these guidelines. Please note that some features were not officially finalized and standardized at the time of this post's publication (June 2020).
Wallclock Time Mapping
For the purpose of measuring latency, a mapping between the media’s presentation time and the wall-clock time is needed. This is so that for any given presentation time of the stream the corresponding wall-clock time is known. The latency for a given playback position can then be calculated by determining the corresponding wall-clock time and subtracting it from the current wall-clock time.
This mapping can be achieved by specifying a so-called Producer Reference Time either in the segments (i.e. inband as a prft box) or in the MPD. It essentially specifies the wall-clock time at which the respective segment/chunk was produced. The example below shows such an element in the MPD:
<ProducerReferenceTime id="0" type="encoder" presentationTime="538590000000" wallclockTime="2020-05-19T14:57:45Z"> </ProducerReferenceTime>
The type attribute specifies whether the reference time was set by the capturing device or by the encoder, which allows for calculating the End-to-End Latency (EEL) or the Encoder-Display Latency (EDL), respectively.
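The latency calculation described in this section can be sketched in Python as follows. The function names are hypothetical, and the timescale of 1,000,000 is an assumption borrowed from the SegmentTemplate of the earlier MPD example; in practice the timescale of the referenced representation applies.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(value):
    """Parse an ISO 8601 UTC timestamp such as 2020-05-19T14:57:45Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def wallclock_for(presentation_time, prft_presentation_time, prft_wallclock, timescale):
    """Map a media presentation time to wall-clock time via the reference pair."""
    delta_s = (presentation_time - prft_presentation_time) / timescale
    return prft_wallclock + timedelta(seconds=delta_s)


def latency_seconds(now, presentation_time, prft_presentation_time, prft_wallclock, timescale):
    """Current wall-clock time minus the wall-clock time of the playback position."""
    produced_at = wallclock_for(presentation_time, prft_presentation_time, prft_wallclock, timescale)
    return (now - produced_at).total_seconds()


# Reference pair from the ProducerReferenceTime example; timescale is assumed.
PRFT_PT = 538590000000
PRFT_WALLCLOCK = parse_utc("2020-05-19T14:57:45Z")
TIMESCALE = 1_000_000

# Hypothetical playback position 2 s after the reference point,
# observed 5.5 s of wall-clock time after the reference wall-clock time.
playhead = PRFT_PT + 2 * TIMESCALE
now = PRFT_WALLCLOCK + timedelta(seconds=5.5)
latency = latency_seconds(now, playhead, PRFT_PT, PRFT_WALLCLOCK, TIMESCALE)
```

In this made-up situation the playback position maps to 14:57:47 UTC, so the latency is 3.5 seconds. Whether that figure is an EEL or an EDL depends on the type of the reference time used.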
Client Time Synchronization
A precise clock at the playback client is necessary for calculations that involve the client's wall-clock time, such as segment availability and latency calculations. It is recommended that the MPD include a UTCTiming element, which specifies a time source that can be used to adjust for any drift of the client clock. An example is shown below:
<UTCTiming schemeIdUri="urn:mpeg:dash:utc:http-iso:2014" value="https://time.akamai.com/?iso" />
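How a client might apply such a time source can be sketched as follows. This is a minimal illustration with hypothetical function names: it assumes the ISO 8601 timestamp has already been fetched from the time source, and it ignores the network round-trip time, which a real player would likely want to account for.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(value):
    """Parse an ISO 8601 UTC timestamp such as 2020-05-19T14:57:45Z."""
    return datetime.fromisoformat(value.strip().replace("Z", "+00:00"))


def clock_offset(server_time_iso, local_now):
    """Offset to add to the local clock to match the time source."""
    return parse_utc(server_time_iso) - local_now


def synchronized_now(local_now, offset):
    """Local clock reading corrected by a previously measured offset."""
    return local_now + offset


# Hypothetical readings: the local clock runs 1.5 s behind the time source.
local_reading = datetime(2020, 5, 19, 14, 57, 43, 500000, tzinfo=timezone.utc)
offset = clock_offset("2020-05-19T14:57:45Z", local_reading)
corrected = synchronized_now(local_reading + timedelta(seconds=10), offset)
```

The offset is measured once (or periodically) and then applied to every later clock reading used for availability and latency calculations.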
Low Latency Service Description
A ServiceDescription element should be used to specify the service provider’s desired target latency and minimum/maximum latency boundaries in milliseconds. Furthermore, playback rate boundaries may be specified that define the allowed range for playback acceleration/deceleration by the playout client to fulfill the latency requirements.
<ServiceDescription id="0"> <Latency target="3500" min="2000" max="10000" referenceId="0"/> <PlaybackRate min="0.9" max="1.1"/> </ServiceDescription>
In most player implementations such parameters are provided externally using configurations and APIs.
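As an illustration of how a client could use the target latency and playback rate boundaries from the ServiceDescription example above, here is a simple proportional catch-up rule in Python. It is a sketch only, not the algorithm of the DASH-IF guidelines or of any particular player; the function name, the tolerance, and the gain are hypothetical tuning choices.

```python
def catchup_playback_rate(current_latency_ms, target_ms, min_rate, max_rate,
                          tolerance_ms=100, gain=0.0002):
    """Pick a playback rate that nudges the latency toward the target."""
    error_ms = current_latency_ms - target_ms
    # Close enough to the target: play at normal speed.
    if abs(error_ms) <= tolerance_ms:
        return 1.0
    # Too far behind the live edge -> speed up; too close -> slow down.
    rate = 1.0 + gain * error_ms
    # Never leave the range allowed by the service provider.
    return max(min_rate, min(max_rate, rate))


# Boundaries from the ServiceDescription example: target 3500 ms, rate 0.9 to 1.1.
rate_on_target = catchup_playback_rate(3500, 3500, 0.9, 1.1)
rate_far_behind = catchup_playback_rate(5000, 3500, 0.9, 1.1)
rate_too_close = catchup_playback_rate(2500, 3500, 0.9, 1.1)
rate_slightly_behind = catchup_playback_rate(3700, 3500, 0.9, 1.1)
```

The clamping step is what ties the rule to the PlaybackRate element: however large the latency error becomes, the client stays within the acceleration and deceleration range the service provider allows.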
The previous post pointed out that chunked delivery decouples the achievable latency from the segment duration and enables us to choose relatively long segment durations to maintain good video encoding efficiency. The downside is that long segments prevent fast quality adaptation by the player, as quality switching can only be done at segment boundaries. In a low-latency scenario with low buffer levels, fast adaptation (especially down-switching) would be desirable to avoid buffer underruns and the resulting playback interruptions.
To that end, Resync elements may be used that specify segment properties like chunk duration and chunk size. Playback clients can use them to locate resync points and:
- Join streams mid-segment, based on latency requirements
- Switch representations mid-segment
- Resynchronize at mid-segment position after buffer underruns
This was a glimpse of what to expect in the near future. It shows the great effort the media industry has put into kick-starting low-latency streaming with MPEG-DASH and getting it ready for production services.
Want to learn more? Check out some of the supporting documentation below:
[Tool] DASH-IF Conformance Tool