In continuation of our Fun with Container Formats series, this week we’ll be diving into the MP4 and CMAF container formats. If you need a refresher on the terminology or the handling of these formats in your player, please check back to Post 1: Fun with Container Formats
MP4
Overview of Standards
MPEG-4 Part 14 (MP4) is one of the most commonly used container formats and often has a .mp4 file ending. It is used for Dynamic Adaptive Streaming over HTTP (DASH) and can also be used for Apple’s HLS streaming.
MP4 is based on the ISO Base Media File Format (MPEG-4 Part 12), which is based on the QuickTime File Format. MPEG stands for Moving Pictures Experts Group and is a cooperation of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). MPEG was formed to set standards for audio and video compression and transmission. MPEG-4 specifies the Coding of audio-visual objects.
MP4 supports a wide range of codecs. The most commonly used video codecs are H.264 and HEVC. AAC is the most commonly used audio codec. AAC is the successor of the famous MP3 audio codec.
ISO Base Media File Format
ISO Base Media File Format (ISOBMFF, MPEG-4 Part 12) is the base of the MP4 container format. ISOBMFF is a standard that defines time-based multimedia files. Time-base multimedia usually refers to audio and video, often delivered as a steady stream. It is designed to be flexible and easy to extend. It enables interchangeability, management, editing and presentability of multimedia data.
The base component of ISOBMFF are boxes, which are also called atoms. The standard defines the boxes, by using classes and an object oriented approach. Using inheritance all boxes extend a base class Box and can be made specific in their purpose by adding new class properties.
The base class:
Example FileTypeBox:
The FileTypeBox is used to identify the purpose and usage of an ISOBMFF file. It is often at the beginning of a file.
A box can also have children and form a tree of boxes. For example the MovieBox (moov) can have multiple TrackBoxes (trak). A track in the context of ISOBMFF is a single media stream. E.g. a MovieBox contains a trak box for video and one track box for audio.
The binary codec data can be stored in a Media Data Box (mdat). A track usually references its binary codec data.
Fragmented MP4 (fMP4)
Using MP4 it is also possible to split a movie into multiple fragments. This has the advantage that for using DASH or HLS a player software only needs to download the fragments the viewer wants to watch. A fragmented MP4 file consists of the usual MovieBox with the TrackBoxes to signal which media streams are available. A Movie Extends Box (mvex) is used to signal that the movie is continued in the fragments. Another advantage is that fragments can be stored in different files. A fragment consists of a Movie Fragment Box (moof), which is very similar to a Movie Box (moov). It contains the information about the media streams contained in one single fragment. E.g. it contains the timestamp information for the 10 seconds of video, which are stored in the fragment. Each fragment has its own Media Data (mdat) box.
Debugging (f)MP4 files
Viewing the boxes (atoms) of an (f)MP4 file is often necessary to discover bugs and other unwanted configurations of specific boxes. To get a summary of what a media file contains the best tools are:
- MediaInfo (https://mediaarea.net/en/MediaInfo/Download)
- ffprobe, which is part of the ffmpeg binaries (https://ffbinaries.com/downloads)
These tools will however not show you the box structure of an (f)MP4 file. For this you could use the following tools:
- Boxdumper (https://github.com/l-smash/l-smash)
- IsoViewer (https://github.com/sannies/isoviewer)
- MP4Box.js (http://download.tsi.telecom-paristech.fr/gpac/mp4box.js/filereader.html)
- Mp4dump (https://www.bento4.com/)
(Screenshot of isoviewer)
CMAF
MPEG-CMAF (Common Media Application Format)
Serving every platform as a content distributor can prove to be challenging as some platforms only support certain container formats. To distribute a certain piece of content it can be necessary to produce and serve copies of the content in different container formats, e.g. MPEG-TS and fMP4. Clearly, this causes additional costs in infrastructure for content creation as well as storage costs for hosting multiple copies of the same content. On top of that, it also makes CDN caching less efficient. MPEG-CMAF aims to solve these problems, not by creating yet another container format, but by converging to a single already existing container format for OTT media delivery. CMAF is closely related to fMP4 which should make the transition from fMP4 to CMAF to be of very low effort. Further, with Apple being involved in CMAF, the necessity of having content muxed in MPEG-TS to serve Apple devices should hopefully be a thing of the past and CMAF can be used everywhere.
With MPEG-CMAF, there are also improvements in the interoperability of DRM (Digital Rights Management) solutions by the use of MPEG-CENC (Common Encryption). It is theoretically possible to encrypt the content once and still use it with all the different state-of-the-art DRM systems. However, there is no encryption scheme standardized and, unfortunately, there are still competing ones, for instance Widevine and PlayReady. Those are not compatible to each other, but the DRM industry is slowly moving to converge to one, the Common Encryption format.
Chunked CMAF
One interesting feature of MPEG-CMAF is the possibility to encode segments in so-called CMAF chunks. Such chunked encoding of the content in combination with delivering the media files using HTTP chunked transfer encoding enables lower latencies in live streaming uses cases than before.
In traditional fMP4 the whole segment had to be fully downloaded until it could be played out. With chunked encoding, any completely loaded chunks of the segment can already be decoded and played out while still loading the rest of the segment. Hereby, the achievable live latency is no longer depending on the segment duration as the muxed chunks of an incomplete segment can already be loaded and played out at the client.
Make sure to check back with us next week as dive into MPEG-TS and Matroska
Jump to Part 3 of Fun with Container Formats