Why Audio Encoding is Just as Important as Video Encoding

6 May 2021 . 7 min read

When you think about streaming online content, you might be tempted to focus on the visual aspects, like a high bit rate or the latest codecs, but this is only half the battle for a superior video experience. The audio quality for any streaming video can be the difference between a good movie night for your clients and a bad one.
In this post, we’ll talk about how audio encoding affects the streaming experience. I’ll cover some basics—like what is a codec—and then discuss the benefits of audio encoding, the pros and cons of the most common codec formats, and how to make sure your audio encoding complements your video encoding.

What Is a Codec?

The term codec is a combination of the words coder and decoder. A codec is a standard for encoding and decoding multimedia files to represent data in a specific format.
The first thing a codec does is encode a video or audio file. For lossy codecs, this involves dropping “extra” information from raw or uncompressed audio files in order to reduce file size while maintaining as much quality as possible. This process involves a sequence of complex mathematical functions.
The second role of a codec is to decode, which is essentially playing back a video or audio file that’s been encoded. Think of it as reversing the math from the encoding step.
In short, an audio codec is a protocol for compressing digital audio to save space during transmission and then decoding for playback with the video.
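The encode and decode roles can be illustrated with a deliberately crude toy "codec" in Python. This is only a sketch under stated assumptions: the `encode` and `decode` helpers are hypothetical names, and keeping just the top 8 bits of each 16-bit sample is far simpler than what any real audio codec does. It does show the basic trade: the encoded data is half the size, and decoding gets close to the original samples but not exactly back to them.

```python
import math

def encode(samples):
    """Toy lossy encoder: keep only the top 8 bits of each signed 16-bit sample."""
    return bytes((s >> 8) & 0xFF for s in samples)

def decode(encoded):
    """Toy decoder: expand each stored byte back to a signed 16-bit value."""
    decoded = []
    for b in encoded:
        signed = b - 256 if b >= 128 else b  # reinterpret the byte as signed
        decoded.append(signed * 256)
    return decoded

# 10 ms of a 440 Hz tone at 44.1 kHz as signed 16-bit samples (illustrative values).
original = [int(16000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(441)]

encoded = encode(original)
restored = decode(encoded)

raw_bytes = len(original) * 2  # 2 bytes per 16-bit sample
max_error = max(abs(a - b) for a, b in zip(original, restored))

print(f"raw: {raw_bytes} bytes, encoded: {len(encoded)} bytes")
print(f"largest per-sample error after decoding: {max_error}")
```

The discarded low-order bits are the "extra" information here; real lossy codecs choose what to drop far more carefully so that the loss is hard to hear.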

Advantages of Audio Encoding

If your application delivers audio or video (or even still images), knowing your options for encoding is useful. For instance, if you know the specs of different audio/video codecs and the best use cases for each, you might be able to improve the experience of users with a bad internet connection.
Here are just a handful of advantages of proper audio encoding:

Less storage space is needed: Encoded data files are smaller, so you should be able to save space on your storage. This is ideal if you have large amounts of data that need to be archived.
Data is sent over the network faster: Encoding removes redundancies from data, so again, the size of your files is a lot smaller. This results in faster delivery, even with bad internet connections.
Encoded files consume fewer resources: They reduce the resources required from your machine, like the amount of RAM and processing power when you’re listening to audio files.
Adaptable: Different codec formats are useful for different kinds of projects. For example, the AAC codec can use different frequency ranges with the help of joint encoding to achieve higher quality, a smaller file size, or, the best scenario, both. More advanced audiophiles will notice and appreciate these changes while playing your audio.

Encoding your audio files is a critical part of your video encoding workflow, but just as there are many types of codecs for video, audio has a number of options you can use.

Common Audio Codecs

One important thing to keep in mind when selecting a codec is which devices and services support it. Some streaming services support one audio codec but not another. Some codecs offer better quality, and others focus mainly on compression. Remember, you need a balance between quality and support.
With this in mind, let’s explore some of the most common and best-supported audio codecs.

MP3

MP3 stands for MPEG-1 Audio Layer III (the format was later extended as MPEG-2 Audio Layer III). The most common and well-known audio format, MP3 revolutionized digital audio. Its files were much smaller than previous formats, allowing them to be streamed and downloaded over the internet.
MP3 is a well-supported codec: you can play MP3 files on almost any online or desktop media player, like QuickTime, VLC media player, and Kodi.

AAC

AAC stands for Advanced Audio Coding. Developed a few years after MP3, AAC built on the success of that format but increased compression efficiency. Like most of the more popular codecs, AAC is lossy, but it provides very good audio quality in limited bandwidth, especially when compared to MP3.
It’s a patent-licensed format rather than an open one, but it is probably the most widely used audio codec on the internet today. It’s supported by most video-streaming platforms.

AIFF

AIFF stands for Audio Interchange File Format and was developed by Apple. AIFF files are very large, around 10 MB for one minute of standard audio recording.
Most AIFF files contain uncompressed audio in PCM (pulse-code modulation) format. The AIFF file is just a wrapper for the PCM encoding, making it more suitable for use on Mac systems. However, Windows can usually open AIFF files without any issues.
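The "around 10 MB per minute" figure can be checked with a little arithmetic. The sketch below assumes that "standard audio recording" means CD-quality PCM (44.1 kHz, 16-bit, stereo); the article does not state this, and `pcm_size_bytes` is a hypothetical helper name.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of uncompressed PCM audio: samples per second x bytes per sample x channels x time."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds

# Assumed CD-quality parameters: 44.1 kHz, 16-bit, stereo, one minute.
size = pcm_size_bytes(44_100, 16, 2, 60)

print(size)                         # 10584000
print(round(size / 1_000_000, 1))   # 10.6
```

Under those assumptions, one minute comes to roughly 10.6 MB before any container overhead, which matches the ballpark given above.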

FLAC

FLAC stands for Free Lossless Audio Codec. A bit on the nose maybe, but it has quickly become one of the most popular lossless formats available since its introduction in 2001. Note that with lossless codecs, all the information is retained when the file is compressed.
FLAC compresses audio files without losing any data. What’s even nicer is that it’s an open-source and royalty-free audio file format.
Most major services and common devices support FLAC, and it’s the main alternative to MP3 for music. You basically get the full quality of raw uncompressed audio at roughly half the file size. The catch is that the files are still rather large, so if saving space is your priority, FLAC is not the best option.
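What "lossless" means in practice is that decoding returns exactly the bytes that went in. The sketch below demonstrates that property using Python's built-in `zlib` as a stand-in. This is an assumption made for illustration only: zlib is a general-purpose compressor, not FLAC, and FLAC uses audio-specific techniques that are not shown here. The test signal is an artificially repetitive tone, so the size reduction printed says nothing about real music.

```python
import math
import struct
import zlib

# One exact cycle of a tone (100 samples), repeated to fill one second at 44.1 kHz.
cycle = [int(16000 * math.sin(2 * math.pi * n / 100)) for n in range(100)]
samples = cycle * 441

# Pack the samples as signed 16-bit PCM bytes.
pcm = struct.pack(f"<{len(samples)}h", *samples)

compressed = zlib.compress(pcm, 9)      # stand-in for a lossless encoder
restored = zlib.decompress(compressed)  # stand-in for the matching decoder

print(f"original: {len(pcm)} bytes, compressed: {len(compressed)} bytes")
print("bit-exact round trip:", restored == pcm)
```

A lossy codec such as MP3 or AAC would fail that final equality check by design, which is the core difference between the two families.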

Ogg (Vorbis)

Ogg isn’t a fancy acronym; it’s just a container format for one or more codecs. Vorbis is a free open-source lossy format often used with Ogg containers and was created specifically to provide that balance between high quality and efficient streaming. It performs significantly better than most other lossy compression formats (meaning it produces a smaller file size for equivalent audio quality).
Since Vorbis is free, it’s been utilized in a number of both commercial and noncommercial media players, including Spotify.

Opus

Much like its predecessor Vorbis, Opus is not an acronym. It is also a free open-source lossy format and was developed by the same creator as Vorbis, Christopher Montgomery (and Xiph.org). Opus is much more ambitious in its scope than Vorbis, as it is designed to handle every kind of audio content (including music, speech, and real-time voice communication). It can be carried in all major audio containers: Ogg, Matroska, WebM, and MPEG-TS.
Opus does just about everything when it comes to audio compression. The caveat is its complexity and CPU requirements, which have limited its current implementations. Despite that, Opus has been adopted rapidly and widely by mainstream platforms and applications, such as WhatsApp, Android, iOS, Windows, and PlayStation.

The Best Audio Codec

Of the commonly used codecs listed here, AAC is the best audio codec for most situations. It’s supported by a wide range of devices and streaming services and has the advantage of better audio quality as compared to MP3.
This may change very soon as Opus becomes more broadly popular. However, hardware doesn’t change as quickly as software, so broad device support is probably still a few years away. For internet video, AAC is currently the best audio codec for live-streaming, as well as video on demand.

Other Considerations for Quality Audio Encoding

Of course, audio encoding is more than just finding the right codec. To get a more complete picture and truly appreciate why audio encoding is just as important as video encoding, let’s consider a few more areas where we can ensure quality audio encoding.

Sample Rates

The sample rate indicates how many times per second the audio signal is measured (sampled) during recording. Sampling frequencies are measured in hertz (Hz) or kilohertz (kHz); for example, 44,100 samples per second can be expressed as 44,100 Hz or 44.1 kHz.
For digital audio recordings, the sample rate is comparable to the frame rate of a video. The more audio data (samples) is collected, the closer the recorded data is to the original audio.
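To make the idea concrete, here is a small sketch that samples the same tone at two rates and counts how many samples each produces. The `sample_tone` helper, the 1 kHz test tone, and the 8 kHz comparison rate are illustrative assumptions, not values from this article.

```python
import math

def sample_tone(freq_hz, sample_rate_hz, duration_s):
    """Return samples of a sine tone taken sample_rate_hz times per second."""
    count = round(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(count)]

# Sample 10 ms of a 1 kHz tone at a low rate and at 44.1 kHz.
low = sample_tone(1_000, 8_000, 0.01)
high = sample_tone(1_000, 44_100, 0.01)

print(len(low), "samples,", 8_000 / 1_000, "per cycle of the tone")
print(len(high), "samples,", 44_100 / 1_000, "per cycle of the tone")
```

The higher rate captures each cycle of the waveform with many more points, which is the audio counterpart of a higher video frame rate.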

Bit Depth

Bit depth measures how many bits were captured in each sample. So the higher the bit depth, the more accurately the actual analog audio source can be expressed.
The lowest possible bit depth, 1 bit, has only two values available to represent the sound: 0 for total silence and 1 for total volume. The higher the bit depth, the more accurate the encoded sound. Case in point, a standard 16-bit audio CD offers 2^16 (or 65,536) values.
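The relationship between bit depth and accuracy can be sketched in a few lines. The `quantization_levels` and `quantize` helpers below are hypothetical names, and the uniform rounding scheme is a simplifying assumption chosen for illustration.

```python
def quantization_levels(bit_depth):
    """Number of distinct values a sample can take at a given bit depth."""
    return 2 ** bit_depth

def quantize(value, bit_depth):
    """Round a value in [-1.0, 1.0] to the nearest of the available levels."""
    step = 2.0 / (quantization_levels(bit_depth) - 1)
    return round((value + 1.0) / step) * step - 1.0

for bits in (1, 8, 16, 24):
    approx = quantize(0.3, bits)
    print(f"{bits:>2}-bit: {quantization_levels(bits):>10,} levels, "
          f"0.3 is stored as {approx:.6f}")
```

At 1 bit the value 0.3 can only be stored as one of the two extremes, while at 16 bits the stored value is within a tiny fraction of the original.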

Bit Rates

The bit rate is the amount of data being processed within a given period of time. Common measurements for bit rate include kbps (kilobits per second) and Mbps (megabits per second). High bit rates don’t necessarily mean high quality on their own; other factors also need to be considered, like the listener’s internet speed. All else being equal, though, a higher bit rate generally means a better streaming experience.
Common bit rate encoding modes for audio include:

Constant Bit Rate (CBR): Keeps the bit rate constant throughout playback. CBR usually encodes faster than VBR, but it does take up more space.
Variable Bit Rate (VBR): The encoder varies the bit rate, spending more data on complex passages and less on simple ones. Despite longer encoding times and less consistent support in some software and hardware, VBR offers a much better quality-to-storage ratio.
Average Bit Rate (ABR): A subset of VBR. The encoder achieves an average bit rate by having blocks of both lower and higher bit rates.
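Bit rate translates directly into file size and bandwidth, which a short calculation makes visible. The sketch below uses hypothetical helper names, and the 128 kbps and three-minute figures are illustrative assumptions rather than recommendations. For VBR or ABR, treat the bit rate as an average.

```python
def stream_size_megabytes(bit_rate_kbps, seconds):
    """Approximate size of an audio stream: bits per second x duration, converted to MB."""
    total_bits = bit_rate_kbps * 1000 * seconds
    return total_bits / 8 / 1_000_000

def pcm_bit_rate_kbps(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

# A three-minute track encoded at a constant 128 kbps (illustrative values).
print(stream_size_megabytes(128, 180))      # 2.88

# The same track as uncompressed 44.1 kHz, 16-bit stereo PCM.
uncompressed_kbps = pcm_bit_rate_kbps(44_100, 16, 2)
print(uncompressed_kbps)                    # 1411.2
print(stream_size_megabytes(uncompressed_kbps, 180))
```

Comparing the two results shows why compressed audio matters for streaming: the uncompressed stream needs roughly eleven times the bandwidth of the 128 kbps one.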

Conclusion

Audio encoding isn’t something to ignore during your video encoding process. Paying attention to the technical aspects of audio encoding and optimizing for your use cases, in particular, can go a long way toward ensuring the overall quality of the video you deliver.
Looking for video encoding software with modern audio codecs built-in? Check out Bitmovin and offload some of the technical overhead for video and audio encoding.
