Mon Feb 01 2021

Separating and Combining Audio Streams

Introduction

This tutorial illustrates the different ways to handle multi-channel audio, specifically when you have multiple source files, different audio layouts, and a need to transform your inputs into audio streams suitable for streaming.

First, we go through the terminology and some simple use cases to introduce the main concepts and API resources involved in configuring audio in an encoding.

Then, we look at use cases that involve channel (re-)mixing from an input stream to an output stream.

Following that, we look at how to (re-)map multiple streams between inputs and outputs.

Finally, in the last section, we put it all together and look at how to merge multiple multi-channel streams into a single output stream.

Terminology

But first, it helps to understand what we are talking about and agree on the vocabulary.

  • A channel is an individual audio signal, usually associated with a single speaker in a multi-speaker setup.

  • Multiple channels are often grouped together into streams. For example, a 5.1 surround sound audio stream contains 6 channels. We will often talk about the channel layout to refer to the number and order of channels in a stream. The term track is sometimes used as well: a track refers to the logical entity, while the stream is the track encoded with a specific codec.

  • A stream is contained in a file, and a file can contain multiple streams. A file can also contain video streams alongside audio streams.

For this tutorial, we will use illustrations such as the following to help depict the concepts:

Audio Mapping and Mixing - Terminology

This represents a single file, with 3 streams.

  • The first stream (stream 0) is a video stream.
  • The next stream (stream 1) is a stereo audio stream with 2 channels: left and right.
  • The last stream (stream 2) is a surround audio stream with 6 channels: front left, front right, center, low frequency effects (for a subwoofer), back left and back right
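To make the vocabulary concrete, the example file above can be modelled in a few lines of plain Java: a file is a list of streams, and each audio stream carries an ordered channel layout. This is a minimal sketch; the class, enum and channel names are hypothetical and are not types from the Bitmovin SDK.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the vocabulary: a file holds streams, an audio stream holds channels.
// These types are illustrative only and are not part of the Bitmovin SDK.
class MediaModel {
    enum StreamKind { VIDEO, AUDIO }

    static class MediaStream {
        final StreamKind kind;
        final List<String> channelLayout; // channel names in stream order; empty for video

        MediaStream(StreamKind kind, List<String> channelLayout) {
            this.kind = kind;
            this.channelLayout = channelLayout;
        }

        int channelCount() {
            return channelLayout.size();
        }
    }

    // The example file: stream 0 is video, stream 1 is stereo, stream 2 is 5.1 surround
    static List<MediaStream> exampleFile() {
        return Arrays.asList(
            new MediaStream(StreamKind.VIDEO, Collections.<String>emptyList()),
            new MediaStream(StreamKind.AUDIO, Arrays.asList("FL", "FR")),
            new MediaStream(StreamKind.AUDIO, Arrays.asList("FL", "FR", "C", "LFE", "BL", "BR")));
    }

    public static void main(String[] args) {
        List<MediaStream> file = exampleFile();
        for (int i = 0; i < file.size(); i++) {
            MediaStream s = file.get(i);
            System.out.println("stream " + i + ": " + s.kind + ", "
                + s.channelCount() + " channel(s) " + s.channelLayout);
        }
    }
}
```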

Simple Stream Handling

Use Case 1 - Implicit Handling

In the simplest case, there is no real audio manipulation taking place. You have an input with an audio stream with a particular channel layout, and you want essentially the same in your output, with the appropriate codec and bitrate. This use case is our baseline, on which we will build to introduce additional concepts in the more complex use cases that follow. Note that we do not currently support a passthrough option in most cases; an input audio stream still needs to be encoded so that it is properly aligned with the encoded video content, especially when creating segmented content.

Let’s take the following example, with a single stereo stream:

Audio Mapping and Mixing - Use Case 1

We will not go into all aspects of the encoding configuration (there are other tutorials on our website that cover that), but will look specifically at the audio aspect of the configuration. For that, you will need a similar set of resources as for other streams:

  • An IngestInputStream, which defines where your source file is located on the Input storage
  • An audio CodecConfiguration for the codec of your choice, configured appropriately
  • An (output) Stream that defines how the configuration is applied to the input stream
  • One or more Muxings to define how the stream is containerised in a file and transferred to the Output

Note: For the purpose of this tutorial and in the example code files associated with it, we will always generate a single MP4 file as output, with all audio streams multiplexed with the video stream. In most situations that involve adaptive bitrate streaming with manifests such as HLS and DASH, each stream will have its own separate muxing.

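IngestInputStream ingestInputStream = new IngestInputStream();
ingestInputStream.setInputId(input.getId());
ingestInputStream.setInputPath("source.mp4"); // path of the source file on the Input storage
ingestInputStream.setSelectionMode(StreamSelectionMode.AUTO); // the default, shown for clarity
ingestInputStream = bitmovinApi.encoding.encodings.inputStreams.ingest.create(encoding.getId(), ingestInputStream);

AacAudioConfiguration config = new AacAudioConfiguration();
config.setName("AAC 128 kbit/s");
config.setBitrate(128_000L);
config = bitmovinApi.encoding.configurations.audio.aac.create(config);

// the output Stream takes the IngestInputStream as its input
StreamInput streamInput = new StreamInput();
streamInput.setInputStreamId(ingestInputStream.getId());
Stream stream = new Stream();
stream.setInputStreams(Collections.singletonList(streamInput));
stream.setCodecConfigId(config.getId());
stream = bitmovinApi.encoding.encodings.streams.create(encoding.getId(), stream);

Mp4Muxing muxing = new Mp4Muxing();
MuxingStream muxingStream = new MuxingStream();
muxingStream.setStreamId(stream.getId());
muxing.addStreamsItem(muxingStream);
// ... then add video stream, define the encoding output, define a filename, etc.
muxing = bitmovinApi.encoding.encodings.muxings.mp4.create(encoding.getId(), muxing);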

As you can see from this snippet of code, there was no need to handle any aspect of the channel mapping between the input and output for this simple use case. StreamSelectionMode.AUTO (the default mode, so it does not even need to be specified) tells the encoder to do its best to find an audio stream in the source that makes sense to use as the input stream.

Note: if you have previously configured encodings without an IngestInputStream, and instead referred to the input file directly in the Stream's creation payload, read this FAQ to understand why we recommend that you switch to using IngestInputStreams.

A full code sample can be found at AudioChannelManipulation_1_Baseline.java

Use Case 2 - Distinct Input Files

A situation that occurs regularly is one in which each input stream comes in a distinct file. In particular, this is what you will have if you receive IMF packages from your content provider, since they store each track in a separate “essence” (read: file).

Bitmovin also allows you to work with streams that are in separate files, whether or not the files also contain a video stream.

Audio Mapping and Mixing - Use Case 2

This is handled in much the same way as in the previous example, but now you need multiple IngestInputStreams, CodecConfigurations and Streams. If we refactor the code into functions that create each of these resources, it might look like the following:

AacAudioConfiguration aacConfig = createAacStereoAudioConfig();
Ac3AudioConfiguration ac3Config = createAc3SurroundAudioConfig();

IngestInputStream stereoIngestInputStream = createIngestInputStream(encoding, input, "source_audio_stereo.mxf");
IngestInputStream surroundIngestInputStream = createIngestInputStream(encoding, input, "source_audio_surround.mxf");

Stream audioStream1 = createStream(encoding, stereoIngestInputStream, aacConfig);
Stream audioStream2 = createStream(encoding, surroundIngestInputStream, ac3Config);

// ... do the same for the video stream ...
createMp4Muxing(encoding, output, Arrays.asList(videoStream, audioStream1, audioStream2));

A full code sample can be found at AudioChannelManipulation_2_MultipleInputFiles.java

Channel Mixing

In this section, we look at use cases that require manipulation of the audio channels present in the input audio stream.

Use Case 3 - Swapping Channels

As before, let’s imagine that your input file has a stereo audio stream, but that somehow the left and right channels have been reversed. It’s more of a theoretical use case (if it does happen, you should have a serious word with your content provider), but it helps us illustrate the next concept in a simple way: audio mixing.

Audio Mapping and Mixing - Use Case 3

With Audio Mixing, you can go down to the level of the channel (instead of simply the stream), and manipulate the channel layout.

Just as before, you will need an IngestInputStream to point to the source file. But in addition, you now also need to involve an AudioMixInputStream to apply a transformation to that input stream, before generating your output Stream.

The transformation here is simple: take each channel (by position in the input stream) and re-map it to the opposite output channel.

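// create the IngestInputStream
IngestInputStream audioIngestInputStream = createIngestInputStream(encoding, input, "source.mp4");

// define the source channels
AudioMixInputStreamSourceChannel sourceChannel0 = new AudioMixInputStreamSourceChannel();
sourceChannel0.setType(AudioMixSourceChannelType.CHANNEL_NUMBER);
sourceChannel0.setChannelNumber(0);
AudioMixInputStreamSourceChannel sourceChannel1 = new AudioMixInputStreamSourceChannel();
sourceChannel1.setType(AudioMixSourceChannelType.CHANNEL_NUMBER);
sourceChannel1.setChannelNumber(1);

// define the mapping to output channels: output 0 takes source 1, output 1 takes source 0
AudioMixInputStreamChannel outputChannel0 = new AudioMixInputStreamChannel();
outputChannel0.setInputStreamId(audioIngestInputStream.getId());
outputChannel0.setOutputChannelType(AudioMixChannelType.CHANNEL_NUMBER);
outputChannel0.setOutputChannelNumber(0);
outputChannel0.setSourceChannels(Collections.singletonList(sourceChannel1));
AudioMixInputStreamChannel outputChannel1 = new AudioMixInputStreamChannel();
outputChannel1.setInputStreamId(audioIngestInputStream.getId());
outputChannel1.setOutputChannelType(AudioMixChannelType.CHANNEL_NUMBER);
outputChannel1.setOutputChannelNumber(1);
outputChannel1.setSourceChannels(Collections.singletonList(sourceChannel0));

// add it all to an AudioMixInputStream and define the output channel layout
AudioMixInputStream audioMixInputStream = new AudioMixInputStream();
audioMixInputStream.setName("Swapping channels 0 and 1");
audioMixInputStream.setChannelLayout(AudioMixInputChannelLayout.CL_STEREO);
audioMixInputStream.setAudioMixChannels(Arrays.asList(outputChannel0, outputChannel1));
audioMixInputStream = bitmovinApi.encoding.encodings.inputStreams.audioMix.create(
    encoding.getId(), audioMixInputStream);

// configure an audio codec
AacAudioConfiguration aacConfig = createAacStereoAudioConfig();
// and finally, create an output Stream with the AudioMixInputStream and the codec
StreamInput streamInput = new StreamInput();
streamInput.setInputStreamId(audioMixInputStream.getId());
Stream stream = new Stream();
stream.setInputStreams(Collections.singletonList(streamInput));
stream.setCodecConfigId(aacConfig.getId());
stream = bitmovinApi.encoding.encodings.streams.create(encoding.getId(), stream);

Let’s highlight a couple of important points from that snippet of code:

  • where before we created the Stream from an IngestInputStream, we now create it from the AudioMixInputStream (see the StreamInput at the end of the snippet), since it represents the result of the transformation of the input stream.
  • the same audio IngestInputStream is used in the definition of both output channels (see the setInputStreamId calls on outputChannel0 and outputChannel1). There is no need to create separate ones, since both channels come from the same source file.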

You can find a full code sample in AudioChannelManipulation_3_ChannelSwapping.java

A similar use case is when one of your logical source channels is hard “panned” to either the left or the right channel of an input stereo pair, leaving the other channel empty. You can then simply select that channel and copy it to both output channels to center it.
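Both the swap and the centering trick boil down to choosing, for each output channel, which source channel feeds it. The sketch below models that on raw sample arrays in plain Java. It is an illustration of the concept under our own assumptions (channels as arrays of equal length, hypothetical names), not a description of how the encoder implements AudioMixInputStream.

```java
import java.util.Arrays;

// Hypothetical sketch: each output channel copies exactly one source channel.
// mapping[out] is the index of the source channel that feeds output channel `out`.
class ChannelRemap {
    static double[][] remap(double[][] input, int[] mapping) {
        int frames = input[0].length; // assumes all channels have the same length
        double[][] output = new double[mapping.length][frames];
        for (int out = 0; out < mapping.length; out++) {
            // copy the selected source channel into this output channel
            System.arraycopy(input[mapping[out]], 0, output[out], 0, frames);
        }
        return output;
    }

    public static void main(String[] args) {
        double[][] stereo = { {0.1, 0.2, 0.3}, {0.9, 0.8, 0.7} }; // [left, right]
        // use case 3: swap left and right
        double[][] swapped = remap(stereo, new int[] {1, 0});
        // hard-panned source in the left channel: copy it to both outputs to center it
        double[][] centered = remap(stereo, new int[] {0, 0});
        System.out.println(Arrays.deepToString(swapped));
        System.out.println(Arrays.deepToString(centered));
    }
}
```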

Use Case 4 - Downmixing 5.1 to 2.0

Let’s look now at a more complex use case, which involves multiple source channels being combined into the same output channels.

Audio Mapping and Mixing - Use Case 4

In this example, we have a source file with a 5.1 surround audio stream, which we want to convert into a stereo-only stream in the output. We have decided that each output channel should mix the corresponding front channel, the center channel at a lower volume, and the corresponding back channel at an even lower volume.

Caveat: Note that we are not claiming that this is the right or only way of downmixing 5.1 to 2.0. You will need to discuss the correct mechanism with your content provider. The example above is only for the purpose of illustrating the advanced concepts of audio mixing.

Enter the concept of gain. This property allows us to define at what level (volume) one of the source channels should be combined with others.
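Assuming that gain acts as a linear multiplier on a source channel's samples (an assumption for this sketch), each output channel is a weighted sum of its source channels. For the left output of this use case, that is L_out = g_FL × FL + g_C × C + g_BL × BL. The plain-Java sketch below computes that sum per sample; the gain values in it are illustrative only, in line with the caveat above, and the names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of gain-weighted mixing, assuming linear gains.
// Each output sample is the sum of (gain * sample) over the contributing source channels.
class GainMix {
    static double[] mix(double[][] sources, double[] gains) {
        if (sources.length != gains.length) {
            throw new IllegalArgumentException("one gain per source channel is required");
        }
        int frames = sources[0].length; // assumes all channels have the same length
        double[] out = new double[frames];
        for (int s = 0; s < sources.length; s++) {
            for (int i = 0; i < frames; i++) {
                out[i] += gains[s] * sources[s][i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] frontLeft = {0.4, 0.2};
        double[] center = {0.2, 0.2};
        double[] backLeft = {0.4, 0.0};
        // illustrative gains: front at full level, center lower, back lower still
        double[] left = mix(new double[][] {frontLeft, center, backLeft},
                            new double[] {1.0, 0.5, 0.25});
        System.out.println(Arrays.toString(left));
    }
}
```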

Code-wise, this looks fairly similar to the previous use case, but (naturally) with more configuration. Let’s look at just the left channel’s mixing definition:

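// define the source channels by type (gain values are examples only)
AudioMixInputStreamSourceChannel sourceChannelL = new AudioMixInputStreamSourceChannel();
sourceChannelL.setType(AudioMixSourceChannelType.FRONT_LEFT);
sourceChannelL.setGain(1.0);
AudioMixInputStreamSourceChannel sourceChannelC = new AudioMixInputStreamSourceChannel();
sourceChannelC.setType(AudioMixSourceChannelType.CENTER);
sourceChannelC.setGain(0.5);
AudioMixInputStreamSourceChannel sourceChannelLs = new AudioMixInputStreamSourceChannel();
sourceChannelLs.setType(AudioMixSourceChannelType.BACK_LEFT);
sourceChannelLs.setGain(0.25);
// define the mapping to the left output channel
AudioMixInputStreamChannel outputChannelL = new AudioMixInputStreamChannel();
outputChannelL.setInputStreamId(audioIngestInputStream.getId());
outputChannelL.setOutputChannelType(AudioMixChannelType.FRONT_LEFT);
outputChannelL.setSourceChannels(Arrays.asList(sourceChannelL, sourceChannelC, sourceChannelLs));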

Notice another difference from the previous use case: instead of selecting source channels by their position in the stream, we select them by channel type. If your input file has been correctly tagged, this simplifies the code and also ensures that you can handle input files with slightly different channel layouts (since there are multiple ways of laying out 5.1 channels in a stream in the industry).
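To see why selecting by type is more robust, consider two common orderings of the same 5.1 channels: SMPTE order (front left, front right, center, LFE, back left, back right) and film order (front left, center, front right, back left, back right, LFE). The hypothetical resolver below looks up a channel type in a given layout, and the same type ends up at a different position in each. It is a sketch of the idea only, not the encoder's own lookup logic.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: resolve a channel type to its position in a given layout.
// The enum is illustrative; it is not the Bitmovin SDK's AudioMixSourceChannelType.
class ChannelLookup {
    enum ChannelType { FRONT_LEFT, FRONT_RIGHT, CENTER, LOW_FREQUENCY, BACK_LEFT, BACK_RIGHT }

    // Two common orderings of 5.1 channels within a stream
    static final List<ChannelType> SMPTE_5_1 = Arrays.asList(
        ChannelType.FRONT_LEFT, ChannelType.FRONT_RIGHT, ChannelType.CENTER,
        ChannelType.LOW_FREQUENCY, ChannelType.BACK_LEFT, ChannelType.BACK_RIGHT);
    static final List<ChannelType> FILM_5_1 = Arrays.asList(
        ChannelType.FRONT_LEFT, ChannelType.CENTER, ChannelType.FRONT_RIGHT,
        ChannelType.BACK_LEFT, ChannelType.BACK_RIGHT, ChannelType.LOW_FREQUENCY);

    static int positionOf(List<ChannelType> layout, ChannelType type) {
        int index = layout.indexOf(type);
        if (index < 0) {
            throw new IllegalArgumentException(type + " is not present in this layout");
        }
        return index;
    }

    public static void main(String[] args) {
        // the same logical channel sits at a different position in each layout
        System.out.println("CENTER, SMPTE order: " + positionOf(SMPTE_5_1, ChannelType.CENTER));
        System.out.println("CENTER, film order:  " + positionOf(FILM_5_1, ChannelType.CENTER));
    }
}
```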

You can find a full code sample in AudioChannelManipulation_4_Downmixing.java, which streamlines the example above by using functions and helper classes to improve readability.

What about the AudioMix Filter?

You may have seen in our API documentation that we also support a filter for audio mixing. It is functionally equivalent to the AudioMixInputStream in terms of audio mixing configurations and use cases, and was our initial approach to providing this feature. AudioMixInputStreams are its successor, so we highly recommend using them instead of the AudioMixFilter going forward. In particular, if you also intend to use other functionality enabled through InputStream resources, such as trimming and concatenation, you can only define the audio manipulation through an AudioMixInputStream in the chain of InputStreams.

Stream Mapping

In the previous section, we looked at how to (re-)mix audio channels in the audio input stream. In this third part, we now look at use cases that have a different number of audio streams between the input and output, and where multiple audio streams need to be combined with each other to generate an output stream.

Use Case 5 - Mono Input Tracks

This is a very frequent use case, particularly in broadcast workflows. The source file has multiple (often PCM) streams, or tracks, each with a single mono channel that represents one of the output channels. Let’s take a middle-of-the-road example here, with 8 mono streams that contain the channels for a stereo pair and a 5.1 surround layout.

Audio Mapping and Mixing - Use Case 5

We have so far met all the resources and concepts necessary to configure an encoding for this use case, but this example allows us to revisit a couple of points highlighted earlier in the tutorial.

In use cases 3 and 4, we had a single IngestInputStream to select the single audio stream from the source file, which could then be used throughout the rest of the configuration. As shown in use case 1, we could also use the automatic StreamSelectionMode to let the encoder implicitly select that audio input stream in the source file.

We now have to be more explicit: we must select exactly the right audio stream as the input stream for our configuration, and map it to the relevant output channel in the AudioMixInputStream configuration.

Let’s look at the code for the mapping of the first two channels into the stereo pair:

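// create a distinct IngestInputStream for each input stream
IngestInputStream track0IngestInputStream = new IngestInputStream();
track0IngestInputStream.setInputId(input.getId());
track0IngestInputStream.setInputPath("source.mp4");
track0IngestInputStream.setSelectionMode(StreamSelectionMode.AUDIO_RELATIVE);
track0IngestInputStream.setPosition(0); // first audio stream in the file
track0IngestInputStream = bitmovinApi.encoding.encodings.inputStreams.ingest.create(encoding.getId(), track0IngestInputStream);
IngestInputStream track1IngestInputStream = new IngestInputStream();
track1IngestInputStream.setInputId(input.getId());
track1IngestInputStream.setInputPath("source.mp4");
track1IngestInputStream.setSelectionMode(StreamSelectionMode.AUDIO_RELATIVE);
track1IngestInputStream.setPosition(1); // second audio stream in the file
track1IngestInputStream = bitmovinApi.encoding.encodings.inputStreams.ingest.create(encoding.getId(), track1IngestInputStream);

// define how to select the single channel from a mono track
AudioMixInputStreamSourceChannel sourceChannelMono = new AudioMixInputStreamSourceChannel();
sourceChannelMono.setType(AudioMixSourceChannelType.CHANNEL_NUMBER);
sourceChannelMono.setChannelNumber(0);
// define output channels and mapping from source channels
AudioMixInputStreamChannel outputChannelL = new AudioMixInputStreamChannel();
outputChannelL.setInputStreamId(track0IngestInputStream.getId());
outputChannelL.setOutputChannelType(AudioMixChannelType.FRONT_LEFT);
outputChannelL.setSourceChannels(Collections.singletonList(sourceChannelMono));
AudioMixInputStreamChannel outputChannelR = new AudioMixInputStreamChannel();
outputChannelR.setInputStreamId(track1IngestInputStream.getId());
outputChannelR.setOutputChannelType(AudioMixChannelType.FRONT_RIGHT);
outputChannelR.setSourceChannels(Collections.singletonList(sourceChannelMono));
// define and create the audio mix
AudioMixInputStream stereoMixInputStream = new AudioMixInputStream();
stereoMixInputStream.setChannelLayout(AudioMixInputChannelLayout.CL_STEREO);
stereoMixInputStream.setAudioMixChannels(Arrays.asList(outputChannelL, outputChannelR));
stereoMixInputStream = bitmovinApi.encoding.encodings.inputStreams.audioMix.create(
    encoding.getId(), stereoMixInputStream);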

In this code sample, notice how each AudioMixInputStreamChannel uses a different IngestInputStream to grab a specific audio stream from the input file. Each IngestInputStream, in turn, selects the exact source stream by its position in the input file, counted only among the audio streams. You could also use StreamSelectionMode.POSITION_ABSOLUTE if you prefer to count all streams, including video.
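The difference between audio-relative and absolute positions comes down to a counting rule, sketched below in plain Java. The sketch assumes 0-based positions and represents the input as a hypothetical list of stream kinds, like one you might read off ffprobe output; it is not the encoder's implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: convert an audio-relative position into an absolute stream index.
// Assumes 0-based positions (an assumption for this sketch).
class StreamPositions {
    static int absoluteIndexOfAudio(List<String> streamKinds, int audioRelativePosition) {
        int audioSeen = 0;
        for (int i = 0; i < streamKinds.size(); i++) {
            if (streamKinds.get(i).equals("audio")) {
                if (audioSeen == audioRelativePosition) {
                    return i; // the n-th audio stream, counting audio streams only
                }
                audioSeen++;
            }
        }
        throw new IllegalArgumentException("no audio stream at position " + audioRelativePosition);
    }

    public static void main(String[] args) {
        // hypothetical input: one video stream followed by 8 mono audio tracks
        List<String> kinds = Arrays.asList("video", "audio", "audio", "audio",
            "audio", "audio", "audio", "audio", "audio");
        for (int p = 0; p < 2; p++) {
            System.out.println("audio-relative " + p + " -> absolute " + absoluteIndexOfAudio(kinds, p));
        }
    }
}
```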

Tip: to determine how many streams your input file has, in what order, and with what audio layout, use a tool such as mediainfo or ffprobe.

You can find a full code sample in AudioChannelManipulation_5_MultipleInputMonoTracks.java, which streamlines the example above by using functions and helper classes to improve code readability.

Use Case 6 - Separate Input Files for different Channels

This use case is less frequent, but it is just a special case of the previous example and can be handled in exactly the same way. The only difference is that the input file path may now differ between IngestInputStreams, instead of (or in addition to) the selected source stream.

Audio Mapping and Mixing - Use Case 6

With the same mechanism, you can also now hopefully see how you could replace a single channel in a multi-channel stream with one from a different file. We will not even ask why you would want to do such a thing…

Stream Merging

In the previous section, we saw how to map channels from distinct input streams onto output channels. In this fourth and final part, we go one step further: we use all the concepts seen so far together to merge multiple input streams into output streams.

Use Case 7 - Voice-Over or Background Music

In this use case, let’s imagine that we have a source file which contains a stereo audio track, but that in a separate file, we have another stereo track that has a voice-over, such as a commentary. Another similar but reverse use case would be if we had a music track to add as background music to a main audio track.

Audio Mapping and Mixing - Use Case 7

To do this, we need one final tool at our disposal: the ability to merge streams with similar audio layouts. This is achieved simply by providing a list of InputStreams when creating a Stream.

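Stream stream = new Stream();
stream.setInputStreams(Arrays.asList(streamInput1, streamInput2)); // one StreamInput per InputStream to merge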

To make this example a little more advanced, and to show how all these features can be combined, let’s also reduce the volume of the original track so that the voice-over can be heard clearly. We cannot do this when we merge the 2 streams, but we can use the audio mixing functionality shown earlier to create a new input stream with a lower gain on all channels:

IngestInputStream videoIngestInputStream = createIngestInputStreamWithPosition(encoding, input, "original.mp4", 0);
IngestInputStream mainAudioIngestInputStream = createIngestInputStreamWithPosition(encoding, input, "original.mp4", 1);
IngestInputStream voiceOverIngestInputStream = createIngestInputStreamWithPosition(encoding, input, "voiceover.mp4", 0);

// lower the volume of both channels of the main audio track
AudioMixInputStream secondaryAudioMixInputStream = new AudioMixInputStream();
secondaryAudioMixInputStream.setChannelLayout(AudioMixInputChannelLayout.CL_STEREO);

for (int i = 0; i < 2; i++) {
    AudioMixInputStreamSourceChannel sourceChannel = new AudioMixInputStreamSourceChannel();
    sourceChannel.setType(AudioMixSourceChannelType.CHANNEL_NUMBER);
    sourceChannel.setChannelNumber(i);
    sourceChannel.setGain(0.5);

    AudioMixInputStreamChannel inputStreamChannel = new AudioMixInputStreamChannel();
    inputStreamChannel.setOutputChannelType(AudioMixChannelType.CHANNEL_NUMBER);
    inputStreamChannel.setOutputChannelNumber(i);
    inputStreamChannel.setInputStreamId(mainAudioIngestInputStream.getId());
    inputStreamChannel.addSourceChannelsItem(sourceChannel);

    secondaryAudioMixInputStream.addAudioMixChannelsItem(inputStreamChannel);
}

secondaryAudioMixInputStream = bitmovinApi.encoding.encodings.inputStreams.audioMix.create(
    encoding.getId(), secondaryAudioMixInputStream);

// merge the voice-over with the attenuated main audio in a single output Stream
Stream videoStream = createStream(encoding, Collections.singletonList(videoIngestInputStream), h264Config);
Stream audioStream = createStream(encoding, Arrays.asList(voiceOverIngestInputStream, secondaryAudioMixInputStream), aacConfig);

You can find a full code sample in AudioChannelManipulation_6_MergingMultipleStreams.java, which shows a slightly different use case, in which the same source file contains 2 stereo streams, instead of them coming from different files.
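Conceptually, the voice-over example attenuates the main track and then combines it with the voice-over channel by channel. The tutorial does not spell out the exact rule the encoder uses to merge streams, so the sketch below assumes a per-sample sum and clamps the result to [-1, 1]; both are assumptions of this plain-Java illustration, and the names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of use case 7, assuming the merge is a per-sample sum.
// Each stream is an array of channels; each channel is an array of samples in [-1, 1].
class VoiceOverMerge {
    static double[][] attenuate(double[][] channels, double gain) {
        double[][] out = new double[channels.length][];
        for (int c = 0; c < channels.length; c++) {
            out[c] = new double[channels[c].length];
            for (int i = 0; i < channels[c].length; i++) {
                out[c][i] = gain * channels[c][i];
            }
        }
        return out;
    }

    static double[][] merge(double[][] a, double[][] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("streams must have the same channel layout");
        }
        double[][] out = new double[a.length][];
        for (int c = 0; c < a.length; c++) {
            if (a[c].length != b[c].length) {
                throw new IllegalArgumentException("channels must have the same length");
            }
            out[c] = new double[a[c].length];
            for (int i = 0; i < a[c].length; i++) {
                // sum the two samples, then clamp to the valid sample range
                out[c][i] = Math.max(-1.0, Math.min(1.0, a[c][i] + b[c][i]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] mainAudio = { {0.8, 0.6}, {0.8, 0.6} }; // stereo main track
        double[][] voiceOver = { {0.3, 0.5}, {0.3, 0.5} }; // stereo commentary
        // halve the main track (gain 0.5), then merge it with the voice-over
        double[][] result = merge(attenuate(mainAudio, 0.5), voiceOver);
        System.out.println(Arrays.deepToString(result));
    }
}
```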

Summary

To conclude, let’s summarise the concepts involved and how they translate into the Bitmovin API:

  • Use an IngestInputStream (endpoint) to define where your input file is

    • and set an explicit StreamSelectionMode if you need to select a specific stream from that file
    • use multiple IngestInputStreams if you have multiple streams that contain your source channels
  • Add an AudioMixInputStream (endpoint) if you need to mix and map different input channels into output channels

    • and set a gain on individual source channels if you want to alter their volume
  • Merge multiple input streams with the same audio layout by adding them to the output Stream (endpoint)

Tip: Remember also that if you want your encoding workflow to cater for inputs with different audio layouts (for example, if you receive some files with 1 stream of 6 channels, and some with 6 mono streams), you can use Stream Conditions to apply the correct logic.
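The decision behind that tip can be sketched in plain Java: look at the channel count of each audio stream found in the input, and pick the configuration path that fits. This is a hypothetical illustration of the branching only; it does not use or reproduce the Stream Conditions API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: choose a mixing strategy from the input's audio layout.
class LayoutStrategy {
    enum Strategy { SINGLE_SURROUND_STREAM, SIX_MONO_STREAMS, UNSUPPORTED }

    // audioChannelCounts: number of channels in each audio stream, in file order
    static Strategy choose(List<Integer> audioChannelCounts) {
        if (audioChannelCounts.size() == 1 && audioChannelCounts.get(0) == 6) {
            return Strategy.SINGLE_SURROUND_STREAM; // one 5.1 stream: mix within it
        }
        if (audioChannelCounts.size() == 6
                && audioChannelCounts.stream().allMatch(n -> n == 1)) {
            return Strategy.SIX_MONO_STREAMS; // one IngestInputStream per mono track
        }
        return Strategy.UNSUPPORTED;
    }

    public static void main(String[] args) {
        System.out.println(choose(Arrays.asList(6)));
        System.out.println(choose(Arrays.asList(1, 1, 1, 1, 1, 1)));
        System.out.println(choose(Arrays.asList(2, 6)));
    }
}
```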
