Sometimes, you need to be able to manipulate multiple files or parts of files as input for an encoding job. For simple use cases, you might need to add a bumper highlighting your brand at the start of your assets. In more complex post-production scenarios, you may want to compose a final cut of your asset on the basis of a "recipe" of instructions, for example an EDL (edit decision list) or a CPL (composition playlist).
Since version 2.6.0, the Bitmovin encoding solution supports many such use cases, through a set of new APIs. There are a few principles to understand, which we will go through first, before going through a practical example.
If you have used the Bitmovin encoding solution before, you should be familiar with the simple mechanism that lets the encoder know where to find the source file: all you had to do was add configuration when creating a Stream object, pointing to an Input object's identifier representing the input storage (e.g. an S3 bucket), as well as a path and input stream selection criteria. You can see this in many of the examples bundled with our SDKs.
With multiple input files to consider and more flexibility required, a new concept and a dedicated object are needed: the Input Stream.
There are currently 3 types of Input Streams:
- The Ingest Input Stream: it is used to refer to an Input object and a path, in a way comparable to before.
- The Trimming Input Stream: this object defines how an input stream is to be trimmed. There are currently 3 ways to do so: based on time, based on a timecode track, or based on the H.264 picture timing.
- The Concatenation Input Stream: used (unsurprisingly) to concatenate multiple input streams.
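To make the relationship between the three types concrete, here is a minimal sketch in plain Java. The `InputStreamModel` class and its `Node`, `Ingest`, `TimeTrim` and `Concat` types are hypothetical names invented for this illustration; they are not the Bitmovin SDK classes. The point is only that a trimming input stream wraps another input stream, and a concatenation input stream wraps several.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model for illustration only; NOT the Bitmovin SDK classes.
final class InputStreamModel {

    // Anything that can act as an input stream in this sketch.
    interface Node {
        String describe();
    }

    // Refers directly to a file on an input storage.
    record Ingest(String path) implements Node {
        public String describe() {
            return "ingest(" + path + ")";
        }
    }

    // Cuts a time range (in seconds) out of another input stream.
    record TimeTrim(Node source, double offset, double duration) implements Node {
        public String describe() {
            return "trim(" + source.describe() + ", offset=" + offset + ", duration=" + duration + ")";
        }
    }

    // Plays several input streams one after the other.
    record Concat(List<Node> items) implements Node {
        public String describe() {
            return items.stream()
                    .map(Node::describe)
                    .collect(Collectors.joining(" + ", "concat(", ")"));
        }
    }

    public static void main(String[] args) {
        Node main = new Ingest("main.mp4");
        Node cut = new Concat(List.of(new Ingest("bumper.mp4"), new TimeTrim(main, 10, 90)));
        System.out.println(cut.describe());
    }
}
```

In the real API the objects reference each other by identifier rather than by nesting, but the shape of the composition is the same.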
So, how do we put it all together in our encoding configuration? The rest of this tutorial will explain it in detail, but let's first create ourselves a realistic use case.
Let's assume that we are a content producer, and that for every asset that we distribute to content partners, we want to add a short bumper at the start of the file, which highlights our brand. In addition, somewhere in the middle of the asset and at the end of it, we want to also add some cross-promotional content. Often, our main content also needs to be topped and tailed to remove unnecessary editorial markers.
The final cut is therefore the following sequence: bumper, main asset (part 1), promo, main asset (part 2), promo.
With this goal clear in our mind, all we need to do now is to configure our encoding. A few lines create the essential objects we need.
    Encoding encoding = new Encoding();
    encoding.setName("Concatenated and trimmed encoding with bumper and promo");
    encoding.setCloudRegion(CloudRegion.AUTO);
    encoding.setEncoderVersion("STABLE");
    encoding = encodingApi.encodings.create(encoding);
In this example, let's just assume that all files are available through HTTPS URLs from the same server. We therefore first create an HttpsInput object:
    HttpsInput input = new HttpsInput();
    input.setHost(HTTPS_INPUT_HOST);
    input = encodingApi.inputs.https.create(input);
Since there are three files that we need to handle, we first create ingest input streams for them. It doesn't matter that the promo and main asset appear twice in the making of the final asset; we only need to register them once. In each case, we just let the encoder automatically select the audio and video streams from the input file.
    IngestInputStream main = new IngestInputStream();
    main.setInputId(input.getId());
    main.setInputPath(HTTPS_MAIN_INPUT_PATH);
    main.setSelectionMode(StreamSelectionMode.AUTO);
    main = encodingApi.encodings.inputStreams.ingest.create(encoding.getId(), main);

    IngestInputStream bumper = new IngestInputStream();
    bumper.setInputId(input.getId());
    bumper.setInputPath(HTTPS_BUMPER_INPUT_PATH);
    bumper.setSelectionMode(StreamSelectionMode.AUTO);
    bumper = encodingApi.encodings.inputStreams.ingest.create(encoding.getId(), bumper);

    IngestInputStream promo = new IngestInputStream();
    promo.setInputId(input.getId());
    promo.setInputPath(HTTPS_PROMO_INPUT_PATH);
    promo.setSelectionMode(StreamSelectionMode.AUTO);
    promo = encodingApi.encodings.inputStreams.ingest.create(encoding.getId(), promo);
Next, we need to let the encoder know how the main asset is to be trimmed. This time, we need two separate objects to represent the 2 parts of the asset that will make it to the final cut. To do this, we create time-based trimming input streams, and link them to the previously created ingest input stream.
Our main asset has 10 seconds at the start that we don't need. Part 1 is 90 seconds, and part 2 is 60 seconds, starting directly after part 1. Part 2 therefore begins 10 + 90 = 100 seconds into the source file. Translated into the appropriate offset and duration parameters, this means:
    TimeBasedTrimmingInputStream mainPart1 = new TimeBasedTrimmingInputStream();
    mainPart1.setInputStreamId(main.getId());
    mainPart1.setOffset(10D);
    mainPart1.setDuration(90D);
    mainPart1 = encodingApi.encodings.inputStreams.trimming.timeBased.create(encoding.getId(), mainPart1);

    TimeBasedTrimmingInputStream mainPart2 = new TimeBasedTrimmingInputStream();
    mainPart2.setInputStreamId(main.getId());
    mainPart2.setOffset(100D);
    mainPart2.setDuration(60D);
    mainPart2 = encodingApi.encodings.inputStreams.trimming.timeBased.create(encoding.getId(), mainPart2);
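When an asset is cut into more than two consecutive parts, computing each offset by hand becomes error-prone. The following sketch shows one way to derive the offset and duration pairs from the length of the unwanted head and the list of part durations. `TrimPlanner`, `Segment` and `plan` are hypothetical helper names made up for this example; nothing here calls the Bitmovin SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper only; it does not call the Bitmovin SDK.
final class TrimPlanner {

    // One trimmed segment: where it starts in the source file and how long it lasts, in seconds.
    record Segment(double offset, double duration) {}

    // Lays out back-to-back segments that begin after `headTrim` seconds of unwanted material.
    static List<Segment> plan(double headTrim, double... durations) {
        List<Segment> segments = new ArrayList<>();
        double offset = headTrim;
        for (double duration : durations) {
            if (duration <= 0) {
                throw new IllegalArgumentException("duration must be positive: " + duration);
            }
            segments.add(new Segment(offset, duration));
            offset += duration; // the next segment starts where this one ends
        }
        return segments;
    }

    public static void main(String[] args) {
        // 10 seconds to skip, then a 90-second part followed by a 60-second part.
        for (Segment s : plan(10.0, 90.0, 60.0)) {
            System.out.println("offset=" + s.offset() + " duration=" + s.duration());
        }
    }
}
```

For the asset in this tutorial, `plan(10.0, 90.0, 60.0)` yields the pairs (10, 90) and (100, 60), which are the values passed to `setOffset` and `setDuration` above.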
We are now ready to put it all together into one concatenation input stream object:
    ConcatenationInputStream allTogether = new ConcatenationInputStream();

    ConcatenationInputConfiguration bumperConfig = new ConcatenationInputConfiguration();
    bumperConfig.setInputStreamId(bumper.getId());
    bumperConfig.setPosition(0);
    bumperConfig.setIsMain(false);
    allTogether.addConcatenationItem(bumperConfig);

    ConcatenationInputConfiguration part1Config = new ConcatenationInputConfiguration();
    part1Config.setInputStreamId(mainPart1.getId());
    part1Config.setPosition(1);
    part1Config.setIsMain(true);
    allTogether.addConcatenationItem(part1Config);

    ConcatenationInputConfiguration promo1Config = new ConcatenationInputConfiguration();
    promo1Config.setInputStreamId(promo.getId());
    promo1Config.setPosition(2);
    promo1Config.setIsMain(false);
    allTogether.addConcatenationItem(promo1Config);

    ConcatenationInputConfiguration part2Config = new ConcatenationInputConfiguration();
    part2Config.setInputStreamId(mainPart2.getId());
    part2Config.setPosition(3);
    part2Config.setIsMain(false);
    allTogether.addConcatenationItem(part2Config);

    ConcatenationInputConfiguration promo2Config = new ConcatenationInputConfiguration();
    promo2Config.setInputStreamId(promo.getId());
    promo2Config.setPosition(4);
    promo2Config.setIsMain(false);
    allTogether.addConcatenationItem(promo2Config);

    allTogether = encodingApi.encodings.inputStreams.concatenation.create(encoding.getId(), allTogether);
The position property for each of the items defines the order in which they will appear in the final output. Note how the item for Part 1 is defined as being the main item. Only one of the input streams can be set in this way, and it is used as reference for scaling, aspect ratio, frame rate, sample rate, etc.
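Because the order and the main flag are set item by item, it can help to sanity-check a plan before sending it to the API. The sketch below is a hypothetical pre-flight check in plain Java: `ConcatenationCheck`, `Item` and `outputOrder` are names invented here, not SDK types. It sorts items by position and rejects a plan with more than one main item. Rejecting duplicate positions is an assumption of this sketch, since the tutorial does not say how the API treats them, and whether a plan with no main item is accepted is not stated either, so that case is not checked.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check; these types are not part of the Bitmovin SDK.
final class ConcatenationCheck {

    // A simplified stand-in for one concatenation item.
    record Item(String label, int position, boolean isMain) {}

    // Returns the item labels in output order, or throws if the plan is inconsistent.
    static List<String> outputOrder(List<Item> items) {
        Set<Integer> seenPositions = new HashSet<>();
        int mainCount = 0;
        for (Item item : items) {
            // Assumption of this sketch: two items must not share a position.
            if (!seenPositions.add(item.position())) {
                throw new IllegalArgumentException("duplicate position " + item.position());
            }
            if (item.isMain()) {
                mainCount++;
            }
        }
        // From the tutorial: only one input stream can be flagged as main.
        if (mainCount > 1) {
            throw new IllegalArgumentException("more than one main item: " + mainCount);
        }
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingInt(Item::position));
        List<String> labels = new ArrayList<>();
        for (Item item : sorted) {
            labels.add(item.label());
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Item> plan = List.of(
                new Item("promo", 4, false),
                new Item("bumper", 0, false),
                new Item("main part 2", 3, false),
                new Item("main part 1", 1, true),
                new Item("promo", 2, false));
        System.out.println(outputOrder(plan));
    }
}
```

The items are deliberately listed out of order in `main` to show that only the position values determine the resulting sequence.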
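A summary at this stage is useful: three ingest input streams (main, bumper and promo) feed two time-based trimming input streams (both cut from the main asset), and a single concatenation input stream arranges the bumper, the two trimmed parts and the promo (used twice) into five ordered items.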
It's now all in place to finalise our encoding.
From this point on, the rest of the encoding configuration works in the usual way: define your codec configurations, streams, muxings and start your encoding.
There is just one last thing of interest to highlight in those steps. When you define the inputs of your streams, instead of providing an input ID and file path as you'd have done before, you now just need to provide the identifier of the input stream:
    // A single StreamInput pointing at the concatenation is shared by both streams.
    StreamInput streamInput = new StreamInput();
    streamInput.setInputStreamId(allTogether.getId());

    Stream audioStream = new Stream();
    audioStream.addInputStreamsItem(streamInput);
    audioStream.setCodecConfigId(aacConfiguration.getId());
    audioStream = encodingApi.encodings.streams.create(encoding.getId(), audioStream);

    Stream videoStream = new Stream();
    videoStream.addInputStreamsItem(streamInput);
    videoStream.setCodecConfigId(videoConfiguration1080p.getId());
    videoStream = encodingApi.encodings.streams.create(encoding.getId(), videoStream);
It is worth bearing in mind some of the current limitations of the trimming and concatenation workflow. At the time of writing this tutorial, they are as follows:
- Video streams in all input files must have the same resolution and frame rate
- Audio streams in all input files must have the same sample rate, and the same audio layout
Future versions of the encoder will remove some of these restrictions.
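Given these restrictions, it may be worth checking your source files before starting an encoding. The sketch below compares every file against the first one on the four properties listed above. It assumes you already have the media properties from some probing tool; `InputCompatibility`, `MediaInfo` and `findMismatches` are hypothetical names for this example and are not part of the Bitmovin SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check; the media properties are assumed to come from a probing tool.
final class InputCompatibility {

    // The properties of one source file that the current limitations care about.
    record MediaInfo(String name, int width, int height, double frameRate,
                     int sampleRate, String audioLayout) {}

    // Compares every input with the first one and returns a list of human-readable mismatches.
    static List<String> findMismatches(List<MediaInfo> inputs) {
        List<String> problems = new ArrayList<>();
        if (inputs.isEmpty()) {
            return problems;
        }
        MediaInfo ref = inputs.get(0);
        for (MediaInfo in : inputs.subList(1, inputs.size())) {
            if (in.width() != ref.width() || in.height() != ref.height()) {
                problems.add(in.name() + ": resolution differs from " + ref.name());
            }
            if (Double.compare(in.frameRate(), ref.frameRate()) != 0) {
                problems.add(in.name() + ": frame rate differs from " + ref.name());
            }
            if (in.sampleRate() != ref.sampleRate()) {
                problems.add(in.name() + ": sample rate differs from " + ref.name());
            }
            if (!in.audioLayout().equals(ref.audioLayout())) {
                problems.add(in.name() + ": audio layout differs from " + ref.name());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<MediaInfo> inputs = List.of(
                new MediaInfo("main", 1920, 1080, 25.0, 48000, "stereo"),
                new MediaInfo("bumper", 1920, 1080, 25.0, 48000, "stereo"),
                new MediaInfo("promo", 1920, 1080, 25.0, 44100, "stereo"));
        findMismatches(inputs).forEach(System.out::println);
    }
}
```

Frame rates are compared for exact equality here, which is adequate when the values come from the same probing tool; a rational representation would be more robust for rates such as 30000/1001.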
In a second part (coming soon), we will build from the example above and look at how the concatenation and trimming feature works in more complex use cases, such as multiple audio tracks and filters.