
AI Contextual Advertising: A New Era for Viewer-Centric Ads

Adis Talic
Wolfram Hofmeister

Video has become the dominant medium for entertainment and information, so effective advertising strategies have never been more important. The pressure is heightened by Google’s plan to phase out third-party cookies in Chrome, now expected in early 2025. For over 20 years, cookies have been the backbone of targeted advertising campaigns, and their potential removal calls for new solutions that can deliver relevant ads while respecting user privacy.

Bitmovin’s AI Contextual Advertising addresses this problem. By analyzing video, audio, and text data during encoding and passing this information to the video player, the technology enables ad servers to deliver highly relevant, targeted ads based on what’s being watched. Unlike traditional cookie-based methods, it integrates seamlessly into the viewing experience without relying on personal user data, making it both privacy-friendly and highly effective. While advertising is the primary use case we’ll explore, the technology has potential for a wide range of applications, which we’ll touch on later.

In this blog, we’ll dive into the technical implementation of AI Contextual Advertising, from its origins in a Bitmovin hackathon to the real-world engineering challenges we overcame to bring this latest feature to life.

How It Started

The concept originated during a brainstorming session at a Bitmovin hackathon, where the goal was to see what we could build by combining several of Bitmovin’s solutions. Among many ideas, one stood out: analyzing video content and serving ads relevant to the context of the scenes.

Armed with this concept, we quickly sketched a basic workflow as a rough, hackathon-style diagram (shown below) to guide us.

Rough workflow diagram sketched during the hackathon

The idea was straightforward:

  1. Use Bitmovin’s VOD and Live Encoders to analyze video segments
  2. Employ a multi-modal AI system to extract contextual descriptions of each scene
  3. Pass this metadata to the Bitmovin Player for ad integration
  4. The Player then provides this contextual information to the playout software, which works with the ad server to request relevant ads
  5. Finally, the Player integrates and displays these ads seamlessly within the content

With just two days to deliver a working prototype, we faced an uphill battle: there was no time to integrate an actual ad vendor into the system. To cover that gap, our group decided to build a custom ad server during the hackathon that would generate ad placements on the fly with AI.

Hackathon Implementation

1. Extracting Contextual Information

The first step was to extract contextual information from the live or on-demand video content during encoding. To keep things manageable within our time constraints, we focused on analyzing only the first frame of each video segment. We knew this wouldn’t be perfect, but it would be good enough to demo the concept (and hopefully win first prize).

Since we were already familiar with it, we used ChatGPT as our AI system: we passed each extracted frame to ChatGPT and asked it to generate a short description of the frame’s contents in the form of 5 keyword-description pairs.
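For illustration, a call along the following lines could produce those pairs. This is a minimal sketch assuming OpenAI’s chat completions API with a vision-capable model; the model name, prompt wording, and response handling are our assumptions here, not the exact hackathon code.

// Minimal sketch: ask a vision-capable model for five keyword-description
// pairs describing a single video frame. Model name, prompt wording and
// response parsing are illustrative assumptions.
interface KeywordPair {
  keyword: string;
  description: string;
}

async function describeFrame(frameAsBase64: string): Promise<KeywordPair[]> {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o', // assumption: any vision-capable model works here
      messages: [
        {
          role: 'user',
          content: [
            {
              type: 'text',
              text:
                'Describe this video frame as exactly 5 keyword-description pairs, ' +
                'returned as a JSON array of {"keyword", "description"} objects and nothing else.',
            },
            {
              type: 'image_url',
              image_url: { url: `data:image/jpeg;base64,${frameAsBase64}` },
            },
          ],
        },
      ],
    }),
  });
  const data = await response.json();
  // The prompt asks for plain JSON, so parse the first choice directly.
  return JSON.parse(data.choices[0].message.content) as KeywordPair[];
}

A hypothetical sketch of the frame-analysis call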

Then, to make the data accessible to the Bitmovin Player, we encapsulated the extracted keyword-description pairs into the video segments as metadata using emsg boxes.


 Example of how the injected metadata looked within the video segments
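As a rough sketch of how this step could be done, the pairs can be packed into a version-0 emsg box as JSON. The scheme URI and payload shape below are illustrative choices, not the exact format we shipped.

// Rough sketch: serialize keyword-description pairs into an ISO BMFF
// version-0 'emsg' box so the Player can pick them up per segment.
// The scheme URI and JSON payload shape are illustrative assumptions.
type KeywordPair = { keyword: string; description: string };

function buildEmsgBox(pairs: KeywordPair[], segmentDurationMs: number): Uint8Array {
  const encoder = new TextEncoder();
  const schemeIdUri = encoder.encode('urn:example:contextual-ads\0'); // null-terminated
  const value = encoder.encode('1\0');
  const messageData = encoder.encode(JSON.stringify(pairs));

  // size(4) + type(4) + version/flags(4) + strings + 4 x uint32 + payload
  const size = 12 + schemeIdUri.length + value.length + 16 + messageData.length;
  const buffer = new ArrayBuffer(size);
  const view = new DataView(buffer);
  const bytes = new Uint8Array(buffer);
  let offset = 0;

  view.setUint32(offset, size); offset += 4;               // box size
  bytes.set(encoder.encode('emsg'), offset); offset += 4;  // box type
  view.setUint32(offset, 0); offset += 4;                  // version 0, no flags
  bytes.set(schemeIdUri, offset); offset += schemeIdUri.length;
  bytes.set(value, offset); offset += value.length;
  view.setUint32(offset, 1000); offset += 4;               // timescale (ms)
  view.setUint32(offset, 0); offset += 4;                  // presentation_time_delta
  view.setUint32(offset, segmentDurationMs); offset += 4;  // event_duration
  view.setUint32(offset, 1); offset += 4;                  // event id
  bytes.set(messageData, offset);                          // message payload
  return bytes;
}

Sketch of packing keyword-description pairs into an emsg box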

2. Serving Ads in the Player

For the Player portion, we developed a new ContextualAds module, which exposed an API for handling ad opportunities and scheduling ads. When an ad opportunity arose, the player would fetch and preload ads from the advertising server based on the metadata from the video segments.

Code Snippet:

export interface ContextualAdvertisingApi {
  readonly opportunities: AdOpportunity[];
  schedule(opportunity: AdOpportunity, options?: Partial<ContextualAdOptions>): Promise<ContextualAd>;
  setAdShape(shape: AdShape | undefined): void;
}

Example API code for the ContextualAds Module
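To illustrate how this API could be driven, here is a hypothetical usage sketch; how the module is obtained, the ad shape value, and the preload option are assumptions rather than the actual Player API.

// Hypothetical usage of the ContextualAds module. The ad shape value and
// the preload option are illustrative assumptions, not the actual Player API.
async function scheduleNextContextualAd(contextualAds: ContextualAdvertisingApi): Promise<void> {
  // Render ads in an L-shape around the video, as in our demo.
  contextualAds.setAdShape('l-shape' as AdShape);

  // Take the most recent ad opportunity derived from segment metadata.
  const opportunity = contextualAds.opportunities[contextualAds.opportunities.length - 1];
  if (!opportunity) {
    return;
  }

  // Schedule an ad for it; the Player fetches and preloads the creative.
  const ad = await contextualAds.schedule(opportunity, { preload: true } as Partial<ContextualAdOptions>);
  console.log('Scheduled contextual ad:', ad);
}

Hypothetical usage of the ContextualAds API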

The system was designed to display ads seamlessly within the video player, ensuring alignment with the content on screen, and to show the matched keywords once an ad starts.

  Example video of how the ads and metadata were shown on live stream

3. Serving Ads with a Custom Ad Server

Given the lack of time to integrate an actual ad vendor, we built a simple Node app to act as our ad server. This basic server exposed an API to accept ad opportunity keywords, get ad suggestions from ChatGPT, and use those suggestions to fetch one or more images that fit the keywords.
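A minimal sketch of such a server is shown below, assuming Express; the route name, request shape, and helper functions are hypothetical stand-ins for what we actually wired up.

// Minimal sketch of the hackathon ad server using Express. Route name,
// request shape and helper implementations are hypothetical stand-ins.
import express from 'express';

const app = express();
app.use(express.json());

// Stub helpers; the surrounding sections sketch the real logic they stand for.
async function suggestAdIdeas(keywords: string[]): Promise<string[]> {
  // In the prototype, ChatGPT turned opportunity keywords into ad ideas.
  return [`${keywords.join(' ')} advertisement`];
}

async function searchAdImages(query: string): Promise<string[]> {
  // In the prototype, an image search resolved an ad idea to image URLs
  // (see the Bing sketch below).
  return [`https://example.com/ads?q=${encodeURIComponent(query)}`];
}

app.post('/ad-opportunity', async (req, res) => {
  const keywords: string[] = req.body.keywords ?? [];
  const ideas = await suggestAdIdeas(keywords);   // ad suggestions from the LLM
  const images = await searchAdImages(ideas[0]);  // images matching the idea
  res.json({ idea: ideas[0], images });
});

app.listen(3000, () => console.log('Ad server listening on :3000'));

Sketch of the ad server’s single endpoint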

Initially, we aimed to generate ad images on the fly with AI, since this would ensure the ads fit perfectly within the scene. A quick manual test immediately showed how compelling the viewing experience could be: a system like this could serve ads that fit the content remarkably well, stylized and presented in a way that is engaging, light-hearted, and even entertaining.

Just look at the, quite frankly, astonishing cat food commercial that ChatGPT came up with during our initial tests. Who wouldn’t love to buy DEEST’s QUALTING TASTE food for their beloved furry friends?


Example of an ad image generated with ChatGPT

However, when putting this into practice, we quickly ran into issues. Generating the images took longer than expected, and the available output sizes were very limited, with none fitting the aspect ratio we wanted to display (an L-shape around the player). The delay alone was a deal-breaker, since the Player can only start requesting ads once it has parsed a segment’s metadata, and we wanted to showcase this in a live stream. We needed a small buffer to keep the distance to the live edge short, so we’d only have a few seconds between first parsing a video segment and having to show an ad that fit the content being presented.

So we scrapped the idea of generating ad placement images on the fly and went for a more practical solution: searching for ad images on the internet. Luckily, Bing had what we needed: an API to search for images with very specific queries. Its filters let us find images matching the aspect ratio of our ads, and more targeted queries surfaced the actual ad content.

To use Bing, we needed to simplify the descriptions we had been getting from ChatGPT so they would be easily digestible by the search engine and yield the results we needed. By leveraging the AI to refine the queries, we were able to use the Bing API to search for images successfully, ultimately getting our hackathon project over the finish line.
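Below is a sketch of how such a search can look; the v7 endpoint, the aspect filter, and the subscription-key header come from Bing’s Image Search API, while the query handling is simplified for illustration.

// Sketch of an image lookup against Bing Image Search v7. The endpoint,
// `aspect` filter and subscription-key header come from Bing's API docs;
// query refinement and error handling are omitted for brevity.
async function searchAdImages(query: string): Promise<string[]> {
  const url = new URL('https://api.bing.microsoft.com/v7.0/images/search');
  url.searchParams.set('q', query);
  url.searchParams.set('aspect', 'Wide'); // match the L-shape banner format
  url.searchParams.set('count', '5');

  const response = await fetch(url, {
    headers: { 'Ocp-Apim-Subscription-Key': process.env.BING_API_KEY ?? '' },
  });
  const data = await response.json();

  // Each result carries a direct content URL we can hand to the Player.
  return data.value.map((image: { contentUrl: string }) => image.contentUrl);
}

Sketch of the Bing image lookup used by the ad server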

Needless to say, we won first prize, and you can now test the result for yourself on the Bitmovin Dashboard.

The Current System Architecture


Architecture diagram of Bitmovin’s Contextual Ads feature, which is now available to test

To sum up how AI Contextual Advertising works in the diagram above:

  1. Content Analysis by Encoders
    • The Live and VOD Encoders analyze each segment of the video stream by extracting the first frame of every segment.
    • The extracted frames are sent to ChatGPT, which generates five keyword/description pairs to summarize the scene.
  2. Metadata Storage
    • The contextual keywords are embedded into the video segments as metadata using custom emsg boxes.
    • Both the video and metadata are stored on the CDN during the encoding process.
  3. Player Metadata Extraction
    • The Bitmovin Player loads the video segments from the CDN and extracts the contextual metadata from the emsg boxes.
    • The metadata is exposed through the Ad Opportunity API, which the Player uses to identify potential ad placements (see the parsing sketch after this list).
  4. Ad Request via Ad Server
    • The Player sends the contextual metadata to the ad server, which uses ChatGPT to generate five ad placement ideas.
    • The ad server then uses Bing’s image search API to fetch images that match the selected ad idea and desired aspect ratio.
  5. Ad Delivery and Playback
    • The ad server responds with an ad placement consisting of two images.
    • The Player schedules and displays the ads in alignment with the associated content during playback.
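As a sketch of the Player-side extraction in step 3, the following walks a segment’s top-level boxes and recovers the keyword payload from matching emsg boxes. The scheme URI and JSON payload mirror the writer sketch earlier and are illustrative assumptions.

// Hypothetical sketch of step 3: scan a fetched fMP4 segment for top-level
// version-0 'emsg' boxes and recover the keyword-description payload.
// Scheme URI and payload shape match the writer sketch above (assumptions).
type KeywordPair = { keyword: string; description: string };

function extractContextualMetadata(segment: ArrayBuffer): KeywordPair[] {
  const view = new DataView(segment);
  const decoder = new TextDecoder();
  const pairs: KeywordPair[] = [];
  let offset = 0;

  while (offset + 8 <= segment.byteLength) {
    const size = view.getUint32(offset);
    if (size < 8) break; // malformed box; stop scanning
    const type = decoder.decode(new Uint8Array(segment, offset + 4, 4));

    if (type === 'emsg') {
      let cursor = offset + 12; // skip size, type and version/flags (v0 assumed)
      // scheme_id_uri and value are null-terminated strings.
      const readString = (): string => {
        const start = cursor;
        while (view.getUint8(cursor) !== 0) cursor++;
        return decoder.decode(new Uint8Array(segment, start, cursor++ - start));
      };
      const schemeIdUri = readString();
      readString(); // value, unused here
      cursor += 16; // timescale, presentation_time_delta, event_duration, id

      if (schemeIdUri === 'urn:example:contextual-ads') {
        const payload = new Uint8Array(segment, cursor, offset + size - cursor);
        pairs.push(...(JSON.parse(decoder.decode(payload)) as KeywordPair[]));
      }
    }
    offset += size; // advance to the next top-level box
  }
  return pairs;
}

Sketch of extracting contextual metadata from segment emsg boxes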

Future Possibilities

Our hackathon implementation was basic, but it works and is a good starting point for more advanced AI features. The possibilities go well beyond ad placement: with AI that can extract visual and contextual information from video, audio, and text, AI-driven content analysis opens up many options, such as:

  • Jumping to specific moments in content using natural language queries, such as: “I want to watch the part where the guy drives a sled down the stairs and crashes through the glass pane.”
  • Searching large content libraries with intuitive queries like: “Find the movie where Tom Cruise does that one stunt.”
  • Providing smarter content recommendations by combining contextual metadata with insights into user viewing behavior.
  • Delivering targeted and personalized content warnings based on sensitive themes and the user’s past viewing habits.
  • Summarizing missed episodes or key moments, such as condensing a skipped episode of a TV series into a quick recap.
  • Automatically annotating important moments, such as flagging goals in a soccer match or other points of interest in live or recorded content.
  • Identifying points of interest without prior knowledge of the content type, such as auto-generating chapter markers in a video.
  • Recognizing people, brands, or objects in content, and tracking when and where they appear.
  • Enhancing older content with up-to-date contextual overlays, such as correcting outdated information in documentaries with real-time annotations.
  • Combining analytics data with contextual metadata to gain deeper insights into viewing trends and audience engagement.

If you want to learn more about the hackathon, check it out here, and if you want to try AI Contextual Advertising for yourself, test our live demo on the Bitmovin Dashboard.

Adis Talic

Senior Software Engineer | Player Web

Adis Talic is a Senior Software Engineer at Bitmovin and part of the Web Player engineering team, helping expand and maintain support for existing and new devices and player features. Outside of work, Adis is passionate about working on personal projects and learning new programming languages and concepts.

Wolfram Hofmeister

Senior Software Engineer | Player Web

Wolfram is a video streaming enthusiast with many years of experience in the industry and a Master’s degree in Applied Informatics, specializing in Distributed Multimedia Systems. While his main expertise is in building video streaming players for the web, he’s also passionate about video encoding and VR/AR applications.

