
AI Contextual Advertising: A New Era for Viewer-Centric Ads

Adis Talic
Wolfram Hofmeister

Video has become the dominant medium for entertainment and information, so effective advertising strategies have never been more important. The pressure is heightened by Google’s plan to phase out third-party cookies in Chrome, now expected in early 2025. For over 20 years, cookies have been the backbone of targeted advertising campaigns, and their potential removal calls for new solutions that can deliver relevant ads while respecting user privacy.

Bitmovin’s AI Contextual Advertising addresses this problem. By analyzing video, audio, and text data during encoding and passing this information to the video player, the technology enables ad servers to deliver highly relevant, targeted ads based on what’s being watched. Unlike traditional cookie-based methods, it integrates seamlessly into the viewing experience without relying on personal user data, making it both privacy-friendly and highly effective. While advertising is the primary use case we’ll explore, the technology has potential for a wide range of applications, which we’ll touch on later.

In this blog, we’ll dive into the technical implementation of AI Contextual Advertising, from its origins in a Bitmovin hackathon to the real-world engineering challenges we overcame to bring this latest feature to life.

How It Started

The concept originated during a brainstorming session at a Bitmovin hackathon, where the goal was to see what we could build by combining several of Bitmovin’s solutions. Among many ideas, one stood out: analyzing video content and serving ads relevant to the context of the scenes.

Armed with this concept, we quickly sketched a basic workflow as a rough, hackathon-style diagram (shown below) to guide us.

Rough workflow diagram sketched during the hackathon

The idea was straightforward:

  1. Use Bitmovin’s VOD and Live Encoders to analyze video segments
  2. Employ a multi-modal AI system to extract contextual descriptions of each scene
  3. Pass this metadata to the Bitmovin Player for ad integration
  4. The Player then provides this contextual information to the playout software, which works with the ad server to request relevant ads
  5. Finally, the Player integrates and displays these ads seamlessly within the content

With just two days to deliver a working prototype, we faced an uphill battle: there was no time to integrate an actual ad vendor into the system. To cover that gap, our group decided to build a custom ad server during the hackathon that would generate ad placements on the fly with AI.

Hackathon Implementation

1. Extracting Contextual Information

The first step was to extract contextual information from the live or on-demand video content during encoding. To keep things manageable within our time constraints, we focused on analyzing only the first frame of each video segment. We knew this wouldn’t be perfect, but it would be good enough to demo the concept (and hopefully win first prize).

Since we were already familiar with it, we used ChatGPT as our AI system: we passed each extracted frame to ChatGPT and asked it to generate a short description of the frame’s contents in the form of 5 keyword-description pairs.
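For illustration, a call along the following lines could produce those pairs. This is a minimal sketch assuming OpenAI’s chat completions API with a vision-capable model; the model name, prompt wording, and response handling are our assumptions here, not the exact hackathon code.

// Minimal sketch: ask a vision-capable model for five keyword-description
// pairs describing a single video frame. Model name, prompt wording and
// response parsing are illustrative assumptions.
interface KeywordPair {
  keyword: string;
  description: string;
}

async function describeFrame(frameAsBase64: string): Promise<KeywordPair[]> {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o', // assumption: any vision-capable model works here
      messages: [
        {
          role: 'user',
          content: [
            {
              type: 'text',
              text:
                'Describe this video frame as exactly 5 keyword-description pairs, ' +
                'returned as a JSON array of {"keyword", "description"} objects and nothing else.',
            },
            {
              type: 'image_url',
              image_url: { url: `data:image/jpeg;base64,${frameAsBase64}` },
            },
          ],
        },
      ],
    }),
  });
  const data = await response.json();
  // The prompt asks for plain JSON, so parse the first choice directly.
  return JSON.parse(data.choices[0].message.content) as KeywordPair[];
}

A hypothetical sketch of the frame-analysis call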

Then, to make the data accessible to the Bitmovin Player, we encapsulated the extracted keyword-description pairs into the video segments as metadata using emsg boxes.


 Example of how the injected metadata looked within the video segments
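As a rough sketch of how this step could be done, the pairs can be packed into a version-0 emsg box as JSON. The scheme URI and payload shape below are illustrative choices, not the exact format we shipped.

// Rough sketch: serialize keyword-description pairs into an ISO BMFF
// version-0 'emsg' box so the Player can pick them up per segment.
// The scheme URI and JSON payload shape are illustrative assumptions.
type KeywordPair = { keyword: string; description: string };

function buildEmsgBox(pairs: KeywordPair[], segmentDurationMs: number): Uint8Array {
  const encoder = new TextEncoder();
  const schemeIdUri = encoder.encode('urn:example:contextual-ads\0'); // null-terminated
  const value = encoder.encode('1\0');
  const messageData = encoder.encode(JSON.stringify(pairs));

  // size(4) + type(4) + version/flags(4) + strings + 4 x uint32 + payload
  const size = 12 + schemeIdUri.length + value.length + 16 + messageData.length;
  const buffer = new ArrayBuffer(size);
  const view = new DataView(buffer);
  const bytes = new Uint8Array(buffer);
  let offset = 0;

  view.setUint32(offset, size); offset += 4;               // box size
  bytes.set(encoder.encode('emsg'), offset); offset += 4;  // box type
  view.setUint32(offset, 0); offset += 4;                  // version 0, no flags
  bytes.set(schemeIdUri, offset); offset += schemeIdUri.length;
  bytes.set(value, offset); offset += value.length;
  view.setUint32(offset, 1000); offset += 4;               // timescale (ms)
  view.setUint32(offset, 0); offset += 4;                  // presentation_time_delta
  view.setUint32(offset, segmentDurationMs); offset += 4;  // event_duration
  view.setUint32(offset, 1); offset += 4;                  // event id
  bytes.set(messageData, offset);                          // message payload
  return bytes;
}

Sketch of packing keyword-description pairs into an emsg box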

2. Serving Ads in the Player

For the Player portion, we developed a new ContextualAds module, which exposed an API for handling ad opportunities and scheduling ads. When an ad opportunity arose, the player would fetch and preload ads from the advertising server based on the metadata from the video segments.

Code Snippet:

export interface ContextualAdvertisingApi {
  readonly opportunities: AdOpportunity[];
  schedule(opportunity: AdOpportunity, options?: Partial<ContextualAdOptions>): Promise<ContextualAd>;
  setAdShape(shape: AdShape | undefined): void;
}

Example API code for the ContextualAds Module
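To illustrate how this API could be driven, here is a hypothetical usage sketch; how the module is obtained, the ad shape value, and the preload option are assumptions rather than the actual Player API.

// Hypothetical usage of the ContextualAds module. The ad shape value and
// the preload option are illustrative assumptions, not the actual Player API.
async function scheduleNextContextualAd(contextualAds: ContextualAdvertisingApi): Promise<void> {
  // Render ads in an L-shape around the video, as in our demo.
  contextualAds.setAdShape('l-shape' as AdShape);

  // Take the most recent ad opportunity derived from segment metadata.
  const opportunity = contextualAds.opportunities[contextualAds.opportunities.length - 1];
  if (!opportunity) {
    return;
  }

  // Schedule an ad for it; the Player fetches and preloads the creative.
  const ad = await contextualAds.schedule(opportunity, { preload: true } as Partial<ContextualAdOptions>);
  console.log('Scheduled contextual ad:', ad);
}

Hypothetical usage of the ContextualAds API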

The system was designed to display ads seamlessly within the video player, ensuring alignment with the content on screen, and to show the matched keywords once an ad starts.

  Example video of how the ads and metadata were shown on live stream

3. Serving Ads with a Custom Ad Server

Given the lack of time to integrate an actual ad vendor, we built a simple Node app to act as our ad server. This basic server exposed an API to accept ad opportunity keywords, get ad suggestions from ChatGPT, and use those suggestions to fetch one or more images that fit the keywords.
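A minimal sketch of such a server is shown below, assuming Express; the route name, request shape, and helper functions are hypothetical stand-ins for what we actually wired up.

// Minimal sketch of the hackathon ad server using Express. Route name,
// request shape and helper implementations are hypothetical stand-ins.
import express from 'express';

const app = express();
app.use(express.json());

// Stub helpers; the surrounding sections sketch the real logic they stand for.
async function suggestAdIdeas(keywords: string[]): Promise<string[]> {
  // In the prototype, ChatGPT turned opportunity keywords into ad ideas.
  return [`${keywords.join(' ')} advertisement`];
}

async function searchAdImages(query: string): Promise<string[]> {
  // In the prototype, an image search resolved an ad idea to image URLs
  // (see the Bing sketch below).
  return [`https://example.com/ads?q=${encodeURIComponent(query)}`];
}

app.post('/ad-opportunity', async (req, res) => {
  const keywords: string[] = req.body.keywords ?? [];
  const ideas = await suggestAdIdeas(keywords);   // ad suggestions from the LLM
  const images = await searchAdImages(ideas[0]);  // images matching the idea
  res.json({ idea: ideas[0], images });
});

app.listen(3000, () => console.log('Ad server listening on :3000'));

Sketch of the ad server’s single endpoint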

Initially, we aimed to generate ad images on the fly with AI, since this would ensure the ads fit perfectly within the scene. A quick manual test immediately showed how compelling the viewing experience could be: a system like this could serve ads that fit the content remarkably well, stylized and presented in a way that is engaging, light-hearted, and even entertaining.

Just look at the, quite frankly, astonishing cat food commercial that ChatGPT came up with during our initial tests. Who wouldn’t love to buy DEEST’s QUALTING TASTE food for their beloved furry friends?


Example of an ad image generated with ChatGPT

However, when putting this into practice, we quickly ran into issues. Generating the images took longer than expected, and the available output sizes were very limited, with none fitting the aspect ratio we wanted to display (an L-shape around the player). The delay alone was a deal-breaker, since the Player can only start requesting ads once it has parsed a segment’s metadata, and we wanted to showcase this in a live stream. We needed a small buffer to keep the distance to the live edge short, so we’d only have a few seconds between first parsing a video segment and having to show an ad that fit the content being presented.

So we scrapped the idea of generating ad placement images on the fly and went for a more practical solution: searching for ad images on the internet. Luckily, Bing had what we needed: an API to search for images with very specific queries. Its filters let us find images matching the aspect ratio of our ads, and more targeted queries surfaced the actual ad content.

To use Bing, we needed to simplify the descriptions we had been getting from ChatGPT so they would be easily digestible by the search engine and yield the results we needed. By leveraging the AI to refine the queries, we were able to use the Bing API to search for images successfully, ultimately getting our hackathon project over the finish line.
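Below is a sketch of how such a search can look; the v7 endpoint, the aspect filter, and the subscription-key header come from Bing’s Image Search API, while the query handling is simplified for illustration.

// Sketch of an image lookup against Bing Image Search v7. The endpoint,
// `aspect` filter and subscription-key header come from Bing's API docs;
// query refinement and error handling are omitted for brevity.
async function searchAdImages(query: string): Promise<string[]> {
  const url = new URL('https://api.bing.microsoft.com/v7.0/images/search');
  url.searchParams.set('q', query);
  url.searchParams.set('aspect', 'Wide'); // match the L-shape banner format
  url.searchParams.set('count', '5');

  const response = await fetch(url, {
    headers: { 'Ocp-Apim-Subscription-Key': process.env.BING_API_KEY ?? '' },
  });
  const data = await response.json();

  // Each result carries a direct content URL we can hand to the Player.
  return data.value.map((image: { contentUrl: string }) => image.contentUrl);
}

Sketch of the Bing image lookup used by the ad server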

Needless to say, we won first prize, and you can now test the result for yourself on the Bitmovin Dashboard.

The Current System Architecture


Architecture diagram of Bitmovin’s Contextual Ads feature, which is now available to test

To sum up how AI Contextual Advertising works in the diagram above:

  1. Content Analysis by Encoders
    • The Live and VOD Encoders analyze each segment of the video stream by extracting the first frame of every segment.
    • The extracted frames are sent to ChatGPT, which generates five keyword/description pairs to summarize the scene.
  2. Metadata Storage
    • The contextual keywords are embedded into the video segments as metadata using custom emsg boxes.
    • Both the video and metadata are stored on the CDN during the encoding process.
  3. Player Metadata Extraction
    • The Bitmovin Player loads the video segments from the CDN and extracts the contextual metadata from the emsg boxes.
    • The metadata is exposed through the Ad Opportunity API, which the Player uses to identify potential ad placements (see the parsing sketch after this list).
  4. Ad Request via Ad Server
    • The Player sends the contextual metadata to the ad server, which uses ChatGPT to generate five ad placement ideas.
    • The ad server then uses Bing’s image search API to fetch images that match the selected ad idea and desired aspect ratio.
  5. Ad Delivery and Playback
    • The ad server responds with an ad placement consisting of two images.
    • The Player schedules and displays the ads in alignment with the associated content during playback.
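As a sketch of the Player-side extraction in step 3, the following walks a segment’s top-level boxes and recovers the keyword payload from matching emsg boxes. The scheme URI and JSON payload mirror the writer sketch earlier and are illustrative assumptions.

// Hypothetical sketch of step 3: scan a fetched fMP4 segment for top-level
// version-0 'emsg' boxes and recover the keyword-description payload.
// Scheme URI and payload shape match the writer sketch above (assumptions).
type KeywordPair = { keyword: string; description: string };

function extractContextualMetadata(segment: ArrayBuffer): KeywordPair[] {
  const view = new DataView(segment);
  const decoder = new TextDecoder();
  const pairs: KeywordPair[] = [];
  let offset = 0;

  while (offset + 8 <= segment.byteLength) {
    const size = view.getUint32(offset);
    if (size < 8) break; // malformed box; stop scanning
    const type = decoder.decode(new Uint8Array(segment, offset + 4, 4));

    if (type === 'emsg') {
      let cursor = offset + 12; // skip size, type and version/flags (v0 assumed)
      // scheme_id_uri and value are null-terminated strings.
      const readString = (): string => {
        const start = cursor;
        while (view.getUint8(cursor) !== 0) cursor++;
        return decoder.decode(new Uint8Array(segment, start, cursor++ - start));
      };
      const schemeIdUri = readString();
      readString(); // value, unused here
      cursor += 16; // timescale, presentation_time_delta, event_duration, id

      if (schemeIdUri === 'urn:example:contextual-ads') {
        const payload = new Uint8Array(segment, cursor, offset + size - cursor);
        pairs.push(...(JSON.parse(decoder.decode(payload)) as KeywordPair[]));
      }
    }
    offset += size; // advance to the next top-level box
  }
  return pairs;
}

Sketch of extracting contextual metadata from segment emsg boxes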

Future Possibilities

Our hackathon implementation was basic, but it works and is a good starting point for more advanced AI features. The possibilities go well beyond ad placement: with AI that can extract visual and contextual information from video, audio, and text, AI-driven content analysis opens up many options, such as:

  • Jumping to specific moments in content using natural language queries, such as: “I want to watch the part where the guy drives a sled down the stairs and crashes through the glass pane.”
  • Searching large content libraries with intuitive queries like: “Find the movie where Tom Cruise does that one stunt.”
  • Providing smarter content recommendations by combining contextual metadata with insights into user viewing behavior.
  • Delivering targeted and personalized content warnings based on sensitive themes and the user’s past viewing habits.
  • Summarizing missed episodes or key moments, such as condensing a skipped episode of a TV series into a quick recap.
  • Automatically annotating important moments, such as flagging goals in a soccer match or other points of interest in live or recorded content.
  • Identifying points of interest without prior knowledge of the content type, such as auto-generating chapter markers in a video.
  • Recognizing people, brands, or objects in content, and tracking when and where they appear.
  • Enhancing older content with up-to-date contextual overlays, such as correcting outdated information in documentaries with real-time annotations.
  • Combining analytics data with contextual metadata to gain deeper insights into viewing trends and audience engagement.

If you want to learn more about the hackathon, check it out here, and if you want to try AI Contextual Advertising for yourself, test our live demo on the Bitmovin Dashboard.

Adis Talic

Senior Software Engineer | Player Web

Adis Talic is a Senior Software Engineer at Bitmovin and part of the Web Player engineering team, helping expand and maintain support for existing and new devices and player features. Outside of work, Adis is passionate about working on personal projects and learning new programming languages and concepts.

Wolfram Hofmeister

Senior Software Engineer | Player Web

Wolfram is a video streaming enthusiast with many years of experience in the industry and a Master’s degree in Applied Informatics, specializing in Distributed Multimedia Systems. While his main expertise is in building video streaming players for the web, he’s also passionate about video encoding and VR/AR applications.

