Machine Learning-based Object Detection

Christoph Prager

Increase click-rates and make your content more attractive to your viewers

Today, the most common way of creating thumbnails and sprites for videos involves selecting frames based on fixed time segments: for example, creating sprites by selecting a frame every 10 seconds or creating a thumbnail by selecting a frame 30 seconds from the start of a video.
This often leads to thumbnails and sprites that display misleading imagery or, in some cases, out-of-focus or black frames. It also often produces images that are unrepresentative of the content: a video full of human interaction might end up with a random building as its thumbnail. This matters because thumbnails showing people tend to perform better, since they build an emotional connection with viewers.
YouTube recently underlined the importance of relevant thumbnails by reporting that 90% of its best-performing videos have custom thumbnails.¹ Vevo reported a 12% average increase in views in the first 20 days after a thumbnail was optimized, with one video, “Ghost” by singer Halsey, showing a 4000% increase.²

The graph shows the average increase in views through thumbnail optimization across Vevo’s entire library. Source: Vevo

Tests by Wistia and Vidyard showed that custom thumbnails can increase click rates by 25% on average.³ For a site with around 5 million clicks per day, and assuming an $8 CPM⁴, that amounts to roughly $10k in additional ad revenue per day from optimized thumbnails alone.
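The revenue estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes every additional click leads to exactly one ad impression billed at the stated CPM; the function name and that one-impression-per-click simplification are illustrative assumptions, not figures from the cited tests.

```python
def additional_daily_ad_revenue(daily_clicks, click_uplift, cpm_usd):
    """Estimate extra daily ad revenue from a relative increase in clicks.

    Assumption (illustrative): each additional click produces one ad
    impression, and CPM is the price per 1,000 impressions.
    """
    extra_clicks = daily_clicks * click_uplift  # additional clicks per day
    return extra_clicks / 1000 * cpm_usd        # CPM is priced per thousand


# 5 million clicks per day, 25% uplift, $8 CPM
print(additional_daily_ad_revenue(5_000_000, 0.25, 8.0))  # 10000.0
```

Under these assumptions, 1.25 million extra clicks per day at $8 per thousand impressions gives the $10k figure.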
In short, videos with random thumbnails perform worse than videos with representative thumbnails. The result is lower click-through rates (CTRs) and, as a consequence, lower ad revenue for ad-supported video offerings and higher churn for subscription-based services.
However, manual thumbnail creation consumes a lot of resources, especially when services offer video at scale, for example by integrating user-generated content into their offering. Even the average publisher uploads around 40 videos a day, a volume at which manual thumbnail creation already becomes quite resource-intensive.
The answer? Train an algorithm to select the most suitable thumbnails and sprites.
Machine-learning-based thumbnail creation can increase the relevance of thumbnails and sprites without requiring any additional manual work.
This can happen in two ways. Either the algorithm selects the most suitable sprites and thumbnails from a video based on a description text, or it selects them by looking at how frequently certain objects and faces occur.
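To make the two selection strategies more concrete, here is a minimal sketch in Python. It assumes an object detector has already produced a list of labels for each sampled frame; the input format, the function names, and the simple scoring rules are hypothetical illustrations, not a description of how Bitmovin's Encoding API works internally.

```python
from collections import Counter


def pick_by_frequency(frame_labels):
    """Pick the frame whose objects are the most common across the video.

    frame_labels: dict mapping a timestamp in seconds to the list of labels
    detected in that frame (hypothetical detector output).
    """
    # Count in how many frames each label appears.
    freq = Counter(
        label for labels in frame_labels.values() for label in set(labels)
    )
    # A frame scores higher the more frequently its objects occur overall.
    return max(
        frame_labels,
        key=lambda ts: sum(freq[label] for label in set(frame_labels[ts])),
    )


def pick_by_description(frame_labels, description):
    """Pick the frame whose labels overlap most with the description text."""
    words = set(description.lower().split())
    return max(
        frame_labels,
        key=lambda ts: len(words & {label.lower() for label in frame_labels[ts]}),
    )


frames = {
    0.0: [],                             # e.g. a black frame
    10.0: ["building"],
    20.0: ["person", "face"],
    30.0: ["person", "face", "guitar"],
}
print(pick_by_frequency(frames))                                  # 30.0
print(pick_by_description(frames, "Tour of a historic building")) # 10.0
```

In this toy input, people and faces dominate the video, so the frequency-based rule skips the black frame and the lone building, while the description-based rule follows the text instead. On ties, `max` returns the earliest frame in the dict, and a real system would likely also weigh sharpness and detection confidence.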

Applying an object detection algorithm to your video workflow is not limited to thumbnail creation, however. Information about detected objects and content can be turned into metadata and provided to advertisers, enabling more targeted advertising and justifying higher CPMs.
By providing additional data points, ML-based indexing of videos can also help content creators assess the performance of different types of content quantitatively. In addition, tagging videos with the detected objects saves valuable resources when archiving footage and makes it much easier to find relevant content again later.
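As an illustration of how object tags can make archived footage searchable, the sketch below builds a simple inverted index from tags to video IDs. The video IDs, tag lists, and function names are hypothetical; a production archive would more likely rely on a search engine or database.

```python
from collections import defaultdict


def build_tag_index(video_tags):
    """Map each (lowercased) tag to the set of video IDs carrying it."""
    index = defaultdict(set)
    for video_id, tags in video_tags.items():
        for tag in tags:
            index[tag.lower()].add(video_id)
    return index


def find_videos(index, *tags):
    """Return the IDs of videos that carry all of the given tags."""
    matches = [index.get(tag.lower(), set()) for tag in tags]
    return set.intersection(*matches) if matches else set()


# Hypothetical archive: video ID -> tags derived from detected objects
archive = {
    "clip-001": ["person", "guitar"],
    "clip-002": ["building"],
    "clip-003": ["Person", "dog"],
}
index = build_tag_index(archive)
print(sorted(find_videos(index, "person")))         # ['clip-001', 'clip-003']
print(sorted(find_videos(index, "person", "dog")))  # ['clip-003']
```

The same tag sets could also be exported as metadata for advertisers or joined with viewing statistics to compare how different types of content perform.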
Additional Readings:

3 Average of Wistia (34%) and Vidyard (15%).
