The Importance of Observability in Live Video Streaming

Paul Macklin

In today’s digital age, live video streaming has become an essential medium for communication, entertainment, and information dissemination. Whether it’s broadcasting live sports, conducting virtual conferences, or streaming a gaming session, the demand for seamless, high-quality live video has never been higher. However, ensuring a smooth streaming experience is no small feat. This is where observability comes into play. In this blog post, we’ll look in more detail at two key pillars of system observability, monitoring and alerts, and we’ll also introduce our newest feature, the Live Heartbeat.

Why Monitoring is Crucial

Monitoring in live video streaming involves continuously checking various parameters to ensure that everything is functioning correctly. This can include checking the video quality, stream latency, buffer health, and server performance. Effective monitoring can help in identifying issues before they impact the viewer’s experience.

  1. Ensuring error-free delivery: Whatever the screen size, device, or location, today’s viewers expect to view video and hear audio in the highest quality possible. Monitoring helps in maintaining the video quality by detecting issues such as bitrate fluctuations, frame drops, and resolution problems. By keeping an eye on these metrics, streamers can take corrective actions to ensure a consistently high-quality viewing experience.
  2. Staying “On Air”: Whether it is a live event or a 24/7 live linear service, keeping the output available and the audience experience uninterrupted has been a vital part of video delivery since the very beginning. Broadcasters go to great lengths to build resilience into their systems, with backup systems and disaster recovery processes to maintain business continuity. Those backup systems are only effective if monitoring is in place, so that any issue with the current delivery path is identified and corrective action can be taken, automatically or by a human operator, as soon as possible.
  3. Buffer management: Buffering is one of the most common issues in live streaming. Effective monitoring can help in managing buffer health, ensuring that the stream is pre-loaded sufficiently to avoid interruptions. By tracking buffer levels, streamers can adjust the streaming settings or improve the content delivery network (CDN) performance.
  4. Technical service performance: The technical performance of any streaming service is critical for delivering live video. Typically, a live signal is processed by a host of interconnected products that together form the service. Monitoring each component to ensure it behaves within the business’s tolerance for error is crucial for effective root cause analysis, and for holding suppliers to account with data if they breach their SLAs. Selecting the correct quality and measurement tools for each component is vital.
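
As a minimal sketch of what “keeping an eye on these metrics” can mean in practice, the Python snippet below summarises one monitoring window into a few numbers that can be tracked over time. The function name, the metric names and the choice of window are illustrative assumptions, not part of any particular product.

```python
from statistics import mean, pstdev


def summarise_window(bitrates_kbps: list[float], frames_sent: int,
                     frames_dropped: int) -> dict[str, float]:
    """Summarise one monitoring window (hypothetical metric names).

    bitrates_kbps must contain at least one sample.
    """
    avg = mean(bitrates_kbps)
    return {
        "avg_bitrate_kbps": avg,
        # Coefficient of variation: relative bitrate fluctuation in the window.
        "bitrate_cv": pstdev(bitrates_kbps) / avg if avg else 0.0,
        # Share of frames that were dropped in the window.
        "drop_ratio": frames_dropped / frames_sent if frames_sent else 0.0,
    }


if __name__ == "__main__":
    print(summarise_window([3000, 2900, 3100], frames_sent=1500, frames_dropped=15))
```

Summaries like this are what thresholds and alerts are later evaluated against; the raw samples can be kept separately for root cause analysis.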

The Role of Alerts

While monitoring is essential, it’s impractical for human operators to watch these metrics 24/7. For large service providers monitoring hundreds or thousands of linear channels, monitoring is often restricted to one or two sections of a delivery chain displayed on large video walls or multiviewers, and sometimes signals are only displayed by exception. Even when a single event is being monitored, there can be so many components to watch that relying on an “eyes on glass” approach might not be practical.

This is where automated alerts come in. Alerts are notifications triggered by specific events or thresholds, enabling rapid response to potential issues.

  1. Proactive issue resolution: Alerts enable proactive issue resolution by notifying operators of potential problems before they escalate. For example, if the stream bitrate drops below a certain threshold, an alert can notify the technical team to investigate and fix the issue before it affects the viewers.
  2. Minimising downtime: Automated alerts can significantly reduce downtime by ensuring that issues are addressed promptly. For instance, if a product or an entire service goes down or experiences high load, alerts can notify the support team to take immediate action, minimising disruption to the live stream and reducing the mean time to repair (MTTR).
  3. Improving viewer experience: By addressing issues quickly through alerts, streamers can maintain a high-quality viewing experience. This leads to higher viewer satisfaction and engagement, which is crucial for retaining an audience and building a loyal following.
  4. Resource optimization: Alerts can also help in optimising resources by providing insights into usage patterns and potential bottlenecks. For example, if alerts indicate that a particular server is consistently under high load, it may be time to scale up the infrastructure or redistribute the load more efficiently.
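
The idea of an alert as “a notification triggered by a threshold” can be sketched in a few lines of Python. The rule structure, the metric names and the limits below are illustrative assumptions only; real limits depend on the service and its tolerance for error.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AlertRule:
    """A hypothetical rule: fire when a metric crosses a limit."""
    metric: str                        # illustrative metric name
    breached: Callable[[float], bool]  # returns True when the value is a problem
    message: str


def evaluate(rules: list[AlertRule], sample: dict[str, float]) -> list[str]:
    """Return the messages of all rules breached by this metric sample."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        # Skip metrics that are missing from the sample rather than guessing.
        if value is not None and rule.breached(value):
            alerts.append(f"{rule.message} (value={value})")
    return alerts


# Illustrative thresholds only.
RULES = [
    AlertRule("bitrate_kbps", lambda v: v < 2500, "Stream bitrate below threshold"),
    AlertRule("cpu_load_pct", lambda v: v > 90, "Server under high load"),
]

if __name__ == "__main__":
    for line in evaluate(RULES, {"bitrate_kbps": 1800.0, "cpu_load_pct": 42.0}):
        print(line)  # prints: Stream bitrate below threshold (value=1800.0)
```

In a real deployment the returned messages would be handed to whatever notification channel the operations team uses; that part is deliberately left out here.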

The Live Heartbeat

Over the past few months, the engineering team at Bitmovin has been looking at how to improve the observability our Live Encoder product offers by improving our alert notifications. We wanted to make key improvements to our platform in four areas:

  1. Frequency: Customers often need to be aware of issues in a service as soon as they arise, and typically issues arise during a change in state. For Live Encoding, this can happen at packet level on the input, which would affect the segments written in the output. By offering shorter intervals between alerts, we aim to allow customers to get updates at the frequency at which segments are written.
  2. Scalability: As our customer base increases in both size and account usage, the number of concurrent live encodings reporting alerts and notifications also increases. Because we offer a SaaS platform, where the infrastructure and platform are managed by Bitmovin, we initially aggregated our alerts as well. For some alerts this will remain true, but the Live Heartbeat will come directly from the Live Encoder, improving the confidence users can have in the service and removing any bottlenecks for scaling.
  3. Reliability: It only takes one false positive to undermine trust in any mission-critical system, and for alerts that is particularly true. By making the Live Encoder responsible for the Live Heartbeat, it becomes the single source of truth for the health of the product and the section of the live transmission path that Bitmovin provides.
  4. Flexibility: There is a raft of data points we can report on in a notification from the Live Encoder, so making sure the payload structure is flexible and easy to extend is essential. If a customer needs to know something about a function the software is performing and we can report it, adding it to the payload should be swift.

This is an example payload of the first version of the Live Heartbeat, reporting the status of input video and audio streams.

[Image: example Live Heartbeat payload reporting the status of the input video and audio streams]
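
On the receiving side, a heartbeat is useful in two ways: its content reports health, and its absence is itself a signal. The Python sketch below illustrates both ideas under stated assumptions. The field names (`encoding_id`, `video_ok`, `audio_ok`) and the class are hypothetical placeholders for illustration and are not the actual Live Heartbeat schema or any Bitmovin API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class HeartbeatWatchdog:
    """Tracks the last heartbeat per encoding and flags silent or unhealthy ones.

    All payload field names used here are hypothetical placeholders.
    """
    max_silence_s: float = 10.0  # assumed tolerance; tune to the segment interval
    last_seen: dict[str, float] = field(default_factory=dict)
    unhealthy: set[str] = field(default_factory=set)

    def ingest(self, raw: str, received_at: float) -> None:
        """Record one heartbeat message delivered as a JSON string."""
        payload = json.loads(raw)
        encoding_id = payload["encoding_id"]
        self.last_seen[encoding_id] = received_at
        # Unknown extra keys are ignored, so the sender can extend the payload.
        if payload.get("video_ok", True) and payload.get("audio_ok", True):
            self.unhealthy.discard(encoding_id)
        else:
            self.unhealthy.add(encoding_id)

    def silent(self, now: float) -> list[str]:
        """Encodings whose heartbeat has not arrived within the tolerance."""
        return sorted(e for e, t in self.last_seen.items()
                      if now - t > self.max_silence_s)


if __name__ == "__main__":
    dog = HeartbeatWatchdog(max_silence_s=10.0)
    dog.ingest('{"encoding_id": "enc-1", "video_ok": true, "audio_ok": false}', 100.0)
    dog.ingest('{"encoding_id": "enc-2", "video_ok": true, "audio_ok": true, "extra": 1}', 105.0)
    print(sorted(dog.unhealthy))  # prints: ['enc-1']
    print(dog.silent(now=112.0))  # prints: ['enc-1']
```

Ignoring unknown keys is a deliberate choice here: it lets a flexible payload grow new fields without breaking existing consumers.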

Who should operate the observability?

By now, hopefully, the benefits of having a good observability system in place are clear. Monitoring component health, response times, and error rates helps maintain optimal system performance. There is a monetary benefit as well, of course: by identifying and addressing system issues promptly, companies can prevent potential downtime and ensure uninterrupted streaming.

Before looking at how to implement one, it’s also important to ask: who will this observability system be for? Typically, in broadcast stations producing a few core services, the multiple products responsible for maintaining the station output are monitored by a Master Control Room (MCR) or Transmission control room (TX), supported by a dedicated team of engineers. Service providers or telcos might have an enormous number of services to monitor and only be responsible for a certain section of the transmission path. Such a large organisation will already have dedicated staff monitoring multiple services in large Network Operations Centers (NOCs). These large rooms resemble air traffic control centres, with video walls surrounding the staff, showing feeds at different points in their signal paths along with diagnostic information.

Some companies staff these control rooms themselves and simply need the tools in place to perform the job, while other companies might be looking for someone to provide this for them. Sometimes they assume that Bitmovin provides this, but we’re a product company, not a managed service provider. We do, however, have partners that offer this, and we are always willing to introduce customers to them.

Implementing Effective Monitoring and Alerts

When implementing monitoring and alerting solutions, it’s often useful to start on paper and, if you have users, gather their requirements and define their user stories. In most cases, if something should be monitored and trigger an alert, a solution can be engineered to do so. Consider which links in the chain are critical, which can offer tools to aid fault finding, and where alerting can give an early warning to aid preventative maintenance.

Once you have an idea of what needs to be monitored, you can consider more of the details, such as:

  1. Define key metrics: Identify and define key metrics that are critical for your streaming service. This can include video quality indicators, audio quality indicators, metadata integrity, latency, and server performance metrics.
  2. Set thresholds: Establish appropriate thresholds for these metrics. Thresholds should be set so that they trigger alerts for potential issues without raising unnecessary alarms for minor fluctuations. Typically, every company has a level of fault tolerance it is willing to accept, and the lower that tolerance, the higher the cost of meeting the corresponding Service Level Agreement (SLA).
  3. Use the right tools: Utilise reliable monitoring and alerting tools that can integrate with your streaming infrastructure. There are various tools available that offer real-time monitoring, analytics, and alerting capabilities tailored for live video streaming.
  4. Regularly review and adjust: Regularly review the performance data and adjust thresholds and monitoring strategies as needed. Continuous improvement is key to maintaining an effective monitoring and alert system.
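
One common way to avoid unnecessary alarms for minor fluctuations is to debounce a threshold: raise the alert only after several consecutive breaches and clear it only after several consecutive healthy samples. The sketch below is one possible approach with assumed default counts; Prometheus alerting rules, for example, offer a comparable delay through their `for` clause.

```python
from dataclasses import dataclass, field


@dataclass
class DebouncedThreshold:
    """Raise after `raise_after` consecutive breaches and clear after
    `clear_after` consecutive healthy samples (illustrative defaults)."""
    limit: float
    raise_after: int = 3
    clear_after: int = 5
    active: bool = field(default=False, init=False)
    _breaches: int = field(default=0, init=False, repr=False)
    _healthy: int = field(default=0, init=False, repr=False)

    def update(self, value: float) -> bool:
        """Feed one sample (a value below the limit is a breach) and
        return whether the alert is currently active."""
        if value < self.limit:
            self._breaches += 1
            self._healthy = 0
            if self._breaches >= self.raise_after:
                self.active = True
        else:
            self._healthy += 1
            self._breaches = 0
            if self._healthy >= self.clear_after:
                self.active = False
        return self.active


if __name__ == "__main__":
    bitrate_alert = DebouncedThreshold(limit=2500, raise_after=3, clear_after=2)
    samples = [2000, 3000, 2000, 2000, 2000, 3000, 3000]
    print([bitrate_alert.update(v) for v in samples])
    # prints: [False, False, False, False, True, True, False]
```

The single dip at the start never raises an alert, while the sustained drop does. Reviewing and adjusting `raise_after` and `clear_after` over time is part of the regular tuning described in step 4.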

Example Use Cases for Monitoring and Alerts

To better understand the significance of monitoring and alerts in live video streaming, let’s explore some example use cases:

  1. Live sports broadcasting:
  • Scenario: During a live sports event, maintaining high availability and error-free delivery is crucial for an engaging viewer experience.
  • Monitoring: Continuously track the health of the main and backup transmission paths, typically at demarcation points from key equipment suppliers that are the responsibility of the team monitoring the equipment. Often this includes an “off-air” confidence monitor showing what the “viewer at home” is seeing.
  • Alerts: Set up alerts for core supplier demarcation points, increased error rates, or system downgrades to immediately address any issues. Measure system enhancements such as graphics systems separately so they can be bypassed if required.
  2. Virtual conferences and webinars:
  • Scenario: Hosting a virtual conference with multiple speakers and interactive sessions requires smooth transitions and minimal disruptions.
  • Monitoring: Typically far fewer equipment suppliers and components will be involved, so aggregation can be leveraged to streamline the number of monitoring points. Monitor stream health, website load, and participant connectivity.
  • Alerts: Trigger alerts for server overloads, participant dropouts, or stream interruptions to quickly deploy backup resources or troubleshoot connectivity problems.
  3. Gaming streams:
  • Scenario: Streaming a live gaming session where real-time interaction with viewers is key to maintaining engagement.
  • Monitoring: Keep an eye on frame rates, latency, and viewer engagement metrics. Larger events have also become similar to live sporting events, and will have similar requirements to those listed above. 
  • Alerts: Set alerts for frame rate drops, increased latency, or significant drops in viewer engagement, allowing for immediate corrective actions.
  4. News broadcasting:
  • Scenario: Broadcasting live news where timeliness and reliability are critical.
  • Monitoring: Continuously track the health of the main and backup transmission paths, typically at demarcation points from key equipment suppliers that are the responsibility of the team monitoring the equipment. Often this includes an “off-air” confidence monitor showing what the “viewer at home” is seeing. Also check latency and your rivals: if you’re not first, you’re last.
  • Alerts: Generate alerts similar to those for live sports events, paying additional attention to the multiple platforms being delivered to and their confidence monitors, as news typically needs to be on as many screens as possible.
  5. 24/7 live linear channels:
  • Scenario: Broadcasting 24/7 linear channel services that are always serving content to users. 
  • Monitoring: Monitor multiple outputs from key infrastructure components in the chain, typically at demarcation points from key equipment suppliers that are the responsibility of the team monitoring the equipment. Often this includes an “off-air” confidence monitor showing what the “viewer at home” is seeing.
  • Alerts: Set up alerts for core supplier demarcation points, increased error rates, or system downgrades to immediately address any issues. Each service should have its main (most popular or most viewed) off-air platform monitored. If there is an issue there, you’ll want to be able to resolve it and direct engineering support teams as efficiently as possible.
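
Several of the use cases above rely on tracking the health of main and backup transmission paths. As a rough sketch of how monitoring data could drive an automatic or operator-assisted switch, the Python snippet below keeps the active path while it is healthy and otherwise selects the first healthy alternative. The data structure, the function name and the 1% error-rate limit are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathHealth:
    name: str             # e.g. "main" or "backup"
    signal_present: bool
    error_rate: float     # fraction of errored packets or segments, 0.0 to 1.0


def choose_path(paths: list[PathHealth], active: str,
                max_error_rate: float = 0.01) -> Optional[str]:
    """Keep the active path while it is healthy; otherwise return the first
    healthy alternative, or None if no path is healthy."""
    def healthy(p: PathHealth) -> bool:
        return p.signal_present and p.error_rate <= max_error_rate

    current = {p.name: p for p in paths}.get(active)
    if current is not None and healthy(current):
        return active  # avoid switching back and forth unnecessarily
    for p in paths:
        if healthy(p):
            return p.name
    return None  # nothing healthy: escalate to an operator


if __name__ == "__main__":
    paths = [PathHealth("main", True, 0.20), PathHealth("backup", True, 0.0)]
    print(choose_path(paths, active="main"))  # prints: backup
```

Staying on the current path while it is healthy is a deliberate choice: it prevents the output from flapping between paths when the original path recovers.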

Recommended Monitoring and Alerting Tools

At an extremely simplified, high level, here are some of the demarcation points in a signal chain and segments of similar equipment in a transmission path. For each category, we provide a list of products that can be used to implement a robust monitoring and alerting workflow for live video streaming. The lists are by no means exhaustive, nor are they an endorsement of any particular solution:

A. Aggregated Mass Notification Systems
These solutions would typically be used as endpoints for pub/sub or push notifications from multiple systems, aggregating alerts from multiple manufacturers to display the health of a single service or to provide a holistic view of a technology platform. Here we have split the tools into two categories, Data and Media, because you would want to aggregate alarms and health monitoring into a single interface and, separately, see and hear media on a single large display.
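
The aggregation idea can be sketched independently of any of the products listed in this section: alerts arrive from several sources, and each service is shown with the worst severity currently reported for it. The alert shape (`source`, `service`, `severity`) and the severity levels are assumptions for illustration.

```python
from collections import defaultdict
from enum import IntEnum


class Severity(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def service_health(alerts: list[dict]) -> dict[str, Severity]:
    """Roll up alerts from many sources into one status per service.

    The worst severity reported for a service wins.
    """
    health: dict[str, Severity] = defaultdict(lambda: Severity.OK)
    for alert in alerts:
        service = alert["service"]
        health[service] = max(health[service], Severity[alert["severity"]])
    return dict(health)


if __name__ == "__main__":
    incoming = [
        {"source": "encoder", "service": "channel-1", "severity": "WARNING"},
        {"source": "cdn", "service": "channel-1", "severity": "CRITICAL"},
        {"source": "encoder", "service": "channel-2", "severity": "OK"},
    ]
    for name, status in service_health(incoming).items():
        print(name, status.name)
    # prints:
    # channel-1 CRITICAL
    # channel-2 OK
```

A “worst severity wins” roll-up is only one possible policy; some teams weight sources differently or suppress alerts from non-critical components.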

Data

  1. New Relic
    • Features: Offers real-time performance monitoring, error tracking, and alerting for server and application performance.
    • Use Case: Ideal for monitoring server load, response times, and application health during live streaming.
  2. Datadog
    • Features: Provides end-to-end monitoring with detailed analytics, real-time alerts, and integrations with various streaming platforms.
    • Use Case: Suitable for comprehensive monitoring of video quality, latency, and server performance.
  3. DataMiner by Skyline Communications
    • Features: Offers end-to-end monitoring, fault management, and performance analytics specifically designed for media and broadcasting industries.
    • Use Case: Best for comprehensive monitoring of entire broadcast chains, optimising resource management, and ensuring high-quality content delivery.
  4. Prometheus and Grafana
    • Features: Prometheus offers powerful time-series monitoring, and Grafana provides flexible and interactive visualisations.
    • Use Case: Effective for creating customised dashboards to monitor various metrics such as server performance, video bitrate, and latency.
  5. Databricks
    • Features: Offers a data aggregation platform to collect metrics from multiple data sources, across a software stack. Uses AI models to provide insights and elevated reporting. 
    • Use Case: Offers an overview of entire plant operations, supporting troubleshooting by technical teams, observability for operations teams, and performance insights for executive stakeholders.
  6. Nagios
    • Features: An open-source platform that can be used to build live dashboards monitoring systems with multiple components taking alerts via API calls, SNMP or pub/sub webhooks. Also has a great log collector function for root cause analysis. 
    • Use Case: For anyone looking to invest significant time to build a comprehensive solution, this is a great tool that can be customised and useful for operations and engineering teams. 

Media

  1. Grass Valley
    • Features: Grass Valley Kaleido Multiviewers are configurable, multi-input software available with a range of input interfaces and models. They can display multiple video, audio and data sources on a single video wall and issue alerts.
    • Use Case: Suitable for a modular approach where future scalability is key. Could monitor signals at each step of the chain.
  2. Imagine Communications
    • Features: Selenio and Platinum products are ideal for production environments where high bitrate video input sources need to be monitored.  
    • Use Case: Live production studios and control rooms, or playout centres distributing content up to Transmission.
  3. TAG Video
    • Features: TAG Video is dedicated to building multiviewer, monitoring and data analysis products. The platform supports a wide range of input interfaces and models. It can display multiple video, audio and data sources on a single video wall and issue alerts.
    • Use Case: Suitable for a holistic overview of each step of the chain.

B. Acquisition
Products and components at this part of the chain are responsible for capturing the video, audio and data sources. Monitoring tools here for a production workflow would normally be test and measurement devices to ensure the equipment is properly calibrated and that the output from the devices meets certain specifications. Typically this is the most critical part of the chain, where it’s much harder to have back-up devices ready to take over.

  1. Leader
    • Features: Waveform monitors and rasterizers display a range of scopes for measuring uncompressed video signals over SDI or IP. The range spans mid-range to high-end products.
    • Use Case: Measuring signal health and error rates, and line-up to ensure correct calibration.
  2. Telestream
    • Features: The company’s waveform monitors and rasterizers display a range of scopes for measuring uncompressed video signals over SDI or IP. 
    • Use Case: Measuring signal health and error rates, and line-up to ensure correct calibration. Useful in camera control rooms, post-production facilities and quality control.
  3. TSL Systems
    • Features: Provides audio metering products to measure level, loudness, signal presence and phasing. 
    • Use Case: Audio monitoring for signal levels and integrity in any acquisition environment.
  4. Leader/PHABRIX
    • Features: Also from Leader, but the popular portable handheld devices are well known as a standalone brand to any engineer working with baseband video. The devices can generate signals and analyse them using a multitude of scopes, in robust cases with a long battery life and high quality screen and simple controls. 
    • Use Case: An essential tool for analysing a host of different components in a chain during installation, routine maintenance or during fault finding. 

C. Processing & Routing

  1. Bridge Technologies
    • Features: Specialists in monitoring probes for signals at different stages of the production chain, able to measure uncompressed signals, compressed contribution (Transmission), compressed domain (Distribution) and off-air platforms.
    • Use Case: Provides signal quality and health in a holistic manner, measuring the muxed video, audio and data streams in a consolidated signal. 
  2. Interra Systems
    • Features: Provides quality control software for measuring signal quality in terms of artefacts and content quality (perceptual visual and audio quality).
    • Use Case: Can be used to measure content according to a set of business rules and allow teams to manage bulk content and alert operators on request. 

D. Transmission

  1. Bridge Technologies
    • Features: Specialists in monitoring probes for signals at different stages of the production chain, able to measure uncompressed signals, compressed contribution (Transmission), compressed domain (Distribution) and off-air platforms.
    • Use Case: Provides signal quality and health in a holistic manner, measuring the muxed video, audio and data streams in a consolidated signal. 
  2. IMAX
    • Features: Using StreamSmart and StreamAware, deploy monitoring solutions such as quality probing software measuring quality from multiple points along a transmission path using SSIM quality metrics. 
    • Use Case: To ensure that a benchmark of audio and video quality is met and maintained throughout the transmission path. 
  3. Interra Systems
    • Features: Provides quality control software for measuring signal quality in terms of artefacts and content quality (perceptual visual and audio quality).
    • Use Case: Can be used to measure content according to a set of business rules and allow teams to manage bulk content and alert operators on request. 

E. Distribution

  1. Hydrolix
    • Features: Offers a data lake platform that can capture vast quantities of logging information across a distribution platform and make it queryable via an indexed search.
    • Use Case: A perfect tool for teams responsible for monitoring multiple content delivery networks and security platforms.  
  2. PROMAX ELECTRONICS
    • Features: A range of tooling for monitoring MPEG encoders and POPs for distribution of content over DTTV, Satellite and Cable Optical Delivery Networks. 
    • Use Case: Companies managing multiple traditional broadcast distribution networks.
  3. Touchstream
    • Features: Observability tools for monitoring media distribution over CDNs, covering the performance and health of network distribution. It also provides a virtual NOC for monitoring the health of key components in the OTT transmission path from encoder to OTT devices.
    • Use Case: Tools are crafted for teams looking for greater observability over OTT distribution paths. 

F. Off-Air Platforms

  1. Bridge Technologies
    • Features: Specialists in monitoring probes for signals at different stages of the production chain, able to measure uncompressed signals, compressed contribution (Transmission), compressed domain (Distribution) and off-air platforms.
    • Use Case: Provides signal quality and health in a holistic manner, measuring the muxed video, audio and data streams in a consolidated signal. 
  2. IMAX
    • Features: Using StreamSmart and StreamAware, deploy monitoring solutions such as quality probing software measuring quality from multiple points along a transmission path using SSIM quality metrics. 
    • Use Case: To ensure that a benchmark of audio and video quality is met and maintained throughout the transmission path. 
  3. Interra Systems
    • Features: Provides quality control software for measuring signal quality in terms of artefacts and content quality (perceptual visual and audio quality).
    • Use Case: Can be used to measure content according to a set of business rules and allow teams to manage bulk content and alert operators on request. 
  4. Bitmovin Analytics
    • Features: Focuses specifically on video streaming with detailed insights into video performance, viewer engagement, and quality of experience.
    • Use Case: Excellent for monitoring video quality metrics, buffer health, and viewer engagement in real time.

G. Managed Service Providers

  1. Stream AMG
    • Features: Leading sports OTT platform provider that allows clubs, leagues, rights holders and more to build online video services to monetize their content.
    • Use Case: Integrate with their “Headless OTT” or use their full end-to-end solution for live video delivery, monetization, engagement, analytics and content protection.
  2. M2A Media
    • Features: Automation and orchestration of AWS Media Services for premier live events. 
    • Use Case: Operations teams can use M2A interfaces to build, monitor and capture live streaming video content running on AWS, without any cloud or dev skills necessary. 
  3. LTN Global
    • Features: High-quality video transport and distribution services. Ultra-low latency video delivery, cloud-based media workflows, live production tools and comprehensive monitoring.
    • Use Case: Real-time news coverage and remote guest contributions. Live sports events and cloud-based, remote media production. 
  4. Telstra Broadcast Services
    • Features: Comprehensive media and broadcast solutions provider with global low-latency media network. Specialists in live events and media workflow solutions.
    • Use Case: Live sports broadcasting and remote production; Festival, concert and event streaming. 
  5. Irdeto
    • Features: Managed broadcast and online content distribution infrastructure; Design, build and optimise new video platforms. 
    • Use Case: Video compression and delivery network management, high-profile event management.

Conclusion

In the dynamic world of live video streaming, maintaining a seamless and high-quality viewer experience is paramount. Monitoring and alerts play a crucial role in achieving this by ensuring that potential issues are identified and addressed promptly. By implementing robust monitoring and alerting systems, streamers can enhance their service reliability, optimise resources, and ultimately deliver an outstanding experience to their audience.

Paul Macklin

Sr. Product Manager | Live Encoding

Paul Macklin is the Senior Product Manager for the Bitmovin Live Encoder. He is an experienced product professional and a specialist in video and SaaS products. With a background in broadcast engineering, he has worked in solutions architect roles as a customer and has extensive experience in pre-sales roles with suppliers. He has worked with live media content throughout his career. Paul focuses on developing Bitmovin's live streaming capabilities and ensuring better viewer experiences during live playback.

