For a truly immersive video experience, audio is a key component. Here is a look at both the challenges and the available solutions for 360° video.
This October San Francisco hosted Demuxed, “The Conference for Video Engineers”, where I took the liberty of talking about one of my favourite subjects – audio. Audio is an extremely important part of the whole video experience. I think it is safe to say that most videos would be far less enjoyable, or would not make sense at all, without audio or even with poor-quality audio. In researching the subject beyond my own curiosity, I confirmed a suspicion that there is a general lack of focus on audio throughout the industry. This has led to an imbalance in product features, such as ultra-high-definition TVs with all kinds of fancy video features but sub-par, smartphone-like audio reproduction quality. More generally, it has led to a situation where the state of the art in audio always lags behind video. Keep in mind, I’m not talking about the audiophile community, which deliberately focuses on audio, but about companies delivering mainstream products to average consumers who buy off-the-shelf electronics and just want their stuff to work.
Of course, innovation has to start somewhere, and in the case of 360°/VR video, it clearly also started with video. There are many vendors of 360° cameras today and there are tons of VR videos available online, but most of them carry only a “traditional” audio track, usually in stereo. The special concern for immersive 360°/VR video is that the perceived sound field does not turn with the viewport, i.e. the direction you’re looking at within the virtual space, regardless of the number of channels – mono, stereo, or even traditional surround sound. What I mean is – if you’re in the middle of the crowd at a concert and turn your head around to look back, with the “traditional” audio you will still hear the band playing from the front. This leads to a discrepancy between the visually and aurally perceived environment. The solution to this “fixed audio” is generally called spherical audio, and many leaders in VR video like YouTube and Facebook, and now also Bitmovin, already support this kind of audio experience. It combines two pieces: the Ambisonic audio technique, which stores and transports the audio information, and Head-Related Transfer Functions (HRTFs) for binaural playback, which reproduce the directional cues over headphones.
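To make the concert example concrete, here is a minimal sketch of the difference between fixed and spherical audio. It is only an illustration: `relativeAzimuth` and the variable names are hypothetical and not part of any player API, and the angle convention (degrees, counterclockwise from straight ahead) is an assumption I picked for the example.

```typescript
// Hypothetical helper for illustration, not part of any player API.
// Angles are in degrees, measured counterclockwise from "straight ahead"
// in the scene, so +90 is to the left.
function relativeAzimuth(sourceAzimuth: number, headYaw: number): number {
  const diff = sourceAzimuth - headYaw;
  // Wrap the result into the range [-180, 180).
  return ((diff + 180) % 360 + 360) % 360 - 180;
}

const bandAzimuth = 0; // the band plays straight ahead in the scene
const headYaw = 180;   // the viewer has turned around to look back

// Fixed audio ignores the head orientation: the band stays in front of the face.
const fixedAudioDirection = bandAzimuth;
// Spherical audio compensates for the head turn: the band is heard from behind.
const sphericalAudioDirection = relativeAzimuth(bandAzimuth, headYaw);

console.log({ fixedAudioDirection, sphericalAudioDirection });
```

With fixed audio the direction never changes, whatever the viewport does. With spherical audio the direction is recomputed relative to the head, which is exactly the compensation the viewer expects.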
Ambisonics is not new, but chances are most people have never heard of it. It was invented in the 1970s but never took off for various reasons, one of which, possibly, is that it’s British. It’s based on a rather complicated mathematical model that you can meditate on in the Wikipedia article, or you can get a high-level overview of Ambisonics and binaural playback by watching the recording of my Demuxed talk below.
Integration into the Bitmovin Player
Spherical audio for VR videos with Ambisonics and binaural playback in the Bitmovin Player is available today. We currently support first-order ambisonic audio, which is an audio track with 4 channels.
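For the curious, here is a rough sketch of what those four channels carry and why they make head tracking cheap. It assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization); the source above does not say which convention a given stream uses, so treat that as an assumption. The names `FoaSample`, `encodeMono` and `rotateForHeadYaw` are made up for this example, and the final binaural decoding step with HRTFs is not shown.

```typescript
// One sample of first-order ambisonic audio, assuming AmbiX layout
// (ACN channel order W, Y, Z, X and SN3D normalization).
type FoaSample = { w: number; y: number; z: number; x: number };

// Encode a mono sample arriving from a given direction.
// Angles are in radians; azimuth is counterclockwise from straight ahead,
// elevation is positive upwards.
function encodeMono(sample: number, azimuth: number, elevation: number): FoaSample {
  const cosEl = Math.cos(elevation);
  return {
    w: sample,                             // omnidirectional component
    y: sample * Math.sin(azimuth) * cosEl, // left-right component
    z: sample * Math.sin(elevation),       // up-down component
    x: sample * Math.cos(azimuth) * cosEl, // front-back component
  };
}

// Compensate for the listener turning their head by headYaw (radians,
// counterclockwise): the sound field is rotated the opposite way around
// the vertical axis. W and Z are unaffected by a pure yaw rotation.
function rotateForHeadYaw(s: FoaSample, headYaw: number): FoaSample {
  const c = Math.cos(headYaw);
  const sn = Math.sin(headYaw);
  return {
    w: s.w,
    y: s.y * c - s.x * sn,
    z: s.z,
    x: s.x * c + s.y * sn,
  };
}

// The band plays straight ahead; the listener turns around.
const band = encodeMono(1, 0, 0);
const heard = rotateForHeadYaw(band, Math.PI);
console.log(band, heard);
```

After the head turn, the front-back component flips sign, so the band ends up behind the listener. The rotation is a small matrix multiplication per sample, independent of how many sources are in the scene, which is one reason the format suits VR playback.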
Developers interested in including this capability in their products can take a look at our open-sourced integration on GitHub. The integration takes advantage of the Web Audio API and consists of loading a small script into the player and instantiating it; it’s two lines of code to add to your existing solution. A detailed description is available in the linked repository, along with a demo page. I explain in more detail how this is implemented in my talk above.
For now this is a secondary add-on that still needs refinement, but contributions are very welcome, as they are for all our open-source projects. As with many of our products, it’s ultimately our customers who get to decide which features we are going to develop further. So if you’re interested in spherical audio playback with your VR videos, get in touch with our team!
About Demuxed 2017
Demuxed 2017 was a really nice conference and I can only recommend that everybody in video development attend next year’s. Cool venue, nice people, perfect organization, and a great team behind it. As someone who doesn’t particularly enjoy being on stage, I’d say this was as good as it can get, and everybody who’s thinking about giving a talk there should give it a try. Many people also came up to me afterwards or even contacted me online, telling me they enjoyed the talk, so thanks again to everybody! Of course there were lots of other interesting talks, covering a broad spectrum of topics with some emphasis on AV1 and HDR, which I encourage you to take a look at as well. And make sure to check out the talk on Content-Based Bitrate Adaptation by Bitmovin’s Reinhard Grandl.