Developers

Encoding VR and 360 Immersive Video for Meta Quest Headsets

Gabriel Dávila Revelo
. 6 min read
- Bitmovin

This article was originally published in April 2023. It was updated Nov 14, 2023 with information about Quest 3 AV1 support.

Whether you’re calling it Virtual Reality (VR) or 360 video or Metaverse content, there are a lot of details that should be taken into consideration in order to guarantee a good immersive experience. Things like video resolution, bitrates and codec settings all need to be set in a way that creates a high quality of experience for the viewers, while being conscious of storage and delivery costs that can come with these huge files. Although all this stuff has been widely discussed for 2D displays, like mobile phones and TVs, VR streaming differs enormously from those traditional screens, using different display technology that drastically shortens the viewing distance from eye to screen. In addition to that, VR headset specs may differ from one device to another, so the same video may produce a different visual experience depending on the model or device. In this post we are going to share the things you need to consider, along with tips and best practices for how to encode great looking VR content, specifically for playback on Meta Quest (formerly known as Oculus) headsets.

Visual quality requirements of 3D-VR vs 2D videos

Unlike traditional 2D screens, where viewers are located at a considerable distance from the screen, VR viewers are looking at a smaller screen much closer to the eyes. This drastically changes the way a video should be encoded in order to guarantee good visual quality for an immersive 3D experience. For this same reason, the traditional 2D video quality metrics such as VMAF and PSNR are not usually useful to measure the visual perception for 3-D VR content, for instance:

VMAF for 3D-VR

VMAF considers 2D viewers located at a viewing distance in the order of magnitude of the screen size, for example:

  • 4K VMAF model – vmaf_4k_v0.6.1,  takes into consideration that the viewer is located at 1.5H from the screen, where H is the TV screen high. 
  • HD VMAF model – vmaf_v0.6.1, considers  a viewer located at 3H from the screen.

The previous models resulted in a pixel density of about 60 pixels per degree (ppd) and 75 ppd – for 4K and HD respectively. However, when talking about VR videos, the pixel density is highly magnified, for instance, for Meta Quest 2 headsets the specs mention a pixel density of 20 ppd. Therefore, the predefined VMAF models are not suitable.  Actually, if you do use VMAF to get the visual quality (VQ) for a VR video intended for headset playback, you’ll probably find it does not look good enough even though it has a high VMAF score – this is because of the “zoom in” that Quest does in comparison to the traditional screens. 

PSNR for 3D-VR

Even when it is not a rule, it is expected to have good VQ on 2D videos when PSNR values are between 39 dB and 42 dB – for average to high complexity videos. See [1] [2] However, this PSNR range is usually not enough to create a good immersive experience with Quest headsets. For instance, according to some empirical tests we did, we found that at least a PSNR above 48 dB is required for good VQ with Quest devices.

- Bitmovin
image source: Meta Quest Blog

The Best Encoding Settings for Meta Quest devices

A general overview of the Video Requirements can be found at the Meta Quest website. Additionally, the following encoding settings may be useful when building your encoding workflow: 

Resolution

The minimal resolution suggested by Meta is 3840 x 3840 px for stereoscopic content and 3840 x 1920 px for monoscopic content, which is much higher than earlier generations or mobile devices.  

H265 Video Codec Settings 

Video Codec – Meta Quest devices support H264(AVC) and H265(HEVC) codecs, however given that they require resolutions above 3840 px, we strongly recommend H265 due to the high encoding efficiency it has when comparing it to H264. 

GOP Length – In our tests we successfully achieved a good VQ within the recommending bitrate range, using a 2-second GOP length for 30 fps content. However, since the VR experience is not as latency sensitive for video on demand, we suggest using greater GOP lengths in order to improve the encoding efficiency even more if needed.

Target bitrate and CRF – Meta suggests a target bitrate between 25-60 Mbps and as mentioned, we strongly suggest using the H265 codec to maintain high visual quality within that range. If the bitrate goes too far above the suggested maximum, customers may experience slow playback or stalling due to device performance issues. 

Having said all that, it is worth mentioning that setting a proper bitrate to meet the VQ expectations is really challenging, mainly because the bitrates necessary may change from one piece of content to another depending on their visual complexity. Because of that, we suggest using a CRF based encoding instead of a fixed bitrate. Specifically, we found that when talking about H265, a CRF between 17-18 would produce videos that are suitable for viewing on Quest headsets without excessively high bitrates. 

Building 360-VR encoding workflows with Bitmovin VOD Encoding

Bitmovin’s VOD Encoding provides a set of highly flexible APIs for creating workflows that fully meet Meta Quest encoding requirements. For instance:

  • If adaptive bitrate streaming is required at the output, Bitmovin Per-Title encoding can be used to automatically create the ABR ladder with the top rendition driven by the desired CRF target.
  • If progressive file output is required, a traditional CRF encoding can be used by capping the bitrates properly.
  • Additionally, Bitmovin filters can be used to create monoscopic content based on a stereoscopic input, for instance, cropping the original/stereoscopic video to convert it from a top-and-bottom or side-by-side array into a single one. Monoscopic outputs can be viewed on 2D displays, extending the reach of your 360 content beyond headsets.

Per-Title Encoding configuration for VR

The following per-title configuration may be used as a reference for encoding a VR content. Depending on the content complexity, the output may include from 4 to 7 renditions with the top rendition targeting a CRF value of 17.

perTitle: {
     h265Configuration: {
       minBitrate: 5000000,
       maxBitrate: 60000000,
       targetQualityCrf: 17,
       minBitrateStepSize: 1.5,
       maxBitrateStepSize: 2,
       codecMinBitrateFactor: 0.6,
       codecMaxBitrateFactor: 1.4,
       codecBufsizeFactor: 2,
       autoRepresentations: {
         adoptConfigurationThreshold: 0,
         },
       },
     }

Theres also full code samples here if you would like to dig deeper.

The same configuration can be used to encode any VR format such as top-and-bottom, side-by-side or monoscopic 360 content. The per-title algorithm will automatically propose a proper bitrate and resolution for each VR format based on the input details. Additionally, it is strongly recommended to use VOD_HIGH_QUALITY as an encoding preset and THREE_PASS as encoding mode. This will assure the Bitmovin Encoder delivers the best possible visual quality. 

In our tests using typical medium-high complexity content, we found that using a CRF of 17 produces good VQ for Meta Quest playback, with PSNR values above 48 dB and bitrates that are usually below the suggested maximum of 60 Mbps. 

Alternatively, traditional CRF encoding can be used instead of Per-title, for instance if only one rendition is desired at the output – with no ABR.

Creating monoscopic outputs from stereoscopic inputs

Usually, VR 360 cameras record the content in stereoscopic format either in top-and-bottom or side-by-side arrangements. However, depending on the customer use case, it could be required to convert the content from stereoscopic to monoscopic formats. This can be easily solved with the Bitmovin VOD Encoding API by applying cropping filters to remove the required pixels or frame percentage from the stereoscopic content, turning it into monoscopic format, i. e., by removing the left/right or the bottom/top side from the input asset.

- Bitmovin
Top-Bottom Stereoscopic Format source: Blender Foundation

For instance, the following javascript snippet would remove the top side of a 3840 x 3840 stereoscopic content:

.....
.....
// Crop filter definition
const cropTopSideFilter = new CropFilter({
  name: "stereo-to-mono-filter-example",
  left: 0,
  right: 0,
  bottom: 0,
  top: 1920,
 })
 
// Crop filter creation 
cropTopSideFilter = await bitmovinApi.encoding.filters.crop.create( cropTopSideFilter)

// Stream Filter definition
const cropTopSideStreamFilter = new StreamFilter({
  id : cropTopSideFilter.id,
  position: 0,
})

// StreamFilter creation
bitmovinApi.encoding.encodings.streams.filters.create(<encoding.id>, <videoStream.id>, [cropTopSideStreamFilter] )

AV1 Codec Support on Meta Quest 3

In the recommended settings above, we strongly suggested using HEVC over H.264 because the newer generation codec offers greater compression efficiency that turns into bandwidth savings and a better quality of experience for users. Now with the Quest 3, you can take advantage of AV1, an even newer codec that outperforms HEVC. On average, our testing has shown that you can maintain equivalent quality while using around 30% lower bitrate with AV1. This will depend on the type of content you’re working with, so if you’re experimenting with AV1 for the Quest 3, choosing a bitrate that’s ~25% lower than your HEVC encoding would be a good place to start.  DEOVR shared a 2900p sample .mp4 file encoded with AV1, but you can also create your own with a Bitmovin trial account.

Ready to start encoding your own 360 content for Meta Quest headsets? Sign up for a free trial and get going today! 

Related links:

Bitmovin Docs – Encoding Tutorials | Per-Title Configuration Options explained

Bitmovin Player 360 video demo

Gabriel Dávila Revelo

Gabriel Dávila Revelo

Solutions Engineer

Gabriel Davila Revelo is a Solutions Engineer at Bitmovin, his main interests include video streaming and broadcast technologies but he is also an enthusiast of networking systems and protocols. He received his M. S. in Telecommunication Engineering from Buenos Aires University.


Related Posts

Join the conversation