Meta researchers turn single images into 3D environments for VR headsets

Meta researchers demonstrate how generative AI can create lifelike VR environments from a single image.

Imagine being able to create a 3D environment from a single image and then explore it with Meta Quest or use it as a home environment.

A team of researchers at Reality Labs Zurich wants to make this vision a reality and presents a new approach for doing so.

Although existing generative AI models can create videos from a single image, they have difficulty generating fully immersive scenes, according to the research paper published last week.

The researchers' pipeline reportedly outperforms state-of-the-art video-synthesis-based methods on multiple quantitative image quality metrics, while requiring minimal training effort and building on existing generative models.
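The article does not name the specific metrics used in the comparison. As an illustration only, peak signal-to-noise ratio (PSNR) is one common quantitative image quality metric in this line of work; the NumPy sketch below computes it for a rendered view against a reference image. This is not taken from the paper.

```python
import numpy as np

def psnr(reference: np.ndarray, rendered: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio between two images with values in [0, max_value]."""
    mse = np.mean((reference.astype(np.float64) - rendered.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value**2 / mse)

# Toy comparison: a slightly noisy rendering against its ground-truth image.
ground_truth = np.random.rand(256, 256, 3)
rendering = np.clip(ground_truth + np.random.normal(0, 0.01, ground_truth.shape), 0, 1)
print(f"PSNR: {psnr(ground_truth, rendering):.2f} dB")
```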

"Our key insight is that the task of generating a 3D environment from a single image, which is inherently complex and ambiguous, can be decomposed into a series of more manageable sub-problems, each of which can be addressed with existing techniques," write the research team of Katja Schwarz, Denis Rozumny, Samuel Rota Bulo, Lorenzo Porzi, and Peter Kontschieder.

How a single image becomes a 6-DoF VR environment

The researchers explain their approach as follows: "Our process involves two steps: generating coherent panoramas using a pre-trained diffusion model and lifting these into 3D with a metric depth estimator. We then fill unobserved regions by conditioning the inpainting model on rendered point clouds, requiring minimal fine-tuning."
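The article only sketches the pipeline, so the NumPy snippet below is a rough, illustrative take on the "lifting" step alone, not the researchers' implementation: it unprojects an equirectangular panorama and a per-pixel metric depth map (both hypothetical inputs here) into a colored 3D point cloud centered on the camera.

```python
import numpy as np

def panorama_to_point_cloud(rgb: np.ndarray, depth: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Lift an equirectangular panorama (H x W x 3) and a metric depth map
    (H x W, in meters) into a colored point cloud around the camera."""
    h, w = depth.shape
    # Spherical angles per pixel: longitude in [-pi, pi), latitude in [-pi/2, pi/2].
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)
    # Unit ray directions, scaled by metric depth to get world-space points.
    dirs = np.stack([
        np.cos(lat) * np.sin(lon),  # x (right)
        np.sin(lat),                # y (up)
        np.cos(lat) * np.cos(lon),  # z (forward)
    ], axis=-1)
    points = dirs * depth[..., None]
    return points.reshape(-1, 3), rgb.reshape(-1, 3)

# Toy stand-ins for a generated panorama and its estimated depth.
pano = np.random.rand(512, 1024, 3)
depth = np.full((512, 1024), 3.0)  # a uniform 3-meter shell of points
xyz, colors = panorama_to_point_cloud(pano, depth)
print(xyz.shape, colors.shape)  # (524288, 3) (524288, 3)
```

In the actual pipeline, the unobserved regions of such a point cloud are what the fine-tuned inpainting model is conditioned on.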

The individual steps to creating the 3D environments. | Image: Meta Reality Labs

The result is a 3D environment rendered using Gaussian splatting that can be viewed and navigated within a 2-meter (6.5 feet) cube on a VR headset.
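How the navigable volume is enforced is not described in the article; one simple, purely illustrative approach is to clamp the tracked headset position to the 2-meter cube around the scene origin, so the rendering camera never leaves the region where the Gaussian-splat reconstruction remains plausible.

```python
import numpy as np

# Half-extent of the navigable volume: a 2 m cube centered on the scene origin.
HALF_EXTENT = 1.0  # meters

def clamp_to_navigable_cube(head_position: np.ndarray) -> np.ndarray:
    """Keep the tracked headset position inside the reconstructed volume."""
    return np.clip(head_position, -HALF_EXTENT, HALF_EXTENT)

# Any component outside [-1, 1] m is clamped to the cube boundary.
print(clamp_to_navigable_cube(np.array([0.4, 1.6, -2.3])))
```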

The method works with both synthetic images and photographs. Even textual descriptions of a scene can be used as input to generate high-quality 3D environments suitable for VR headsets.

The research paper also mentions some limitations and challenges. For example, extending the navigable area beyond two meters is difficult because it greatly increases the complexity of the task, and the pipeline does not yet support real-time scene synthesis. However, once the Gaussian splatting environment has been created, it can be displayed in real time on a VR device, the research team writes.

It is not known when such technology might be incorporated into Quest products. However, commercialization does not seem too far off.

More video examples are available on the project page.
