Meta: Photorealistic VR avatars clear important hurdle

Photorealistic digital encounters are supposed to be the killer app of VR and AR. Now Meta's codec avatars are clearing another hurdle.

The company has been sharing glimpses of its avatar research for more than three years. The codec avatars are supposed to replace video conferences one day and usher in a new era of technology-mediated communication.

This is how the system works: A VR headset records the eye and mouth area with integrated sensors. Then, an AI model uses this data to generate a lifelike image of the person, including eye movements, facial expressions, and teeth.

A major obstacle on the road to commercialization: for a full-body avatar, people must first be scanned by dozens of cameras in a 3D studio. | Image: Meta

Since first unveiling the codec avatars in 2019, Meta has continued to show progress in advancing the system. For example, the research group recently succeeded in displaying the virtual reality avatars relatively smoothly on a Meta Quest 2 using a new rendering technique. And, thanks to a new chip developed in-house, Meta was able to further optimize the system.

Face scanning with a smartphone

Metaverse telephony could eventually become available for AR headsets and cast people as holograms in the environment. However, this technology is still five "miracles" away from market readiness, according to research director Yaser Sheikh. Now, the team has apparently achieved another miracle.

One of the biggest hurdles to commercializing the avatar system is that users must first go to a special studio and be scanned in 3D by dozens of cameras. This scan provides the basic data for the AI model that creates the realistic avatars, but the process is too cumbersome for commercial use.

In their latest research, the team shows that 3D scanning of the face is also possible with a smartphone without sacrificing results. The hope is that one day users will be able to do this independently from home.


Avatar rendering still takes hours

An article by UploadVR describes the technical requirements. According to the article, 3D scanning requires a depth sensor like the one used for Face ID in the iPhone X and newer models. According to Meta, the process takes about three and a half minutes, during which users have to imitate 65 different facial expressions.

An optimized AI model that requires less data makes this reduction in complexity possible. Researchers trained the AI model with a set of 255 faces.

Comparison: an avatar created with Mugsy and an avatar created with a smartphone.

Above is the result of the 3D studio scan, below is the result of a smartphone scan. | Image: Meta

Another problem yet to be cracked is rendering the avatar in real time and in full detail. Currently, the calculation takes six hours on a computer with four high-end graphics cards. Smartphone scanning also has trouble with long hair and provides no information about the rest of the body. Meta's researchers still have many miracles to work.

The research paper is freely available on the Internet.

Sources: UploadVR