Nvidia's latest open-source AI generates 3D models from a single 2D image

Nvidia's latest AI model, GET3D, is designed to speed up the creation of 3D content. The model can output many textured 3D meshes in seconds, which can be used seamlessly in standard graphics engines. A single 2D image is all that is needed as input.

GET3D stands for "Generate Explicit Textured 3D." It is a generative model that synthesizes high-quality textured 3D polygon meshes of arbitrary topology.

The generated polygon meshes consist of textured triangles - a standard format that allows seamless import into 3D programs, game engines, or movie renderers.
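To make "textured triangles" concrete, here is a minimal sketch of how such a mesh can be stored and written out as Wavefront OBJ text, a widely supported interchange format. The `TexturedMesh` class and `to_obj` function are hypothetical names chosen for illustration; they are not part of GET3D, and the article does not say which file format GET3D exports.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TexturedMesh:
    """Hypothetical container for a textured triangle mesh (illustration only)."""
    vertices: List[Tuple[float, float, float]]  # 3D positions
    uvs: List[Tuple[float, float]]              # texture coordinates, one per vertex
    faces: List[Tuple[int, int, int]]           # zero-based vertex indices per triangle


def to_obj(mesh: TexturedMesh) -> str:
    """Serialize the mesh as Wavefront OBJ text."""
    lines = []
    for x, y, z in mesh.vertices:
        lines.append(f"v {x:.6f} {y:.6f} {z:.6f}")
    for u, v in mesh.uvs:
        lines.append(f"vt {u:.6f} {v:.6f}")
    for a, b, c in mesh.faces:
        # OBJ indices are one-based; "i/i" pairs vertex i with texture coordinate i.
        lines.append(f"f {a + 1}/{a + 1} {b + 1}/{b + 1} {c + 1}/{c + 1}")
    return "\n".join(lines) + "\n"


# A unit square built from two triangles.
quad = TexturedMesh(
    vertices=[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
    uvs=[(0, 0), (1, 0), (1, 1), (0, 1)],
    faces=[(0, 1, 2), (0, 2, 3)],
)
obj_text = to_obj(quad)
```

Saved to a `.obj` file, text like this can be opened by common 3D tools, which is the kind of interoperability the article describes.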

The 3D objects are fully editable after import: they can be scaled, rotated, and lit, for example. With Nvidia's StyleGAN-NADA, developers can also change the shape or texture of a 3D model using only text prompts, for example to turn a conventional car into a police car.

3D model generation from synthetic 2D images

Nvidia's research team has developed a generation process with two branches. The geometry branch generates a polygon mesh of arbitrary topology. The texture branch generates a texture field that can represent colors and, for example, specific materials at the surface points of the polygon mesh.
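The division of labor between the two branches can be sketched with toy stand-ins. In the sketch below, `geometry_branch` and `texture_branch` are hypothetical functions, not Nvidia's networks: the first maps a random code to a tiny mesh, the second returns a texture field, meaning a function that can be queried for a color at any 3D surface point.

```python
import math
import random
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def geometry_branch(z_geo: List[float]) -> Tuple[List[Vec3], List[Face]]:
    """Toy stand-in: map a latent code to a tetrahedron whose size depends on the code."""
    scale = 1.0 + 0.5 * abs(z_geo[0])
    base = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
    vertices = [(scale * x, scale * y, scale * z) for x, y, z in base]
    faces = [(0, 1, 2), (0, 3, 1), (0, 2, 3), (1, 3, 2)]
    return vertices, faces


def texture_branch(z_tex: List[float]) -> Callable[[Vec3], Vec3]:
    """Toy stand-in: return a texture field mapping a 3D point to RGB in [0, 1]."""
    def field(point: Vec3) -> Vec3:
        r, g, b = (0.5 + 0.5 * math.sin(coord + code)
                   for coord, code in zip(point, z_tex))
        return (r, g, b)
    return field


rng = random.Random(0)
z_geo = [rng.gauss(0, 1) for _ in range(3)]  # latent code for the shape
z_tex = [rng.gauss(0, 1) for _ in range(3)]  # separate latent code for the texture

vertices, faces = geometry_branch(z_geo)
field = texture_branch(z_tex)

# Query the texture field at surface points; here simply at the mesh vertices.
vertex_colors = [field(v) for v in vertices]
```

Keeping the texture as a field rather than a fixed image means it can be evaluated wherever the geometry branch happens to place the surface.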

Finally, as in generative adversarial networks (GANs), discriminators evaluate the quality of the output based on rendered 2D images of the 3D model, and the model is continuously optimized to match the target images.
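The adversarial feedback described above can be illustrated with a common GAN objective, the non-saturating logistic loss. This is a minimal sketch under that assumption; the article does not state which loss GET3D uses, and the logits here are plain numbers standing in for a discriminator's scores on real and rendered images.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(real_logits: Sequence[float], fake_logits: Sequence[float]) -> float:
    """Low when real images score high and rendered (fake) images score low."""
    real_term = sum(softplus(-r) for r in real_logits) / len(real_logits)
    fake_term = sum(softplus(f) for f in fake_logits) / len(fake_logits)
    return real_term + fake_term


def generator_loss(fake_logits: Sequence[float]) -> float:
    """Low when the discriminator scores the rendered images as real."""
    return sum(softplus(-f) for f in fake_logits) / len(fake_logits)


# Hypothetical scores: a discriminator that separates real from rendered images well.
good_d = discriminator_loss(real_logits=[4.0, 5.0], fake_logits=[-4.0, -5.0])
# The same discriminator fooled by the generator.
fooled_d = discriminator_loss(real_logits=[4.0, 5.0], fake_logits=[4.0, 5.0])
```

During training the two losses pull in opposite directions: the discriminator lowers its own loss by telling renderings from training images, and the generator lowers its loss by producing renderings the discriminator accepts.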

The training process of GET3D. | Image: Nvidia

GET3D was trained with about one million synthetic 2D images of 3D models from different angles. According to Nvidia, the training took about two days on Nvidia A100 GPUs.
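Images "from different angles" imply that each training image comes with a camera position around the object. A minimal sketch of how such positions might be sampled on a sphere follows; the function name, radius, and elevation range are assumptions made for illustration, not values reported by Nvidia.

```python
import math
import random
from typing import Tuple


def sample_camera(rng: random.Random, radius: float = 1.2,
                  max_elevation: float = math.pi / 6) -> Tuple[float, float, float]:
    """Sample a camera position on a sphere around an object at the origin.

    radius and max_elevation are illustrative assumptions.
    """
    azimuth = rng.uniform(0.0, 2.0 * math.pi)     # angle around the vertical axis
    elevation = rng.uniform(0.0, max_elevation)   # angle above the horizontal plane
    x = radius * math.cos(elevation) * math.cos(azimuth)
    y = radius * math.sin(elevation)
    z = radius * math.cos(elevation) * math.sin(azimuth)
    return (x, y, z)


rng = random.Random(42)
cameras = [sample_camera(rng) for _ in range(8)]
```

Each sampled position would be paired with a rendering of the 3D model from that viewpoint to form one training image.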

GET3D accelerates 3D content creation

The 3D models that GET3D can generate depend on the training data: For example, if you train the system with synthetic car or animal images, it can generate 3D cars or animals. The larger and more diverse the training data set, the more detailed and diverse the 3D models generated, Nvidia says.

On a single off-the-shelf Nvidia GPU, the trained model can generate around 20 shapes per second, according to Nvidia. Generation takes place locally on the user's computer and is therefore independent of content restrictions such as those known from cloud AI services.

"GET3D brings us a step closer to democratizing AI-powered 3D content creation," says Sanja Fidler, head of Nvidia's research lab in Toronto, where the tool was developed.

One limitation of GET3D, according to Nvidia's research team, is that training is currently only possible with 2D silhouettes of synthetic images from known camera positions. In future versions, advances in camera position estimation could form the basis for training with real images.

Currently, GET3D is also trained only per category. A cross-category model could increase the variety of 3D models generated and improve the flexibility of the system.

As an open-source model, GET3D is available for free on GitHub.

Sources: Paper, Nvidia, GitHub