ChatARKit: ChatGPT programs AR app using natural language alone

Humanity has been exploring the depths of OpenAI’s ChatGPT neural network since early December. One developer got the dialog AI to spit out working AR code.

OpenAI’s ChatGPT dialog AI is optimized for generating text and answering questions. But initial tests in early December quickly showed that the system can do more than produce a few neatly worded sentences: it can also write program code.

ChatARKit – AR app generated by ChatGPT

Developer Bart Trzynadlowski wanted to find out whether he could use ChatGPT to build an AR app that places digital 3D objects in the environment autonomously, using only voice commands. The voice commands are transcribed by another AI model, OpenAI’s Whisper, and then passed into the JavaScript environment of the ChatARKit app as a prompt for ChatGPT.

ChatGPT then selects 3D objects from Sketchfab that match the voice command and places them on a table or the floor as instructed. When prompted, ChatGPT also scales and rotates the 3D models. The AI system generates the code for this on its own.
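The flow described above (a transcribed command goes in, generated code comes out) could be sketched roughly as follows. This is an illustration under stated assumptions, not ChatARKit's actual implementation: the functions `build_prompt` and `extract_code`, the prompt wording, and the JavaScript helper names `createEntity` and `getNearestPlane` are all hypothetical. The call to the language model is left out and replaced by a canned reply.

```python
import re

# Hypothetical description of the scripting helpers offered to the model.
# These names are illustrative and NOT taken from ChatARKit.
API_DESCRIPTION = """\
You write JavaScript for an AR scene. Available functions (assumed):
- createEntity(description): load a 3D model matching a text description
- getNearestPlane(): return the closest detected surface
Respond with code only."""

# Three backticks, built at runtime so this example can sit inside a fence.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, re.DOTALL)


def build_prompt(transcript: str) -> str:
    """Wrap a transcribed voice command in instructions for the model."""
    return f"{API_DESCRIPTION}\n\nUser command: {transcript.strip()}\nCode:"


def extract_code(reply: str) -> str:
    """Return the first fenced code block, or the whole reply if none."""
    match = CODE_BLOCK.search(reply)
    return (match.group(1) if match else reply).strip()


# Canned stand-in for a model reply; a real app would query the model here.
sample_reply = (
    "Sure!\n" + FENCE + "javascript\n"
    "let car = createEntity('sports car');\n" + FENCE
)

if __name__ == "__main__":
    print(build_prompt("Place a sports car on the table."))
    print(extract_code(sample_reply))
```

In a real app, the extracted string would then be handed to the JavaScript environment for execution, which is where the reliability problems described further down come into play.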

These are some working sample prompts according to Trzynadlowski:

  • “Place a cube on the nearest plane.”
  • “Place a spinning cube on the floor.”
  • “Place a sports car on the table and rotate it 90 degrees.”
  • “Place a school bus on the nearest plane and make it drive back and forth along the surface.”

According to Trzynadlowski, ChatGPT does not work reliably. For identical commands, the AI model generates very different output and inserts incorrect lines of JavaScript code into the app. Occasionally, ChatGPT turns object descriptions into code identifiers, which means that the 3D models can no longer be retrieved from Sketchfab.
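To illustrate the identifier problem: a description such as "school bus" only works as a search term if it reaches the model loader as a string. If the generated code instead passes a bare identifier like `school_bus`, there is nothing left to search for. The following sketch shows one way such output could be flagged before it is executed. It is a simple heuristic for illustration, not something ChatARKit is known to do, and the loader name `createEntity` is a hypothetical placeholder.

```python
import re

# Hypothetical loader name; ChatARKit's real function names may differ.
LOADER = "createEntity"

# Matches LOADER(<first argument>) and captures the argument text.
# Limitation: a description containing a comma or parenthesis is misread.
CALL_PATTERN = re.compile(rf"{LOADER}\(\s*([^,)]*?)\s*[,)]")


def unquoted_descriptions(code: str) -> list[str]:
    """Return first arguments to LOADER that are not quoted string literals.

    Only single- and double-quoted literals count as strings here.
    """
    bad = []
    for arg in CALL_PATTERN.findall(code):
        is_string = len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in "\"'"
        if not is_string:
            bad.append(arg)
    return bad


if __name__ == "__main__":
    good = 'let bus = createEntity("school bus");'
    broken = "let bus = createEntity(school_bus);"
    print(unquoted_descriptions(good))
    print(unquoted_descriptions(broken))
```

A check like this could be used to reject a reply and ask the model again, although it would not catch the other kinds of incorrect code mentioned above.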

Trzynadlowski has made his ChatGPT AR app available for free as open source on GitHub.

Generate 3D objects in VR with natural language

For VR, developer Jasmine Roberts recently demonstrated an implementation of OpenAI’s new 3D AI Point-E. Like the image AI DALL-E 2, it can generate content based solely on text input. Instead of images, however, Point-E generates 3D point clouds that represent a 3D model. Point-E takes only about one to two minutes per generation on a single Nvidia V100 GPU. Roberts’ demo runs in real time.

For OpenAI, Point-E is a starting point for further work in text-to-3D synthesis. Google (DreamFusion) and Nvidia (Magic3D) have also recently introduced text-to-3D systems. Such systems could play an important role in spreading 3D content more widely, which is a fundamental assumption of the metaverse thesis.