WorldGen – Text to Immersive 3D Worlds


Imagine being able to type out a simple prompt like “cartoon medieval village” or “sci-fi base station on Mars” and generate an interactive 3D world within minutes. This world would be stylistically and thematically cohesive — no mid-century modern architecture in your Mars base, no Victorian furniture in your medieval village. It would also be structurally sound, with different areas connected in such a way that characters can roam freely without getting stuck. A few years ago, that might have sounded like science fiction, but with recent developments in generative AI technologies, people are already producing compelling short film clips from a single text or image prompt. And now, we’re sharing groundbreaking new research that results in fully navigable, interactive 3D worlds that you can actually walk around and explore.

Today, we’re introducing WorldGen: a state-of-the-art end-to-end system for generating interactive and navigable 3D worlds from a single text prompt. WorldGen is built on a combination of procedural reasoning, diffusion-based 3D generation, and object-aware scene decomposition. The result is geometrically consistent, visually rich, and render-efficient 3D worlds for gaming, simulation, and immersive social environments.

We’ve seen great strides in the use of generative AI to produce high-quality 3D assets from text and/or image prompts. WorldGen combines and innovates upon a number of existing 2D and 3D generation technologies: it first generates a reference image of the scene, then reconstructs the 3D world from that image. The full process unfolds across the following stages (a simplified sketch of the pipeline follows the list):

  1. Planning
    1. Procedural blockout generation
    2. Navmesh extraction
    3. Reference image generation
  2. Reconstruction
    1. Image-to-3D base model
    2. Navmesh-based scene generation
    3. Initial scene texture generation
  3. Decomposition
    1. Part extraction with accelerated AutoPartGen for scenes
    2. Data curation for scene decomposition
  4. Refinement
    1. Image enhancement
    2. Mesh refinement model
    3. Texturing model
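
To make the flow between these stages concrete, here is a minimal sketch of how such a pipeline could be orchestrated. This is not WorldGen’s actual code or API: every helper function below (generate_blockout, extract_navmesh, image_to_3d, and so on) is a hypothetical placeholder standing in for the corresponding stage.

```python
# Hypothetical orchestration of a WorldGen-style pipeline.
# All helper functions referenced here are illustrative placeholders;
# the real system's internals are not public.

from dataclasses import dataclass


@dataclass
class Scene:
    blockout: object          # coarse procedural layout of the world
    navmesh: object           # walkable-surface graph extracted from the blockout
    reference_image: object   # global reference image conditioning reconstruction
    mesh: object = None       # reconstructed, decomposed, and refined 3D geometry


def generate_world(prompt: str) -> Scene:
    # 1. Planning: coarse layout, walkable-area graph, and a global reference image.
    blockout = generate_blockout(prompt)                    # procedural blockout generation
    navmesh = extract_navmesh(blockout)                     # navmesh extraction
    reference = render_reference_image(blockout, prompt)    # reference image generation
    scene = Scene(blockout, navmesh, reference)

    # 2. Reconstruction: lift the reference image into 3D, guided by the navmesh.
    scene.mesh = image_to_3d(reference, navmesh)            # image-to-3D base model
    scene.mesh = apply_initial_textures(scene.mesh, reference)

    # 3. Decomposition: split the monolithic scene into individual objects.
    parts = decompose_into_parts(scene.mesh)                # AutoPartGen-style part extraction

    # 4. Refinement: sharpen geometry and textures per object, then reassemble.
    refined = [refine_mesh(enhance_textures(part)) for part in parts]
    scene.mesh = assemble(refined)
    return scene
```

The point the sketch tries to capture is that the navmesh and the global reference image, both produced during planning, condition everything downstream, which is what keeps the reconstructed world navigable and stylistically consistent.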

Other methods generate interactive 3D worlds from an image or text prompt by starting from a single specified viewpoint and building outward, rather than conditioning on a global reference image or full layout. While the geometry and textures near that central viewpoint are high quality, they quickly begin to degrade once you move just 3 – 5 meters away. By comparison, WorldGen can generate fully textured scenes spanning 50 x 50 meters while maintaining stylistic and geometric integrity throughout. And we’re targeting larger world sizes in the future.

While this work is still in the research phase and not yet available to developers, the content generated by WorldGen is compatible with standard game engines, including Unity and Unreal, without the need for additional conversions or rendering pipelines.
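
As a rough illustration of what engine-ready output looks like in practice, the snippet below assembles a placeholder scene with trimesh and exports it as a glTF binary (.glb), a format that Unreal Engine and Unity (via glTF importer packages such as glTFast) can ingest. The geometry is a stand-in invented for this example, not actual WorldGen output, and the export path is only an assumption about how a generated world might be handed off to an engine.

```python
# Minimal illustration of producing engine-ready output: a placeholder scene
# exported as glTF binary (.glb). The geometry below is a stand-in, not
# actual WorldGen output.
import trimesh

scene = trimesh.Scene()

# A crude ground plane and "building" as placeholder assets.
ground = trimesh.creation.box(extents=[50.0, 0.1, 50.0])
building = trimesh.creation.box(extents=[4.0, 6.0, 4.0])
building.apply_translation([0.0, 3.0, 10.0])

scene.add_geometry(ground, node_name="ground")
scene.add_geometry(building, node_name="building")

# Export as .glb; trimesh infers the format from the file extension.
scene.export("placeholder_world.glb")
```

Once imported, the engine treats the file as an ordinary asset, which is the sense in which no additional conversion or rendering pipeline is required.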

Although WorldGen has furthered our research in the direction of generating diverse, interactive, and navigable worlds, the current model has limitations that we’re working to address. For example, future versions of WorldGen will aim to generate larger spaces and reduce generation latency.

The creation of 3D content is complex, time-consuming, and — quite frankly — out of many people’s reach. WorldGen shows the potential for considerable time and cost savings across industries while helping to democratize 3D content creation. This supports the vision we shared at Connect for a future where anyone will be able to build entire virtual worlds without ever touching a line of code.

Acknowledgements

Thank you to the following individuals who made this work possible:

Dilin Wang†, Hyunyoung Jung, Tom Monnier, Kihyuk Sohn, Chuhang Zou, Xiaoyu Xiang, Yu-Ying Yeh, Di Liu, Zixuan Huang, Thu Nguyen-Phuoc, Yuchen Fan, Sergiu Oprea, Ziyan Wang, Roman Shapovalov, Nikolaos Sarafianos, Thibault Groueix, Antoine Toisoul, Prithviraj Dhar, Xiao Chu, Minghao Chen, Geon Yeong Park, Mahima Gupta, Yassir Azziz, Milton Cadogan, Christopher Ocampo, Sandy Kao, Rakesh Ranjan†, Andrea Vedaldi†

†project lead
