Skyfall-GS – Synthesizing Immersive 3D Urban Scenes from Satellite Imagery



¹National Yang Ming Chiao Tung University, ²UIUC, ³University of Zaragoza, ⁴UC Merced

TL;DR: Skyfall-GS converts satellite images to explorable 3D urban scenes using diffusion models, with real-time rendering performance.

Synthesizing large-scale, explorable, and geometrically accurate 3D urban scenes is a challenging yet valuable task for immersive and embodied applications. The main challenge is the lack of large-scale, high-quality real-world 3D scans for training generalizable generative models. In this paper, we take an alternative route to creating large-scale 3D scenes: we combine readily available satellite imagery, which supplies realistic coarse geometry, with an open-domain diffusion model, which creates high-quality close-up appearances. We propose Skyfall-GS, the first city-block scale 3D scene creation framework that requires no costly 3D annotations and supports real-time, immersive 3D exploration. We tailor a curriculum-driven iterative refinement strategy to progressively enhance geometric completeness and photorealistic textures. Extensive experiments demonstrate that Skyfall-GS provides more cross-view consistent geometry and more realistic textures than state-of-the-art approaches.

Our method synthesizes immersive and free-flight navigable city-block scale 3D scenes solely from multi-view satellite imagery in two stages.

(a) Reconstruction Stage

  • First, we reconstruct the initial 3D scene using 3DGS, enhanced by pseudo-camera depth supervision to address limited parallax in satellite images.
  • We then use an appearance modeling component to handle varying illumination conditions across multi-date satellite images.
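The page does not state the exact depth loss used in the reconstruction stage. As a rough illustration only, the sketch below shows one common way to supervise rendered depth with a reference depth map that is known only up to scale and shift: penalize `1 - Pearson correlation` between the two maps. The function name `pearson_depth_loss` and the choice of this particular loss are assumptions, not details confirmed by the source.

```python
import math
from typing import Sequence


def pearson_depth_loss(rendered: Sequence[float], reference: Sequence[float]) -> float:
    """Return 1 - Pearson correlation between two flattened depth maps.

    Assumed loss for illustration: it is 0 when the maps agree up to a
    positive scale and a shift, and grows as their structure diverges.
    """
    if len(rendered) != len(reference) or len(rendered) < 2:
        raise ValueError("depth maps must have the same length (>= 2 pixels)")
    n = len(rendered)
    mean_r = sum(rendered) / n
    mean_t = sum(reference) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(rendered, reference))
    var_r = sum((r - mean_r) ** 2 for r in rendered)
    var_t = sum((t - mean_t) ** 2 for t in reference)
    denom = math.sqrt(var_r * var_t)
    if denom == 0.0:
        # A constant depth map carries no structure to correlate against.
        return 1.0
    return 1.0 - cov / denom
```

Because the loss ignores global scale and shift, it constrains only the relative ordering and shape of the depth, which is the kind of signal a relative depth estimate can provide when satellite views offer little parallax.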

(b) Synthesis Stage

  • Next, we introduce a curriculum-based Iterative Dataset Update (IDU) refinement technique.
  • This technique leverages (c) a pre-trained text-to-image (T2I) diffusion model with prompt-to-prompt editing.
  • By iteratively updating training datasets with refined renders, our approach significantly reduces visual artifacts and improves geometric accuracy and texture realism.
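The control flow of the synthesis stage can be sketched as a loop: render views of the current scene, refine them, and train on the refined images. This is a minimal sketch under stated assumptions. The `render`, `refine`, and `train` callables are hypothetical placeholders for the 3DGS renderer, the diffusion-based refinement, and the optimizer. The linear elevation schedule, which moves cameras from near-overhead toward lower viewpoints, is an assumed form of the curriculum rather than the authors' stated schedule.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical camera description: (elevation in degrees, azimuth in degrees).
Camera = Tuple[float, float]


def elevation_schedule(start_deg: float, end_deg: float, episodes: int) -> List[float]:
    """Linearly interpolate camera elevation from start_deg to end_deg."""
    if episodes < 1:
        raise ValueError("episodes must be at least 1")
    if episodes == 1:
        return [end_deg]
    step = (end_deg - start_deg) / (episodes - 1)
    return [start_deg + i * step for i in range(episodes)]


def sample_orbit(elevation_deg: float, views: int) -> List[Camera]:
    """Place `views` cameras evenly around the scene at one elevation."""
    return [(elevation_deg, 360.0 * k / views) for k in range(views)]


def iterative_dataset_update(
    scene,
    render: Callable,   # (scene, camera) -> image
    refine: Callable,   # image -> refined image (e.g. a diffusion model)
    train: Callable,    # (scene, dataset) -> updated scene
    elevations: Sequence[float],
    views_per_episode: int,
):
    """Run one refinement episode per elevation in the curriculum."""
    for elevation in elevations:
        cameras = sample_orbit(elevation, views_per_episode)
        # Rebuild the training set from refined renders of the current scene.
        dataset = [(cam, refine(render(scene, cam))) for cam in cameras]
        scene = train(scene, dataset)
    return scene
```

For example, `elevation_schedule(85.0, 25.0, 4)` yields four episodes at 85, 65, 45, and 25 degrees, so the early episodes stay close to the satellite viewpoints and later ones move toward street-facing views.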

Interactive 3DGS Viewer

The project page hosts an interactive 3D Gaussian Splatting viewer for the following urban scenes:

  • Residential (JAX_004)
  • H Building (JAX_068)
  • Office Building (JAX_214)
  • Lake Side (JAX_260)
  • City Hall (JAX_164)
  • Villa (JAX_168)
  • Stadium (JAX_175)
  • Factory (JAX_264)
  • World Financial Center (NYC_004)
  • Union Square (NYC_010)
  • E 12th St. (NYC_219)
  • Albany St. (NYC_336)

Acknowledgements

This research was funded by the National Science and Technology Council, Taiwan, under Grants NSTC 112-2222-E-A49-004-MY2 and 113-2628-EA49-023-. The authors are grateful to Google, NVIDIA, and MediaTek Inc. for their generous donations. Yu-Lun Liu acknowledges the Yushan Young Fellow Program by the MOE in Taiwan.

BibTeX

@article{lee2025SkyfallGS,
  title         = {{Skyfall-GS}: Synthesizing Immersive {3D} Urban Scenes from Satellite Imagery},
  author        = {Jie-Ying Lee and Yi-Ruei Liu and Shr-Ruei Tsai and Wei-Cheng Chang and Chung-Ho Wu and Jiewen Chan and Zhenjun Zhao and Chieh Hubert Lin and Yu-Lun Liu},
  journal       = {arXiv preprint},
  year          = {2025},
  eprint        = {2510.15869},
  archivePrefix = {arXiv}
}