Single Image –> Multiview Splat


VistaDream is a novel framework for reconstructing 3D scenes from single-view images using Flux-based diffusion models. This implementation combines image outpainting, depth estimation, and 3D Gaussian splatting for high-quality 3D scene generation, with integrated visualization using Rerun.

Uses Rerun for 3D visualization, Gradio for interactive UI, Flux for diffusion-based outpainting, and Pixi for easy installation.



VistaDream addresses the challenge of 3D scene reconstruction from a single image through a novel two-stage pipeline:

  1. Coarse 3D Scaffold Construction: Creates a global scene structure by outpainting image boundaries and estimating depth maps
  2. Multi-view Consistency Sampling (MCS): Uses iterative diffusion-based RGB-D inpainting with multi-view consistency constraints to generate high-quality novel views
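Building a 3D scaffold from an image and an estimated depth map generally means back-projecting every pixel into 3D. The following is a minimal sketch of that lifting step, not VistaDream's actual code. It assumes a pinhole camera with intrinsics fx, fy, cx, cy and depth measured along the optical (z) axis. The function names `backproject_depth` and `colored_point_cloud` are hypothetical.

```python
import numpy as np


def backproject_depth(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Lift an (H, W) depth map to an (H*W, 3) array of camera-space points.

    Assumes (hypothetically) a pinhole camera and z-depth, not ray length.
    """
    h, w = depth.shape
    # Pixel grid: u runs along columns (x), v along rows (y); both have shape (H, W).
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    z = depth.astype(np.float64)
    # Invert the pinhole projection u = fx * x / z + cx (and likewise for v).
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # Row-major flattening: point index = v * W + u.
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


def colored_point_cloud(
    rgb: np.ndarray, depth: np.ndarray, fx: float, fy: float, cx: float, cy: float
) -> tuple[np.ndarray, np.ndarray]:
    """Return (points, colors) for pixels with finite, positive depth."""
    points = backproject_depth(depth, fx, fy, cx, cy)
    colors = rgb.reshape(-1, 3).astype(np.float32) / 255.0  # uint8 RGB -> [0, 1]
    # Drop pixels whose depth is missing (NaN/inf) or non-positive.
    valid = np.isfinite(points).all(axis=1) & (points[:, 2] > 0)
    return points[valid], colors[valid]
```

A point cloud like this is a common starting point for initializing Gaussian splats, though how VistaDream seeds its Gaussians is not described here.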

The framework integrates multiple state-of-the-art models:

  • Flux diffusion models for high-quality image outpainting and inpainting
  • 3D Gaussian Splatting for efficient 3D scene representation
  • Rerun for real-time 3D visualization and debugging

Requirements:

  • Linux only with NVIDIA GPU (CUDA 12.8)
  • Pixi package manager

Quick start:

git clone https://github.com/rerun-io/vistadream.git
cd vistadream
pixi run example

This will automatically download the required models and run the example with the included office image.

Full VistaDream Pipeline - 3D Scene Reconstruction ⚠️ Under Construction

Generate a complete 3D scene from a single image with outpainting, depth estimation, and Gaussian splatting:

pixi run python tools/run_vistadream.py --image-path data/office/IMG_4029.jpg --expansion-percent 0.2 --n-frames 10

Note: The full 3D reconstruction pipeline is currently under active development. Some features may be experimental or incomplete.

Process a single image with depth estimation and basic 3D reconstruction:

pixi run python tools/run_single_img.py --image-path data/office/IMG_4029.jpg

Run just the outpainting component with Rerun visualization:

pixi run python tools/run_flux_outpainting.py --image-path data/office/IMG_4029.jpg --expansion-percent 0.2
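Outpainting needs a larger canvas with the original image placed inside it, plus a mask marking the region the diffusion model should fill. The sketch below builds both with NumPy. It assumes, hypothetically, that `--expansion-percent` is the fraction of the original width and height added on each side and that mask value 1 means "fill". The tool's real definition may differ, so treat `make_outpaint_canvas` as illustrative only.

```python
import numpy as np


def make_outpaint_canvas(image: np.ndarray, expansion_percent: float) -> tuple[np.ndarray, np.ndarray]:
    """Center an (H, W, 3) image on a padded canvas and build an outpainting mask.

    ASSUMPTION (hypothetical): expansion_percent is the fraction of the
    original height/width added on EACH side of the image.
    """
    h, w = image.shape[:2]
    pad_y = int(round(h * expansion_percent))
    pad_x = int(round(w * expansion_percent))
    # Unknown border pixels start as zeros; the diffusion model fills them in.
    canvas = np.zeros((h + 2 * pad_y, w + 2 * pad_x, 3), dtype=image.dtype)
    canvas[pad_y:pad_y + h, pad_x:pad_x + w] = image
    # Mask convention (assumed): 1 = region to generate, 0 = keep original pixels.
    mask = np.ones(canvas.shape[:2], dtype=np.uint8)
    mask[pad_y:pad_y + h, pad_x:pad_x + w] = 0
    return canvas, mask
```

With a 100x200 image and `expansion_percent=0.2`, this produces a 140x280 canvas with a 20-pixel border on the top and bottom and a 40-pixel border on the left and right.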

Launch an interactive web interface for experimenting with the models:

pixi run python tools/gradio_app.py

Key features:

  • Single Image to 3D: Complete pipeline from a single image to a navigable 3D scene
  • Memory Efficient: Model offloading support for GPU memory management
  • Real-time Visualization: Integrated Rerun viewer for 3D scene inspection
  • Training-free: No fine-tuning required for existing diffusion models
  • High Quality: Multi-view consistency sampling keeps the generated views consistent with one another, yielding a more coherent 3D reconstruction

Project structure:

├── src/vistadream/
│   ├── api/                         # High-level pipeline APIs
│   │   ├── flux_outpainting.py      # Outpainting-only pipeline
│   │   └── vistadream_pipeline.py   # Full 3D reconstruction pipeline
│   ├── flux/                        # Flux diffusion model integration
│   │   ├── cli_*.py                 # Command-line interfaces
│   │   ├── model.py                 # Flux transformer architecture
│   │   ├── sampling.py              # Diffusion sampling logic
│   │   └── util.py                  # Model loading and configuration
│   └── ops/                         # Core operations
│       ├── flux.py                  # Flux model wrappers
│       ├── gs/                      # Gaussian splatting implementation
│       ├── trajs/                   # Camera trajectory generation
│       └── visual_check.py          # 3D scene validation tools
└── tools/                           # Standalone applications
    ├── gradio_app.py                # Web interface
    ├── run_flux_outpainting.py
    ├── run_vistadream.py            # Main 3D pipeline
    └── run_single_img.py            # Single image processing

Models are automatically downloaded from Hugging Face on first run. Manual download:

pixi run huggingface-cli download pablovela5620/vistadream --local-dir ckpt/

Expected structure:

ckpt/
├── flux_fill/
│   ├── flux1-fill-dev.safetensors
│   └── ae.safetensors
├── vec.pt
├── txt.pt
└── txt_256.pt
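After a manual download, a quick check that the files landed where the pipeline expects them can save a failed run. The helper below is hypothetical (it is not part of the repository) and uses only `pathlib`; its file list is copied from the expected `ckpt/` layout.

```python
from __future__ import annotations

from pathlib import Path

# Relative paths copied from the expected ckpt/ layout.
EXPECTED_CKPT_FILES = (
    "flux_fill/flux1-fill-dev.safetensors",
    "flux_fill/ae.safetensors",
    "vec.pt",
    "txt.pt",
    "txt_256.pt",
)


def missing_checkpoints(ckpt_dir: str | Path = "ckpt") -> list[str]:
    """Return the expected checkpoint files that are not present under ckpt_dir."""
    root = Path(ckpt_dir)
    return [rel for rel in EXPECTED_CKPT_FILES if not (root / rel).is_file()]
```

For example, `missing_checkpoints("ckpt")` returns an empty list when every file is in place, and lists the relative paths of any that are missing otherwise.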

Thanks to the original authors! If you use VistaDream in your research, please cite:

Original Repo

@inproceedings{wang2025vistadream,
  title={VistaDream: Sampling multiview consistent images for single-view scene reconstruction},
  author={Wang, Haiping and Liu, Yuan and Liu, Ziwei and Wang, Wenping and Dong, Zhen and Yang, Bisheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}

This project builds upon several outstanding works:

  • ASUKA - Enhanced image inpainting for mitigating unwanted object insertion
  • MoGe - Accurate monocular geometry estimation for open-domain images