A Curated List of Works in World Modeling


Major updates and announcements are shown below. Scroll for full timeline.

🗺️ [2025-10] Enhanced Visual Navigation — Introduced badge system for papers! All entries now display arXiv Website Code for quick access to resources.

🔥 [2025-10] Repository Launch — Awesome World Models is now live! We're building a comprehensive collection spanning Embodied AI, Autonomous Driving, NLP, and more. See CONTRIBUTING.md for how to contribute.

💡 [Ongoing] Community Contributions Welcome — Help us maintain the most up-to-date world models resource! Submit papers via PR or contact us via email.

[Ongoing] Support This Project — If you find this useful, please cite our work and give us a star. Share with your research community!



World Models have become a hot topic in both research and industry, attracting unprecedented attention from the AI community and beyond. However, due to the interdisciplinary nature of the field (and because the term "world model" simply sounds amazing), the concept has been used with varying definitions across different domains.

Awesome World Models

This repository aims to:

  • 🔍 Organize the rapidly growing body of world model research across multiple application domains
  • 🗺️ Provide a minimalist map of how world models are utilized in different fields (Embodied AI, Autonomous Driving, NLP, etc.)
  • 🤝 Bridge the gap between different communities working on world models with varying perspectives
  • 📚 Serve as a one-stop resource for researchers, practitioners, and enthusiasts interested in world modeling
  • 🚀 Track the latest developments and breakthroughs in this exciting field

Whether you're a researcher looking for related work, a practitioner seeking implementation references, or simply curious about world models, we hope this curated list helps you navigate the landscape!


Definition of World Models

While the reach of world models has expanded again and again, it is widely agreed that the concept originates from these two papers:

Some other great blogposts on world models include:


1. World Models and Video Generation:

2. World Models and 3D Generation:

3. World Models and Embodied Artificial Intelligence:

4. World Models for Autonomous Driving:


World Models for Game Simulation

Pixel Space:

  • [⭐️] GameNGen, "Diffusion Models Are Real-Time Game Engines". arXiv
  • [⭐️] DIAMOND, "Diffusion for World Modeling: Visual Details Matter in Atari". arXiv Code
  • MineWorld, "MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft". arXiv Website
  • Oasis, "Oasis: A Universe in a Transformer". Website
  • AnimeGamer, "AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction". arXiv Website
  • [⭐️] Matrix-Game, "Matrix-Game: Interactive World Foundation Model." arXiv Code
  • [⭐️] Matrix-Game 2.0, "Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model". arXiv Website
  • RealPlay, "From Virtual Games to Real-World Play". arXiv Website Code
  • GameFactory, "GameFactory: Creating New Games with Generative Interactive Videos". arXiv Website Code
  • WORLDMEM, "Worldmem: Long-term Consistent World Simulation with Memory". arXiv Website Code

3D Mesh Space:


World Models for Autonomous Driving

Refer to https://github.com/LMD0311/Awesome-World-Model for the full list.

Note

📢 [Call for Maintenance] The repo creator is not an expert in autonomous driving, so this section is a flat list of works without classification. We welcome community effort to make this section cleaner and better sorted.

  • PWM, "From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction". arXiv Code

  • Dream4Drive, "Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks". arXiv Website

  • SparseWorld, "SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries". arXiv Code

  • DriveVLA-W0: "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving". arXiv Code

  • "Enhancing Physical Consistency in Lightweight World Models". arXiv

  • IRL-VLA: "IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model". arXiv Website Code

  • LiDARCrafter: "LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences". arXiv Website Code

  • FASTopoWM: "FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models". arXiv Code

  • Orbis: "Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models". arXiv Code

  • "World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving". arXiv

  • NRSeg: "NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models" arXiv Code

  • World4Drive: "World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model". arXiv Code

  • Epona: "Epona: Autoregressive Diffusion World Model for Autonomous Driving". arXiv Code

  • "Towards foundational LiDAR world models with efficient latent flow matching". arXiv

  • SceneDiffuser++: "SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model". arXiv

  • COME: "COME: Adding Scene-Centric Forecasting Control to Occupancy World Model" arXiv Code

  • STAGE: "STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation". arXiv

  • ReSim: "ReSim: Reliable World Simulation for Autonomous Driving". arXiv Code Website

  • "Ego-centric Learning of Communicative World Models for Autonomous Driving". arXiv

  • Dreamland: "Dreamland: Controllable World Creation with Simulator and Generative Models". arXiv Website

  • LongDWM: "LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model". arXiv Website

  • GeoDrive: "GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control". arXiv Code

  • FutureSightDrive: "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving". arXiv Code

  • Raw2Drive: "Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)". arXiv

  • VL-SAFE: "VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving". arXiv Website

  • PosePilot: "PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth". arXiv

  • "World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks". arXiv

  • "Learning to Drive from a World Model". arXiv

  • DriVerse: "DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment". arXiv

  • "End-to-End Driving with Online Trajectory Evaluation via BEV World Model". arXiv Code

  • "Knowledge Graphs as World Models for Semantic Material-Aware Obstacle Handling in Autonomous Vehicles". arXiv

  • MiLA: "MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving". arXiv Website

  • SimWorld: "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model". arXiv Website

  • UniFuture: "Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception". arXiv Website

  • EOT-WM: "Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space". arXiv

  • "Temporal Triplane Transformers as Occupancy World Models". arXiv

  • InDRiVE: "InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model". arXiv

  • MaskGWM: "MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction". arXiv

  • Dream to Drive: "Dream to Drive: Model-Based Vehicle Control Using Analytic World Models". arXiv

  • "Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving". arXiv

  • "Dream to Drive with Predictive Individual World Model". arXiv Code

  • HERMES: "HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation". arXiv

  • AdaWM: "AdaWM: Adaptive World Model based Planning for Autonomous Driving". arXiv

  • AD-L-JEPA: "AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data". arXiv

  • DrivingWorld: "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT". arXiv Code Website

  • DrivingGPT: "DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers". arXiv Website

  • "An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training". arXiv

  • GEM: "GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control". arXiv Website

  • GaussianWorld: "GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction". arXiv Code

  • Doe-1: "Doe-1: Closed-Loop Autonomous Driving with Large World Model". arXiv Website Code

  • "Physical Informed Driving World Model". arXiv Website

  • InfiniCube: "InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models". arXiv Website

  • InfinityDrive: "InfinityDrive: Breaking Time Limits in Driving World Models". arXiv Website

  • ReconDreamer: "ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration". arXiv Website

  • Imagine-2-Drive: "Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles". arXiv Website

  • DynamicCity: "DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes". arXiv Website Code

  • DriveDreamer4D: "World Models Are Effective Data Machines for 4D Driving Scene Representation". arXiv Website

  • DOME: "Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model". arXiv Website

  • SSR: "Does End-to-End Autonomous Driving Really Need Perception Tasks?". arXiv Code

  • "Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models". arXiv

  • LatentDriver: "Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving". arXiv Code

  • RenderWorld: "World Model with Self-Supervised 3D Label". arXiv

  • OccLLaMA: "An Occupancy-Language-Action Generative World Model for Autonomous Driving". arXiv

  • DriveGenVLM: "Real-world Video Generation for Vision Language Model based Autonomous Driving". arXiv

  • Drive-OccWorld: "Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving". arXiv

  • CarFormer: "Self-Driving with Learned Object-Centric Representations". arXiv Code

  • BEVWorld: "A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space". arXiv Code

  • TOKEN: "Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving". arXiv

  • UMAD: "Unsupervised Mask-Level Anomaly Detection for Autonomous Driving". arXiv

  • SimGen: "Simulator-conditioned Driving Scene Generation". arXiv Code

  • AdaptiveDriver: "Planning with Adaptive World Models for Autonomous Driving". arXiv Code

  • UnO: "Unsupervised Occupancy Fields for Perception and Forecasting". arXiv Code

  • LAW: "Enhancing End-to-End Autonomous Driving with Latent World Model". arXiv Code

  • Delphi: "Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation". arXiv Code

  • OccSora: "4D Occupancy Generation Models as World Simulators for Autonomous Driving". arXiv Code

  • MagicDrive3D: "Controllable 3D Generation for Any-View Rendering in Street Scenes". arXiv Code

  • Vista: "A Generalizable Driving World Model with High Fidelity and Versatile Controllability". arXiv Code

  • CarDreamer: "Open-Source Learning Platform for World Model based Autonomous Driving". arXiv Code

  • DriveSim: "Probing Multimodal LLMs as World Models for Driving". arXiv Code

  • DriveWorld: "4D Pre-trained Scene Understanding via World Models for Autonomous Driving". arXiv

  • LidarDM: "Generative LiDAR Simulation in a Generated World". arXiv Code

  • SubjectDrive: "Scaling Generative Data in Autonomous Driving via Subject Control". arXiv Website

  • DriveDreamer-2: "LLM-Enhanced World Models for Diverse Driving Video Generation". arXiv Code

  • Think2Drive: "Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving". arXiv

  • MARL-CCE: "Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model". arXiv Code

  • GenAD: "Generalized Predictive Model for Autonomous Driving". arXiv Website

  • GenAD: "Generative End-to-End Autonomous Driving". arXiv Code

  • NeMo: "Neural Volumetric World Models for Autonomous Driving". arXiv

  • ViDAR: "Visual Point Cloud Forecasting enables Scalable Autonomous Driving". arXiv Code

  • Drive-WM: "Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving". arXiv Code

  • Cam4DOCC: "Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications". arXiv Code

  • Panacea: "Panoramic and Controllable Video Generation for Autonomous Driving". arXiv Code

  • OccWorld: "Learning a 3D Occupancy World Model for Autonomous Driving". arXiv Code

  • DrivingDiffusion: "Layout-Guided multi-view driving scene video generation with latent diffusion model". arXiv Code

  • SafeDreamer: "Safe Reinforcement Learning with World Models". arXiv Code

  • MagicDrive: "Street View Generation with Diverse 3D Geometry Control". arXiv Code

  • DriveDreamer: "Towards Real-world-driven World Models for Autonomous Driving". arXiv Code

  • SEM2: "Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model". arXiv

World Models for Embodied AI

1. Foundation Embodied World Models

2. World Models for Manipulation

  • [⭐️] FLARE, "FLARE: Robot Learning with Implicit World Modeling". arXiv Website
  • [⭐️] Enerverse, "EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation". arXiv Website
  • [⭐️] AgiBot-World, "AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems". arXiv Website Code
  • [⭐️] DyWA: "DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation" arXiv Website
  • [⭐️] TesserAct, "TesserAct: Learning 4D Embodied World Models". arXiv Website
  • [⭐️] DreamGen: "DreamGen: Unlocking Generalization in Robot Learning through Video World Models". arXiv Code
  • [⭐️] HiP, "Compositional Foundation Models for Hierarchical Planning". arXiv Website
  • PAR: "Physical Autoregressive Model for Robotic Manipulation without Action Pretraining". arXiv Website
  • iMoWM: "iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation". arXiv Website
  • WristWorld: "WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation". arXiv
  • "A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models". arXiv
  • EMMA: "EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer". arXiv
  • PhysTwin, "PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos". arXiv Website Code
  • [⭐️] KeyWorld: "KeyWorld: Key Frame Reasoning Enables Effective and Efficient World Models". arXiv
  • World4RL: "World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation". arXiv
  • [⭐️] SAMPO: "SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models". arXiv
  • PhysicalAgent: "PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models". arXiv
  • "Empowering Multi-Robot Cooperation via Sequential World Models". arXiv
  • [⭐️] "Learning Primitive Embodied World Models: Towards Scalable Robotic Learning". arXiv Website
  • [⭐️] GWM: "GWM: Towards Scalable Gaussian World Models for Robotic Manipulation". arXiv Website
  • [⭐️] Flow-as-Action, "Latent Policy Steering with Embodiment-Agnostic Pretrained World Models". arXiv
  • EmbodieDreamer: "EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling". arXiv Website
  • RoboScape: "RoboScape: Physics-informed Embodied World Model". arXiv Code
  • FWM, "Factored World Models for Zero-Shot Generalization in Robotic Manipulation". arXiv
  • [⭐️] ParticleFormer: "ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation". arXiv Website
  • ManiGaussian++: "ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model". arXiv Code
  • ReOI: "Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control". arXiv
  • GAF: "GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation". arXiv Website
  • "Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins". arXiv Website
  • "Time-Aware World Model for Adaptive Prediction and Control". arXiv
  • [⭐️] 3DFlowAction: "3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model". arXiv
  • [⭐️] ORV: "ORV: 4D Occupancy-centric Robot Video Generation". arXiv Code Website
  • [⭐️] WoMAP: "WoMAP: World Models For Embodied Open-Vocabulary Object Localization". arXiv
  • "Sparse Imagination for Efficient Visual World Model Planning". arXiv
  • [⭐️] OSVI-WM: "OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation". arXiv
  • [⭐️] LaDi-WM: "LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation". arXiv
  • FlowDreamer: "FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation". arXiv Website
  • PIN-WM: "PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation". arXiv
  • RoboMaster, "Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control". arXiv Website Code
  • ManipDreamer: "ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance". arXiv
  • [⭐️] AdaWorld: "AdaWorld: Learning Adaptable World Models with Latent Actions" arXiv Website
  • "Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks" arXiv Website
  • [⭐️] EVA: "EVA: An Embodied World Model for Future Video Anticipation". arXiv Website
  • "Representing Positional Information in Generative World Models for Object Manipulation". arXiv
  • DexSim2Real$^2$: "DexSim2Real$^2$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation". arXiv
  • "Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics". arXiv Website
  • [⭐️] LUMOS: "LUMOS: Language-Conditioned Imitation Learning with World Models". arXiv Website
  • [⭐️] "Object-Centric World Model for Language-Guided Manipulation" arXiv
  • [⭐️] DEMO^3: "Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning" arXiv Website
  • "Strengthening Generative Robot Policies through Predictive World Modeling". arXiv Website
  • RoboHorizon: "RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation". arXiv
  • Dream to Manipulate: "Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination". arXiv Website
  • [⭐️] RoboDreamer: "RoboDreamer: Learning Compositional World Models for Robot Imagination". arXiv Code
  • ManiGaussian: "ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation". arXiv Code
  • [⭐️] WHALE: "WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making". arXiv
  • [⭐️] VisualPredicator: "VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning". arXiv
  • [⭐️] "Multi-Task Interactive Robot Fleet Learning with Visual World Models". arXiv Code
  • PIVOT-R: "PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation". arXiv
  • Video2Action, "Grounding Video Models to Actions through Goal Conditioned Exploration". arXiv Website Code
  • Diffuser, "Planning with Diffusion for Flexible Behavior Synthesis". arXiv
  • Decision Diffuser, "Is Conditional Generative Modeling all you need for Decision-Making?". arXiv
  • Potential Based Diffusion Motion Planning, "Potential Based Diffusion Motion Planning". arXiv

3. World Models for Navigation

4. World Models for Locomotion

Locomotion:

Loco-Manipulation:

5. World Models x VLAs

Unifying World Models and VLAs in one model:

Combining World Models and VLAs:

6. World Models x Policy Learning

This subsection focuses on general policy learning methods in embodied intelligence via leveraging world models.

7. World Models for Policy Evaluation

Real-world policy evaluation is expensive and noisy. The promise of world models is that, by accurately capturing environment dynamics, they can serve as a surrogate evaluation environment whose scores correlate strongly with real-world policy performance. Before world models, this role was filled by simulators:
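The surrogate-evaluation idea can be sketched in a few lines. Everything below is a hypothetical toy: a hand-written 1-D dynamics function stands in for a trained world model, and two trivial policies are ranked by their imagined returns without ever touching a real environment.

```python
import numpy as np

def evaluate_in_world_model(policy, dynamics, reward_fn, s0, horizon=50):
    """Roll a policy inside a learned dynamics model and return the
    imagined return, used as a cheap surrogate for real-world evaluation."""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        total += reward_fn(s, a)
        s = dynamics(s, a)  # predicted next state; no real environment needed
    return total

# Toy 1-D point mass; the "learned" dynamics are hand-written here.
dynamics = lambda s, a: s + 0.1 * a
reward_fn = lambda s, a: -abs(s)   # reward for staying near the origin
good = lambda s: -np.sign(s)       # pushes the state toward 0
bad = lambda s: np.sign(s)         # pushes the state away from 0

# The world model correctly ranks the two policies.
assert evaluate_in_world_model(good, dynamics, reward_fn, 1.0) > \
       evaluate_in_world_model(bad, dynamics, reward_fn, 1.0)
```

How well this ranking transfers to the real world depends entirely on the world model's accuracy, which is what the evaluation works below try to measure.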

For World Model Evaluation:


Natural Science:

Social Science:

Positions on World Models

Theory & World Models Explainability

  • [⭐️] General agents Contain World Models, "General agents contain world models". arXiv
  • [⭐️] When Do Neural Networks Learn World Models? "When Do Neural Networks Learn World Models?" arXiv
  • What Does it Mean for a Neural Network to Learn a 'World Model'?, "What Does it Mean for a Neural Network to Learn a 'World Model'?". arXiv
  • Transformer cannot learn HMMs (sometimes) "On Limitation of Transformer for Learning HMMs". arXiv
  • [⭐️] Inductive Bias Probe, "What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models". arXiv
  • [⭐️] Dynamical Systems Learning for World Models, "When do World Models Successfully Learn Dynamical Systems?". arXiv
  • How Hard is it to Confuse a World Model?, "How Hard is it to Confuse a World Model?". arXiv
  • ICL Emergence, "Context and Diversity Matter: The Emergence of In-Context Learning in World Models". arXiv
  • [⭐️] Scaling Law, "Scaling Laws for Pre-training Agents and World Models". arXiv
  • LLM World Model, "Linear Spatial World Models Emerge in Large Language Models". arXiv Code
  • Revisiting Othello, "Revisiting the Othello World Model Hypothesis". arXiv
  • [⭐️] Transformers Use Causal World Models, "Transformers Use Causal World Models in Maze-Solving Tasks". arXiv
  • [⭐️] Causal World Model inside NTP, "A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment". arXiv

General Approaches to World Models

1. Foundation World Models

Interactive Video Generation:

3D Scene Generation:

Genie Series:

V-JEPA Series:

Cosmos Series:

World-Lab Projects:

  • Generating Worlds, "Generating Worlds". Blog

Other Awesome Models:

2. Building World Models from 2D Vision Priors

This represents a "bottom-up" approach to achieving intelligence: sensorimotor before abstraction. In 2D pixel space, world models often build upon pre-existing image/video generation approaches.
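A minimal sketch of that recipe: every function here is a hypothetical stand-in (`pretrained_frame_prior` plays the role of a frozen video-generation model, and `W_act` a small learned action-conditioning head), illustrating how an action signal is grafted onto a visual prior to get an interactive pixel-space step.

```python
import numpy as np

rng = np.random.default_rng(0)

def pretrained_frame_prior(frame):
    """Stand-in for a frozen video-generation prior: predicts the next
    frame from the current one (here a fixed blur, purely illustrative)."""
    return 0.5 * frame + 0.5 * np.roll(frame, 1, axis=0)

def action_conditioned_step(frame, action, W_act):
    """World-model wrapper: combine the visual prior's prediction with a
    learned additive action effect -- the common recipe for turning a
    video generator into an interactive pixel-space world model."""
    return pretrained_frame_prior(frame) + W_act @ action

frame = rng.normal(size=(16, 16))
W_act = rng.normal(size=(16, 2)) * 0.01  # maps a 2-d action to an image offset
next_frame = action_conditioned_step(frame, np.array([1.0, 0.0]), W_act)
```

Real systems condition far more deeply (cross-attention over action tokens, classifier-free guidance, etc.); the point is only that the visual prior and the action pathway are separable ingredients.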

To what extent does Vision Intelligence exist in Video Generation Models:

Useful Approaches in Video Generation:

From Video Generation Models to World Models:

Pixel Space World Models:

3. Building World Models from 3D Vision Priors

3D meshes are also a useful representation of the physical world, with benefits such as spatial consistency.

4. Building World Models from Language Priors

This represents a "top-down" approach to achieving intelligence: abstraction before sensorimotor.

Aiming to Advance LLM/VLM skills:

Aiming to enhance computer-use agent performance:

Symbolic World Models:

LLM-in-the-loop World Generation:

5. Building World Models by Bridging Language and Vision Intelligence

A recent trend is to bridge highly compressed semantic tokens (e.g., language) with information-sparse cues in the observation space (e.g., vision). This yields world models that combine high-level and low-level intelligence.

6. Latent Space World Models

While learning in the observation space (pixel, 3D mesh, language, etc.) is a common approach, for many applications (planning, policy evaluation, etc.) learning in latent space is sufficient, or is believed to lead to even better performance.

JEPA is a special kind of latent-space learning in which the loss is applied in latent space and the encoder and predictor are trained jointly. JEPA is used not only for world models (e.g., V-JEPA2-AC) but also for representation learning (e.g., I-JEPA, V-JEPA); we provide representative works from both perspectives below.
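The key design choice can be shown in a tiny numpy sketch: the loss compares a *predicted latent* against the *encoding of the next observation*, never the raw pixels. The linear encoder and predictor below are hypothetical stand-ins for deep networks; real JEPA training also uses a stop-gradient/EMA target encoder to avoid representation collapse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear encoder and predictor (real JEPA uses deep networks,
# with the target encoder an EMA copy of the online encoder).
W_enc = rng.normal(size=(4, 8))   # encoder: 8-d observation -> 4-d latent
W_pred = rng.normal(size=(4, 4))  # predictor: latent_t -> latent_{t+1}

def encode(obs):
    return W_enc @ obs

def jepa_loss(obs_t, obs_tp1):
    """The loss lives in latent space: predict the *encoding* of the
    next observation rather than the observation itself."""
    z_t = encode(obs_t)
    z_target = encode(obs_tp1)    # in practice: stop-gradient / EMA target
    z_pred = W_pred @ z_t
    return float(np.mean((z_pred - z_target) ** 2))

obs_t, obs_tp1 = rng.normal(size=8), rng.normal(size=8)
loss = jepa_loss(obs_t, obs_tp1)
```

Because nothing forces the latent to reconstruct pixels, the encoder is free to discard unpredictable detail, which is exactly the property the latent-space works below exploit.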

7. Building World Models from an Object-Centric Perspective

  • Object-Centric Latent Action Learning: "Object-Centric Latent Action Learning". Website
  • Unifying Causal and Object-centric Representation Learning: "Unifying Causal and Object-centric Representation Learning allows Causal Composition". Website
  • Object-Centric Representations: "Object-Centric Representations Generalize Better Compositionally with Less Compute". Website

8. Post-training and Inference-Time Scaling for World Models

9. World Models in the context of Model-Based RL

A significant proportion of world-model algorithms and techniques stem from advances in model-based reinforcement learning in the era around 2020; Dreamer (v1-v3) is the classic line of work from this period. We provide a list of these classics as well as works following this line of thought.
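The imagination loop shared by this line of work can be sketched as follows. The linear `latent_dynamics`, `reward_head`, and `policy` are hypothetical toy components (not Dreamer's actual RSSM); the point is that the actor collects training signal entirely inside the learned latent model.

```python
import numpy as np

def imagine_rollout(z0, policy, latent_dynamics, reward_head, horizon):
    """Dreamer-style 'imagination': unroll the learned latent dynamics
    with the current policy and collect predicted rewards, so the actor
    can be improved without touching the real environment."""
    z, rewards = z0, []
    for _ in range(horizon):
        a = policy(z)
        rewards.append(reward_head(z, a))
        z = latent_dynamics(z, a)
    return np.array(rewards)

# Hypothetical toy components (linear, deterministic) for illustration.
latent_dynamics = lambda z, a: 0.9 * z + 0.1 * a
reward_head = lambda z, a: float(-np.sum(z ** 2))
policy = lambda z: -z

rewards = imagine_rollout(np.ones(3), policy, latent_dynamics, reward_head, 15)
# Rewards improve as the policy drives the imagined latent toward zero.
assert rewards[-1] > rewards[0]
```

In the actual algorithms, gradients of the imagined return flow back through the (differentiable) dynamics into the policy, which is what makes imagination more than a Monte Carlo evaluator.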

  • [⭐️] Dreamer, "Dream to Control: Learning Behaviors by Latent Imagination". arXiv Code Website
  • [⭐️] Dreamerv2, "Mastering Atari with Discrete World Models". arXiv Code Website
  • [⭐️] Dreamerv3, "Mastering Diverse Domains through World Models". arXiv Code Website
  • DreamSmooth: "DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing". arXiv
  • [⭐️] TD-MPC2: "TD-MPC2: Scalable, Robust World Models for Continuous Control". arXiv [Torch Code]
  • Hieros: "Hieros: Hierarchical Imagination on Structured State Space Sequence World Models". arXiv
  • CoWorld: "Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning". arXiv
  • HarmonyDream, "HarmonyDream: Task Harmonization Inside World Models". arXiv Code
  • DyMoDreamer, "DyMoDreamer: World Modeling with Dynamic Modulation". arXiv Code
  • "Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization". arXiv
  • PIGDreamer, "PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning". arXiv
  • [⭐️] Continual Reinforcement Learning by Planning with Online World Models, "Continual Reinforcement Learning by Planning with Online World Models". arXiv
  • Δ-IRIS: "Efficient World Models with Context-Aware Tokenization". arXiv Code
  • AD3: "AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors". arXiv
  • R2I: "Mastering Memory Tasks with World Models". arXiv Website Code
  • REM: "Improving Token-Based World Models with Parallel Observation Prediction". arXiv Code
  • AWM, "Do Transformer World Models Give Better Policy Gradients?". arXiv
  • [⭐️] Dreaming of Many Worlds, "Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization". arXiv Code
  • PWM: "PWM: Policy Learning with Large World Models". arXiv Code
  • GenRL: "GenRL: Multimodal foundation world models for generalist embodied agents". arXiv Code
  • DLLM: "World Models with Hints of Large Language Models for Goal Achieving". arXiv
  • Adaptive World Models: "Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity". arXiv
  • "Reward-free World Models for Online Imitation Learning". arXiv
  • MoReFree: "World Models Increase Autonomy in Reinforcement Learning". arXiv Website
  • ROMBRL, "Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning". arXiv
  • "Coupled Distributional Random Expert Distillation for World Model Online Imitation Learning". arXiv
  • [⭐️] MoSim: "Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning". arXiv
  • SENSEI: "SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models". arXiv Website
  • Spiking World Model, "Implementing Spiking World Model with Multi-Compartment Neurons for Model-based Reinforcement Learning". arXiv
  • DCWM, "Discrete Codebook World Models for Continuous Control". arXiv
  • Multimodal Dreaming: "Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning". arXiv
  • "Generalist World Model Pre-Training for Efficient Reinforcement Learning". arXiv
  • "Learning To Explore With Predictive World Model Via Self-Supervised Learning". arXiv
  • Simulus: "Uncovering Untapped Potential in Sample-Efficient World Model Agents". arXiv
  • DMWM: "DMWM: Dual-Mind World Model with Long-Term Imagination". arXiv
  • EvoAgent: "EvoAgent: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks". arXiv
  • GLIMO: "Grounding Large Language Models In Embodied Environment With Imperfect World Models". arXiv
  • Energy-based Transition Models, "Offline Transition Modeling via Contrastive Energy Learning". OpenReview Code
  • PCM, "Policy-conditioned Environment Models are More Generalizable". OpenReview Website Code

10. World models in other modalities

11. Memory in World Models

Implicit Memory:

Explicit Memory:


World Models in the Language Modality:

World Models in the Pixel Space:

World Models in 3D Mesh Space:

World Models in other modalities:

  • "Beyond Simulation: Benchmarking World Models for Planning and Causality in Autonomous Driving". arXiv

Physically Plausible World Models:

  • Newton: "Newton - A Small Benchmark for Interactive Foundation World Models". Website
  • Text2World: "Text2World: Benchmarking World Modeling Capabilities of Large Language Models via Program Synthesis". Website
  • AetherVision-Bench: "AetherVision-Bench: An Open-Vocabulary RGB-Infrared Benchmark for Multi-Angle Segmentation across Aerial and Ground Perspectives". Website
  • VideoPhy-2: "VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation". Website
  • A Comprehensive Evaluation: "A Comprehensive Evaluation of Physical Realism in Text-to-Video Models". Website
  • ScenePhys: "ScenePhys — Controllable Physics Videos for World-Model Evaluation". Website
  • OpenGVL: "OpenGVL - Benchmarking Visual Temporal Progress for Data Curation". Website

This project is largely built on the foundations laid by:

Huge shoutout to the authors for their awesome work.


If you find this repository useful, please consider citing this list:

@misc{huang2025awesomeworldmodels,
  title   = {Awesome-World-Models},
  author  = {Siqiao Huang},
  journal = {GitHub repository},
  url     = {https://github.com/knightnemo/Awesome-World-Models},
  year    = {2025},
}

Star History Chart
