A Curated List of Works in World Modeling


Major updates and announcements are shown below. Scroll for full timeline.

🗺️ [2025-10] Enhanced Visual Navigation — Introduced badge system for papers! All entries now display arXiv Website Code for quick access to resources.

🔥 [2025-10] Repository Launch — Awesome World Models is now live! We're building a comprehensive collection spanning Embodied AI, Autonomous Driving, NLP, and more. See CONTRIBUTING.md for how to contribute.

💡 [Ongoing] Community Contributions Welcome — Help us maintain the most up-to-date world models resource! Submit papers via PR or contact us via email.

[Ongoing] Support This Project — If you find this useful, please cite our work and give us a star. Share with your research community!



World Models have become a hot topic in both research and industry, attracting unprecedented attention from the AI community and beyond. However, due to the interdisciplinary nature of the field (and because the term "world model" simply sounds amazing), the concept has been used with varying definitions across different domains.

Awesome World Models

This repository aims to:

  • 🔍 Organize the rapidly growing body of world model research across multiple application domains
  • 🗺️ Provide a minimalist map of how world models are utilized in different fields (Embodied AI, Autonomous Driving, NLP, etc.)
  • 🤝 Bridge the gap between different communities working on world models with varying perspectives
  • 📚 Serve as a one-stop resource for researchers, practitioners, and enthusiasts interested in world modeling
  • 🚀 Track the latest developments and breakthroughs in this exciting field

Whether you're a researcher looking for related work, a practitioner seeking implementation references, or simply curious about world models, we hope this curated list helps you navigate the landscape!


Definition of World Models

While the reach of world models has expanded again and again, it is widely agreed that the concept originates from these two papers:

Some other great blogposts on world models include:


1. World Models and Video Generation:

2. World Models and 3D Generation:

3. World Models and Embodied Artificial Intelligence:

4. World Models for Autonomous Driving:


World Models for Game Simulation

Pixel Space:

  • [⭐️] GameNGen, "Diffusion Models Are Real-Time Game Engines". arXiv
  • [⭐️] DIAMOND, "Diffusion for World Modeling: Visual Details Matter in Atari". arXiv Code
  • MineWorld, "MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft". arXiv Website
  • Oasis, "Oasis: A Universe in a Transformer". Website
  • AnimeGamer, "AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction". arXiv Website
  • [⭐️] Matrix-Game, "Matrix-Game: Interactive World Foundation Model." arXiv Code
  • [⭐️] Matrix-Game 2.0, "Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model". arXiv Website
  • RealPlay, "From Virtual Games to Real-World Play". arXiv Website Code
  • GameFactory, "GameFactory: Creating New Games with Generative Interactive Videos". arXiv Website Code
  • WORLDMEM, "Worldmem: Long-term Consistent World Simulation with Memory". arXiv Website Code

3D Mesh Space:


World Models for Autonomous Driving

Refer to https://github.com/LMD0311/Awesome-World-Model for the full list.

Note

📢 [Call for Maintenance] The repo creator is not an expert in autonomous driving, so this section is a flat list of works without classification. We welcome community effort to make this section cleaner and better sorted.

  • PWM, "From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction". arXiv Code

  • Dream4Drive, "Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks". arXiv Website

  • SparseWorld, "SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries". arXiv Code

  • DriveVLA-W0: "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving". arXiv Code

  • "Enhancing Physical Consistency in Lightweight World Models". arXiv

  • IRL-VLA: "IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model". arXiv Website Code

  • LiDARCrafter: "LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences". arXiv Website Code

  • FASTopoWM: "FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models". arXiv Code

  • Orbis: "Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models". arXiv Code

  • "World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving". arXiv

  • NRSeg: "NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models" arXiv Code

  • World4Drive: "World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model". arXiv Code

  • Epona: "Epona: Autoregressive Diffusion World Model for Autonomous Driving". arXiv Code

  • "Towards foundational LiDAR world models with efficient latent flow matching". arXiv

  • SceneDiffuser++: "SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model". arXiv

  • COME: "COME: Adding Scene-Centric Forecasting Control to Occupancy World Model" arXiv Code

  • STAGE: "STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation". arXiv

  • ReSim: "ReSim: Reliable World Simulation for Autonomous Driving". arXiv Code Website

  • "Ego-centric Learning of Communicative World Models for Autonomous Driving". arXiv

  • Dreamland: "Dreamland: Controllable World Creation with Simulator and Generative Models". arXiv Website

  • LongDWM: "LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model". arXiv Website

  • GeoDrive: "GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control". arXiv Code

  • FutureSightDrive: "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving". arXiv Code

  • Raw2Drive: "Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)". arXiv

  • VL-SAFE: "VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving". arXiv Website

  • PosePilot: "PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth". arXiv

  • "World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks". arXiv

  • "Learning to Drive from a World Model". arXiv

  • DriVerse: "DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment". arXiv

  • "End-to-End Driving with Online Trajectory Evaluation via BEV World Model". arXiv Code

  • "Knowledge Graphs as World Models for Semantic Material-Aware Obstacle Handling in Autonomous Vehicles". arXiv

  • MiLA: "MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving". arXiv Website

  • SimWorld: "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model". arXiv Website

  • UniFuture: "Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception". arXiv Website

  • EOT-WM: "Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space". arXiv

  • "Temporal Triplane Transformers as Occupancy World Models". arXiv

  • InDRiVE: "InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model". arXiv

  • MaskGWM: "MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction". arXiv

  • Dream to Drive: "Dream to Drive: Model-Based Vehicle Control Using Analytic World Models". arXiv

  • "Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving". arXiv

  • "Dream to Drive with Predictive Individual World Model". arXiv Code

  • HERMES: "HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation". arXiv

  • AdaWM: "AdaWM: Adaptive World Model based Planning for Autonomous Driving". arXiv

  • AD-L-JEPA: "AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data". arXiv

  • DrivingWorld: "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT". arXiv Code Website

  • DrivingGPT: "DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers". arXiv Website

  • "An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training". arXiv

  • GEM: "GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control". arXiv Website

  • GaussianWorld: "GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction". arXiv Code

  • Doe-1: "Doe-1: Closed-Loop Autonomous Driving with Large World Model". arXiv Website Code

  • "Physical Informed Driving World Model". arXiv Website

  • InfiniCube: "InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models". arXiv Website

  • InfinityDrive: "InfinityDrive: Breaking Time Limits in Driving World Models". arXiv Website

  • ReconDreamer: "ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration". arXiv Website

  • Imagine-2-Drive: "Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles". arXiv Website

  • DynamicCity: "DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes". arXiv Website Code

  • DriveDreamer4D: "World Models Are Effective Data Machines for 4D Driving Scene Representation". arXiv Website

  • DOME: "Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model". arXiv Website

  • SSR: "Does End-to-End Autonomous Driving Really Need Perception Tasks?". arXiv Code

  • "Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models". arXiv

  • LatentDriver: "Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving". arXiv Code

  • RenderWorld: "World Model with Self-Supervised 3D Label". arXiv

  • OccLLaMA: "An Occupancy-Language-Action Generative World Model for Autonomous Driving". arXiv

  • DriveGenVLM: "Real-world Video Generation for Vision Language Model based Autonomous Driving". arXiv

  • Drive-OccWorld: "Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving". arXiv

  • CarFormer: "Self-Driving with Learned Object-Centric Representations". arXiv Code

  • BEVWorld: "A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space". arXiv Code

  • TOKEN: "Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving". arXiv

  • UMAD: "Unsupervised Mask-Level Anomaly Detection for Autonomous Driving". arXiv

  • SimGen: "Simulator-conditioned Driving Scene Generation". arXiv Code

  • AdaptiveDriver: "Planning with Adaptive World Models for Autonomous Driving". arXiv Code

  • UnO: "Unsupervised Occupancy Fields for Perception and Forecasting". arXiv Code

  • LAW: "Enhancing End-to-End Autonomous Driving with Latent World Model". arXiv Code

  • Delphi: "Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation". arXiv Code

  • OccSora: "4D Occupancy Generation Models as World Simulators for Autonomous Driving". arXiv Code

  • MagicDrive3D: "Controllable 3D Generation for Any-View Rendering in Street Scenes". arXiv Code

  • Vista: "A Generalizable Driving World Model with High Fidelity and Versatile Controllability". arXiv Code

  • CarDreamer: "Open-Source Learning Platform for World Model based Autonomous Driving". arXiv Code

  • DriveSim: "Probing Multimodal LLMs as World Models for Driving". arXiv Code

  • DriveWorld: "4D Pre-trained Scene Understanding via World Models for Autonomous Driving". arXiv

  • LidarDM: "Generative LiDAR Simulation in a Generated World". arXiv Code

  • SubjectDrive: "Scaling Generative Data in Autonomous Driving via Subject Control". arXiv Website

  • DriveDreamer-2: "LLM-Enhanced World Models for Diverse Driving Video Generation". arXiv Code

  • Think2Drive: "Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving". arXiv

  • MARL-CCE: "Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model". arXiv Code

  • GenAD: "Generalized Predictive Model for Autonomous Driving". arXiv Website

  • GenAD: "Generative End-to-End Autonomous Driving". arXiv Code

  • NeMo: "Neural Volumetric World Models for Autonomous Driving". arXiv

  • ViDAR: "Visual Point Cloud Forecasting enables Scalable Autonomous Driving". arXiv Code

  • Drive-WM: "Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving". arXiv Code

  • Cam4DOCC: "Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications". arXiv Code

  • Panacea: "Panoramic and Controllable Video Generation for Autonomous Driving". arXiv Code

  • OccWorld: "Learning a 3D Occupancy World Model for Autonomous Driving". arXiv Code

  • DrivingDiffusion: "Layout-Guided multi-view driving scene video generation with latent diffusion model". arXiv Code

  • SafeDreamer: "Safe Reinforcement Learning with World Models". arXiv Code

  • MagicDrive: "Street View Generation with Diverse 3D Geometry Control". arXiv Code

  • DriveDreamer: "Towards Real-world-driven World Models for Autonomous Driving". arXiv Code

  • SEM2: "Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model". arXiv

World Models for Embodied AI

1. Foundation Embodied World Models

2. World Models for Manipulation

  • [⭐️] FLARE, "FLARE: Robot Learning with Implicit World Modeling". arXiv Website
  • [⭐️] Enerverse, "EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation". arXiv Website
  • [⭐️] AgiBot-World, "AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems". arXiv Website Code
  • [⭐️] DyWA: "DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation" arXiv Website
  • [⭐️] TesserAct, "TesserAct: Learning 4D Embodied World Models". arXiv Website
  • [⭐️] DreamGen: "DreamGen: Unlocking Generalization in Robot Learning through Video World Models". arXiv Code
  • [⭐️] HiP, "Compositional Foundation Models for Hierarchical Planning". arXiv Website
  • PAR: "Physical Autoregressive Model for Robotic Manipulation without Action Pretraining". arXiv Website
  • iMoWM: "iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation". arXiv Website
  • WristWorld: "WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation". arXiv
  • "A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models". arXiv
  • EMMA: "EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer". arXiv
  • PhysTwin, "PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos". arXiv Website Code
  • [⭐️] KeyWorld: "KeyWorld: Key Frame Reasoning Enables Effective and Efficient World Models". arXiv
  • World4RL: "World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation". arXiv
  • [⭐️] SAMPO: "SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models". arXiv
  • PhysicalAgent: "PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models". arXiv
  • "Empowering Multi-Robot Cooperation via Sequential World Models". arXiv
  • [⭐️] "Learning Primitive Embodied World Models: Towards Scalable Robotic Learning". arXiv Website
  • [⭐️] GWM: "GWM: Towards Scalable Gaussian World Models for Robotic Manipulation". arXiv Website
  • [⭐️] Flow-as-Action, "Latent Policy Steering with Embodiment-Agnostic Pretrained World Models". arXiv
  • EmbodieDreamer: "EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling". arXiv Website
  • RoboScape: "RoboScape: Physics-informed Embodied World Model". arXiv Code
  • FWM, "Factored World Models for Zero-Shot Generalization in Robotic Manipulation". arXiv
  • [⭐️] ParticleFormer: "ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation". arXiv Website
  • ManiGaussian++: "ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model". arXiv Code
  • ReOI: "Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control". arXiv
  • GAF: "GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation". arXiv Website
  • "Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins". arXiv Website
  • "Time-Aware World Model for Adaptive Prediction and Control". arXiv
  • [⭐️] 3DFlowAction: "3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model". arXiv
  • [⭐️] ORV: "ORV: 4D Occupancy-centric Robot Video Generation". arXiv Code Website
  • [⭐️] WoMAP: "WoMAP: World Models For Embodied Open-Vocabulary Object Localization". arXiv
  • "Sparse Imagination for Efficient Visual World Model Planning". arXiv
  • [⭐️] OSVI-WM: "OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation". arXiv
  • [⭐️] LaDi-WM: "LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation". arXiv
  • FlowDreamer: "FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation". arXiv Website
  • PIN-WM: "PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation". arXiv
  • RoboMaster, "Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control". arXiv Website Code
  • ManipDreamer: "ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance". arXiv
  • [⭐️] AdaWorld: "AdaWorld: Learning Adaptable World Models with Latent Actions" arXiv Website
  • "Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks" arXiv Website
  • [⭐️] EVA: "EVA: An Embodied World Model for Future Video Anticipation". arXiv Website
  • "Representing Positional Information in Generative World Models for Object Manipulation". arXiv
  • DexSim2Real$^2$: "DexSim2Real$^2$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation". arXiv
  • "Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics". arXiv Website
  • [⭐️] LUMOS: "LUMOS: Language-Conditioned Imitation Learning with World Models". arXiv Website
  • [⭐️] "Object-Centric World Model for Language-Guided Manipulation" arXiv
  • [⭐️] DEMO^3: "Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning" arXiv Website
  • "Strengthening Generative Robot Policies through Predictive World Modeling". arXiv Website
  • RoboHorizon: "RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation". arXiv
  • Dream to Manipulate: "Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination". arXiv Website
  • [⭐️] RoboDreamer: "RoboDreamer: Learning Compositional World Models for Robot Imagination". arXiv Code
  • ManiGaussian: "ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation". arXiv Code
  • [⭐️] WHALE: "WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making". arXiv
  • [⭐️] VisualPredicator: "VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning". arXiv
  • [⭐️] "Multi-Task Interactive Robot Fleet Learning with Visual World Models". arXiv Code
  • PIVOT-R: "PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation". arXiv
  • Video2Action, "Grounding Video Models to Actions through Goal Conditioned Exploration". arXiv Website Code
  • Diffuser, "Planning with Diffusion for Flexible Behavior Synthesis". arXiv
  • Decision Diffuser, "Is Conditional Generative Modeling all you need for Decision-Making?". arXiv
  • Potential Based Diffusion Motion Planning, "Potential Based Diffusion Motion Planning". arXiv

3. World Models for Navigation

4. World Models for Locomotion

Locomotion:

Loco-Manipulation:

5. World Models x VLAs

Unifying World Models and VLAs in one model:

Combining World Models and VLAs:

6. World Models x Policy Learning

This subsection focuses on general policy learning methods in embodied intelligence via leveraging world models.

7. World Models for Policy Evaluation

Real-world policy evaluation is expensive and noisy. The promise of world models is that, by accurately capturing environment dynamics, they can serve as a surrogate evaluation environment whose scores correlate strongly with real-world policy performance. Before world models, this role was filled by simulators:
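The surrogate-evaluation idea can be sketched in a few lines. Everything below is a hypothetical toy: a hand-written 1-D dynamics function stands in for a trained world model, and two trivial policies are ranked by their imagined returns without ever touching a real environment.

```python
import numpy as np

def evaluate_in_world_model(policy, dynamics, reward_fn, s0, horizon=50):
    """Roll a policy inside a learned dynamics model and return the
    imagined return, used as a cheap surrogate for real-world evaluation."""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        total += reward_fn(s, a)
        s = dynamics(s, a)  # predicted next state; no real environment needed
    return total

# Toy 1-D point mass; the "learned" dynamics are hand-written here.
dynamics = lambda s, a: s + 0.1 * a
reward_fn = lambda s, a: -abs(s)   # reward for staying near the origin
good = lambda s: -np.sign(s)       # pushes the state toward 0
bad = lambda s: np.sign(s)         # pushes the state away from 0

# The world model correctly ranks the two policies.
assert evaluate_in_world_model(good, dynamics, reward_fn, 1.0) > \
       evaluate_in_world_model(bad, dynamics, reward_fn, 1.0)
```

How well this ranking transfers to the real world depends entirely on the world model's accuracy, which is what the evaluation works below try to measure.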

For World Model Evaluation:


Natural Science:

Social Science:

Positions on World Models

Theory & World Models Explainability

  • [⭐️] General agents Contain World Models, "General agents contain world models". arXiv
  • [⭐️] When Do Neural Networks Learn World Models? "When Do Neural Networks Learn World Models?" arXiv
  • What Does it Mean for a Neural Network to Learn a 'World Model'?, "What Does it Mean for a Neural Network to Learn a 'World Model'?". arXiv
  • Transformer cannot learn HMMs (sometimes) "On Limitation of Transformer for Learning HMMs". arXiv
  • [⭐️] Inductive Bias Probe, "What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models". arXiv
  • [⭐️] Dynamical Systems Learning for World Models, "When do World Models Successfully Learn Dynamical Systems?". arXiv
  • How Hard is it to Confuse a World Model?, "How Hard is it to Confuse a World Model?". arXiv
  • ICL Emergence, "Context and Diversity Matter: The Emergence of In-Context Learning in World Models". arXiv
  • [⭐️] Scaling Law, "Scaling Laws for Pre-training Agents and World Models". arXiv
  • LLM World Model, "Linear Spatial World Models Emerge in Large Language Models". arXiv Code
  • Revisiting Othello, "Revisiting the Othello World Model Hypothesis". arXiv
  • [⭐️] Transformers Use Causal World Models, "Transformers Use Causal World Models in Maze-Solving Tasks". arXiv
  • [⭐️] Causal World Model inside NTP, "A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment". arXiv

General Approaches to World Models

1. Foundation World Models

Interactive Video Generation:

3D Scene Generation:

Genie Series:

V-JEPA Series:

Cosmos Series:

World-Lab Projects:

  • Generating Worlds, "Generating Worlds". Blog

Other Awesome Models:

2. Building World Models from 2D Vision Priors

This represents a "bottom-up" approach to achieving intelligence: sensorimotor before abstraction. In 2D pixel space, world models often build upon pre-existing image/video generation approaches.
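A minimal sketch of that recipe: every function here is a hypothetical stand-in (`pretrained_frame_prior` plays the role of a frozen video-generation model, and `W_act` a small learned action-conditioning head), illustrating how an action signal is grafted onto a visual prior to get an interactive pixel-space step.

```python
import numpy as np

rng = np.random.default_rng(0)

def pretrained_frame_prior(frame):
    """Stand-in for a frozen video-generation prior: predicts the next
    frame from the current one (here a fixed blur, purely illustrative)."""
    return 0.5 * frame + 0.5 * np.roll(frame, 1, axis=0)

def action_conditioned_step(frame, action, W_act):
    """World-model wrapper: combine the visual prior's prediction with a
    learned additive action effect -- the common recipe for turning a
    video generator into an interactive pixel-space world model."""
    return pretrained_frame_prior(frame) + W_act @ action

frame = rng.normal(size=(16, 16))
W_act = rng.normal(size=(16, 2)) * 0.01  # maps a 2-d action to an image offset
next_frame = action_conditioned_step(frame, np.array([1.0, 0.0]), W_act)
```

Real systems condition far more deeply (cross-attention over action tokens, classifier-free guidance, etc.); the point is only that the visual prior and the action pathway are separable ingredients.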

To what extent does Vision Intelligence exist in Video Generation Models:

Useful Approaches in Video Generation:

From Video Generation Models to World Models:

Pixel Space World Models:

3. Building World Models from 3D Vision Priors

3D meshes are also a useful representation of the physical world, with benefits such as spatial consistency.

4. Building World Models from Language Priors

This represents a "top-down" approach to achieving intelligence: abstraction before sensorimotor.

Aiming to Advance LLM/VLM skills:

Aiming to enhance computer-use agent performance:

Symbolic World Models:

LLM-in-the-loop World Generation:

5. Building World Models by Bridging Language and Vision Intelligence

A recent trend is to bridge highly compressed semantic tokens (e.g., language) with information-sparse cues in the observation space (e.g., vision). This yields world models that combine high-level and low-level intelligence.

6. Latent Space World Models

While learning in the observation space (pixel, 3D mesh, language, etc.) is a common approach, for many applications (planning, policy evaluation, etc.) learning in latent space is sufficient, or is believed to lead to even better performance.

JEPA is a special kind of latent-space learning in which the loss is applied in latent space and the encoder and predictor are trained jointly. JEPA is used not only for world models (e.g., V-JEPA2-AC) but also for representation learning (e.g., I-JEPA, V-JEPA); we provide representative works from both perspectives below.
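The key design choice can be shown in a tiny numpy sketch: the loss compares a *predicted latent* against the *encoding of the next observation*, never the raw pixels. The linear encoder and predictor below are hypothetical stand-ins for deep networks; real JEPA training also uses a stop-gradient/EMA target encoder to avoid representation collapse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear encoder and predictor (real JEPA uses deep networks,
# with the target encoder an EMA copy of the online encoder).
W_enc = rng.normal(size=(4, 8))   # encoder: 8-d observation -> 4-d latent
W_pred = rng.normal(size=(4, 4))  # predictor: latent_t -> latent_{t+1}

def encode(obs):
    return W_enc @ obs

def jepa_loss(obs_t, obs_tp1):
    """The loss lives in latent space: predict the *encoding* of the
    next observation rather than the observation itself."""
    z_t = encode(obs_t)
    z_target = encode(obs_tp1)    # in practice: stop-gradient / EMA target
    z_pred = W_pred @ z_t
    return float(np.mean((z_pred - z_target) ** 2))

obs_t, obs_tp1 = rng.normal(size=8), rng.normal(size=8)
loss = jepa_loss(obs_t, obs_tp1)
```

Because nothing forces the latent to reconstruct pixels, the encoder is free to discard unpredictable detail, which is exactly the property the latent-space works below exploit.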

7. Building World Models from an Object-Centric Perspective

  • Object-Centric Latent Action Learning: "Object-Centric Latent Action Learning". Website
  • Unifying Causal and Object-centric Representation Learning: "Unifying Causal and Object-centric Representation Learning allows Causal Composition". Website
  • Object-Centric Representations: "Object-Centric Representations Generalize Better Compositionally with Less Compute". Website

8. Post-training and Inference-Time Scaling for World Models

9. World Models in the context of Model-Based RL

A significant proportion of world-model algorithms and techniques stem from advances in model-based reinforcement learning in the era around 2020; Dreamer (v1-v3) is the classic line of work from this period. We provide a list of these classics as well as works following this line of thought.
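The imagination loop shared by this line of work can be sketched as follows. The linear `latent_dynamics`, `reward_head`, and `policy` are hypothetical toy components (not Dreamer's actual RSSM); the point is that the actor collects training signal entirely inside the learned latent model.

```python
import numpy as np

def imagine_rollout(z0, policy, latent_dynamics, reward_head, horizon):
    """Dreamer-style 'imagination': unroll the learned latent dynamics
    with the current policy and collect predicted rewards, so the actor
    can be improved without touching the real environment."""
    z, rewards = z0, []
    for _ in range(horizon):
        a = policy(z)
        rewards.append(reward_head(z, a))
        z = latent_dynamics(z, a)
    return np.array(rewards)

# Hypothetical toy components (linear, deterministic) for illustration.
latent_dynamics = lambda z, a: 0.9 * z + 0.1 * a
reward_head = lambda z, a: float(-np.sum(z ** 2))
policy = lambda z: -z

rewards = imagine_rollout(np.ones(3), policy, latent_dynamics, reward_head, 15)
# Rewards improve as the policy drives the imagined latent toward zero.
assert rewards[-1] > rewards[0]
```

In the actual algorithms, gradients of the imagined return flow back through the (differentiable) dynamics into the policy, which is what makes imagination more than a Monte Carlo evaluator.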

  • [⭐️] Dreamer, "Dream to Control: Learning Behaviors by Latent Imagination". arXiv Code Website
  • [⭐️] Dreamerv2, "Mastering Atari with Discrete World Models". arXiv Code Website
  • [⭐️] Dreamerv3, "Mastering Diverse Domains through World Models". arXiv Code Website
  • DreamSmooth: "DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing". arXiv
  • [⭐️] TD-MPC2: "TD-MPC2: Scalable, Robust World Models for Continuous Control". arXiv [Torch Code]
  • Hieros: "Hieros: Hierarchical Imagination on Structured State Space Sequence World Models". arXiv
  • CoWorld: "Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning". arXiv
  • HarmonyDream, "HarmonyDream: Task Harmonization Inside World Models". arXiv Code
  • DyMoDreamer, "DyMoDreamer: World Modeling with Dynamic Modulation". arXiv Code
  • "Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization". arXiv
  • PIGDreamer, "PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning". arXiv
  • [⭐️] Continual Reinforcement Learning by Planning with Online World Models, "Continual Reinforcement Learning by Planning with Online World Models". arXiv
  • Δ-IRIS: "Efficient World Models with Context-Aware Tokenization". arXiv Code
  • AD3: "AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors". arXiv
  • R2I: "Mastering Memory Tasks with World Models". arXiv Website Code
  • REM: "Improving Token-Based World Models with Parallel Observation Prediction". arXiv Code
  • AWM, "Do Transformer World Models Give Better Policy Gradients?". arXiv
  • [⭐️] Dreaming of Many Worlds, "Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization". arXiv Code
  • PWM: "PWM: Policy Learning with Large World Models". arXiv Code
  • GenRL: "GenRL: Multimodal foundation world models for generalist embodied agents". arXiv Code
  • DLLM: "World Models with Hints of Large Language Models for Goal Achieving". arXiv
  • Adaptive World Models: "Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity". arXiv
  • "Reward-free World Models for Online Imitation Learning". arXiv
  • MoReFree: "World Models Increase Autonomy in Reinforcement Learning". arXiv Website
  • ROMBRL, "Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning". arXiv
  • "Coupled Distributional Random Expert Distillation for World Model Online Imitation Learning". arXiv
  • [⭐️] MoSim: "Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning". arXiv
  • SENSEI: "SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models". arXiv Website
  • Spiking World Model, "Implementing Spiking World Model with Multi-Compartment Neurons for Model-based Reinforcement Learning". arXiv
  • DCWM, "Discrete Codebook World Models for Continuous Control". arXiv
  • Multimodal Dreaming: "Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning". arXiv
  • "Generalist World Model Pre-Training for Efficient Reinforcement Learning". arXiv
  • "Learning To Explore With Predictive World Model Via Self-Supervised Learning". arXiv
  • Simulus: "Uncovering Untapped Potential in Sample-Efficient World Model Agents". arXiv
  • DMWM: "DMWM: Dual-Mind World Model with Long-Term Imagination". arXiv
  • EvoAgent: "EvoAgent: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks". arXiv
  • GLIMO: "Grounding Large Language Models In Embodied Environment With Imperfect World Models". arXiv
  • Energy-based Transition Models, "Offline Transition Modeling via Contrastive Energy Learning". OpenReview Code
  • PCM, "Policy-conditioned Environment Models are More Generalizable". OpenReview Website Code

10. World models in other modalities

11. Memory in World Models

Implicit Memory:

Explicit Memory:


World Models in the Language Modality:

World Models in the Pixel Space:

World Models in 3D Mesh Space:

World Models in other modalities:

  • "Beyond Simulation: Benchmarking World Models for Planning and Causality in Autonomous Driving". arXiv

Physically Plausible World Models:

  • Newton: "Newton - A Small Benchmark for Interactive Foundation World Models". Website
  • Text2World: "Text2World: Benchmarking World Modeling Capabilities of Large Language Models via Program Synthesis". Website
  • AetherVision-Bench: "AetherVision-Bench: An Open-Vocabulary RGB-Infrared Benchmark for Multi-Angle Segmentation across Aerial and Ground Perspectives". Website
  • VideoPhy-2: "VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation". Website
  • A Comprehensive Evaluation: "A Comprehensive Evaluation of Physical Realism in Text-to-Video Models". Website
  • ScenePhys: "ScenePhys — Controllable Physics Videos for World-Model Evaluation". Website
  • OpenGVL: "OpenGVL - Benchmarking Visual Temporal Progress for Data Curation". Website

This project is largely built on the foundations laid by:

Huge shoutout to the authors for their awesome work.


If you find this repository useful, please consider citing this list:

@misc{huang2025awesomeworldmodels,
  title   = {Awesome-World-Models},
  author  = {Siqiao Huang},
  journal = {GitHub repository},
  url     = {https://github.com/knightnemo/Awesome-World-Models},
  year    = {2025},
}

Star History Chart
