This repo records, tracks, and benchmarks recent World Model methods (for Autonomous Driving and Robotics), as a supplement to our survey.
If you notice any missing papers, feel free to open a pull request or an issue. Contributions in any form that make this list more comprehensive are welcome. 📣📣📣
If you find this repository useful in your research, please consider giving us a star ⭐ and a citation:
```bibtex
@article{tu2025drivingworldmodel,
  title={The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey},
  author={Tu, Sifan and Zhou, Xin and Liang, Dingkang and Jiang, Xingyu and Zhang, Yumeng and Li, Xiaofan and Bai, Xiang},
  journal={arXiv preprint arXiv:2502.10498},
  year={2025}
}
@inproceedings{zhou2025hermes,
  title={HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation},
  author={Zhou, Xin and Liang, Dingkang and Tu, Sifan and Chen, Xiwu and Ding, Yikang and Zhang, Dingyuan and Tan, Feiyang and Zhao, Hengshuang and Bai, Xiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
@inproceedings{liang2025UniFuture,
  title={Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception},
  author={Liang, Dingkang and Zhang, Dingyuan and Zhou, Xin and Tu, Sifan and Feng, Tianrui and Li, Xiaofan and Zhang, Yumeng and Du, Mingyang and Tan, Xiao and Bai, Xiang},
  booktitle={Proceedings of the IEEE International Conference on Robotics and Automation},
  year={2026}
}
```

CVPR 25 Workshop & Challenge | OpenDriveLab
Track: World Model.
A world model is a computer program that can imagine how the world evolves in response to an agent's behavior. It has the potential to solve general-purpose simulation and evaluation, enabling robots that are safe, reliable, and intelligent in a wide variety of scenarios.
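The "imagine how the world evolves in response to an agent's behavior" idea above can be made concrete with a minimal sketch. Everything below (the `ToyWorldModel` class and its 1D point-mass dynamics) is purely illustrative and not any listed paper's method:

```python
# Minimal sketch of a world model: given the current state and an agent
# action, it imagines the next state, and can roll out whole trajectories
# without touching the real world. Toy 1D point-mass dynamics only.

class ToyWorldModel:
    """Imagines how a 1D world (position, velocity) evolves under an action."""

    def __init__(self, dt: float = 0.1):
        self.dt = dt

    def predict(self, state: tuple, action: float) -> tuple:
        """One imagined step; the action is an acceleration command."""
        pos, vel = state
        vel = vel + action * self.dt
        pos = pos + vel * self.dt
        return (pos, vel)

    def rollout(self, state, actions):
        """Imagine a whole trajectory from an initial state and an action plan."""
        trajectory = [state]
        for a in actions:
            state = self.predict(state, a)
            trajectory.append(state)
        return trajectory

model = ToyWorldModel()
traj = model.rollout((0.0, 0.0), [1.0] * 5)  # accelerate for 5 imagined steps
print(len(traj), traj[-1])
```

Real world models replace the hand-written `predict` with a learned neural network over images, LiDAR, or occupancy, but the interface (state, action → next state) is the same.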
WorldModelBench @ CVPR'25: The 1st Workshop on Benchmarking World Models
World models refer to predictive models of physical phenomena in the world surrounding us. These models are fundamental for Physical AI agents, enabling crucial capabilities such as decision-making, planning, and counterfactual analysis. Effective world models must integrate several key components, including perception, instruction following, controllability, physical plausibility, and future prediction.
CVPR 24 Workshop & Challenge | OpenDriveLab
Track #4: Predictive World Model.
CVPR 23 Workshop on Autonomous Driving
CHALLENGE 3: ARGOVERSE CHALLENGES, 3D Occupancy Forecasting using the Argoverse 2 Sensor Dataset. Predict the spacetime occupancy of the world for the next 3 seconds.
- Using Occupancy Grids for Mobile Robot Perception and Navigation [paper]
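As a reminder of the mechanism behind the occupancy-grid line of work above, here is a minimal sketch of the classic per-cell log-odds Bayes update; the `L_HIT`/`L_MISS` increments are illustrative values, not tuned ones:

```python
# Each grid cell stores the log-odds of being occupied. A sensor "hit"
# (return inside the cell) raises it; a "miss" (ray passed through) lowers
# it. Log-odds make the Bayes update a simple addition per observation.
import math

L_HIT, L_MISS = 0.85, -0.4  # illustrative log-odds increments

def update_cell(log_odds: float, hit: bool) -> float:
    return log_odds + (L_HIT if hit else L_MISS)

def occupancy_prob(log_odds: float) -> float:
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))

cell = 0.0  # prior log-odds 0 means p(occupied) = 0.5
for observed_hit in [True, True, True, False]:
    cell = update_cell(cell, observed_hit)
print(occupancy_prob(cell))  # well above 0.5 after mostly hits
```

Occupancy *forecasting* (as in the Argoverse challenge above) extends this static picture by predicting how the grid evolves over future timesteps.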
- Yann LeCun: A Path Towards Autonomous Machine Intelligence [paper] [Video]
- ICCV'25 workshop Keynote - Ashok Elluswamy, Tesla [Video]
- CVPR'23 workshop Keynote - Ashok Elluswamy, Tesla [Video]
- Wayve: Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy [blog]
World models are the basis for the ability to predict what might happen next, which is fundamentally important for autonomous driving. They can act as a learned simulator, or a mental “what if” thought experiment for model-based reinforcement learning (RL) or planning. By incorporating world models into our driving models, we can enable them to understand human decisions better and ultimately generalise to more real-world situations.
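The "mental what-if thought experiment" for planning described above is often realized as planning by imagined rollouts. A minimal random-shooting sketch, with toy stand-in dynamics and hypothetical names (`imagine_step`, `plan`):

```python
# Random-shooting planning with a world model: sample candidate action
# sequences, roll each one out entirely in imagination, score the imagined
# outcomes, and execute the first action of the best sequence.
import random

def imagine_step(state, action, dt=0.1):
    """Toy learned-dynamics stand-in: 1D point mass under acceleration."""
    pos, vel = state
    vel += action * dt
    pos += vel * dt
    return (pos, vel)

def plan(state, goal=1.0, horizon=10, candidates=200, seed=0):
    rng = random.Random(seed)
    best_score, best_first_action = float("-inf"), 0.0
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s = state
        for a in actions:           # roll out entirely in imagination
            s = imagine_step(s, a)
        score = -abs(s[0] - goal)   # closer to the goal is better
        if score > best_score:
            best_score, best_first_action = score, actions[0]
    return best_first_action

first_action = plan((0.0, 0.0))
print(-1.0 <= first_action <= 1.0)
```

Model-based RL methods in the list (Dreamer-style agents, Think2Drive, etc.) replace this brute-force search with learned policies and value functions, but the core loop of scoring imagined futures is the same.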
- The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey.
arXiv 25.02[Paper] - Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI.
TMECH 25[Paper] [Code] - A Survey on Future Physical World Generation for Autonomous Driving.
MMAsia 25[Paper] - A survey on multimodal large language models for autonomous driving.
WACVW 24[Paper] [Code] - World Models: The Safety Perspective.
ISSREW[Paper] - Progressive Robustness-Aware World Models in Autonomous Driving: A Review and Outlook.
techrXiv 25.11[Paper] [Project] - A Survey of Unified Multimodal Understanding and Generation: Advances and Challenges.
techrXiv 25.11[Paper] - Simulating the Visual World with Artificial Intelligence: A Roadmap.
arXiv 25.11[Paper] [Project] - A Step Toward World Models: A Survey on Robotic Manipulation.
arXiv 25.11[Paper] - A Comprehensive Survey on World Models for Embodied AI.
arXiv 25.10[Paper] [Project] - The Safety Challenge of World Models for Embodied AI Agents: A Review.
arXiv 25.10[Paper] - A Survey on World Models Grounded in Acoustic Physical Information.
arXiv 25.09[Paper] - 3D and 4D World Modeling: A Survey.
arXiv 25.09[Paper] [Code] - A Survey of Embodied World Models.
arXiv 25.09[Paper] - One Flight Over the Gap: A Survey from Perspective to Panoramic Vision.
arXiv 25.09[Paper] [Page] - Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges.
arXiv 25.08[Paper] - A Survey: Learning Embodied Intelligence from Physical Simulators and World Models.
arXiv 25.07[Paper] - From 2D to 3D Cognition: A Brief Survey of General World Models.
arXiv 25.06[Paper] - World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks.
arXiv 25.05[Paper] - Exploring the Evolution of Physics Cognition in Video Generation: A Survey.
arXiv 25.03[Paper] [Code] - A Survey of World Models for Autonomous Driving.
arXiv 25.01[Paper] - Generative Physical AI in Vision: A Survey.
arXiv 25.01[Paper] [Code] - Understanding World or Predicting Future? A Comprehensive Survey of World Models.
arXiv 24.11[Paper] - Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey.
arXiv 24.11[Paper] - Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond.
arXiv 24.5[Paper] [Code] - World Models for Autonomous Driving: An Initial Survey.
arXiv 24.3[Paper]
- [UniFuture] Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception.
ICRA 26[Paper] [Code] [Project] - WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving.
AAAI 26[Paper] - UniDrive-WM: Unified Understanding, Planning and Generation World Model For Autonomous Driving.
arXiv 26.1[Paper] [Project] - MAD: Motion Appearance Decoupling for efficient Driving World Models.
arXiv 26.1[Paper] [Project] - A Mechanistic View on Video Generation as World Models: State and Dynamics.
arXiv 26.1[Paper] - Drive-JEPA: Video JEPA Meets Multimodal Trajectory Distillation for End-to-End Driving.
arXiv 26.1[Paper] - DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving.
arXiv 26.1[Paper] [Project]
- HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation.
ICCV 25[Paper] [Code] [Project] - [FSDrive] FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving.
NeurIPS 25[Paper] [Code] - DINO-Foresight: Looking into the Future with DINO.
NeurIPS 25[Paper] [Code] - From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction.
NeurIPS 25[Paper] [Code] - InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models.
ICCV 25[Paper] [Project] - DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation.
ICCV 25[Paper] [Project] - Epona: Autoregressive Diffusion World Model for Autonomous Driving.
ICCV 25[Paper] [Code] - UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving.
ICCV 25[Paper] [Code] - DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment.
ACM MM 25[Paper] [Code] - OmniGen: Unified Multimodal Sensor Generation for Autonomous Driving.
ACM MM 25[Paper] - World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model.
ICCV 25[Paper] - [PIWM] Dream to Drive with Predictive Individual World Model.
TIV 25[Paper] [Code] - DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation.
CVPR 25[Paper] [Project Page] - GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction.
CVPR 25[Paper] [Code] - ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration.
CVPR 25[Paper] [Code] - FUTURIST: Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers.
CVPR 25[Paper] [Code] - MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction.
CVPR 25[Paper] [Code] - UniScene: Unified Occupancy-centric Driving Scene Generation.
CVPR 25[Paper] [Project] - DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers.
CVPR 25[Paper] [Project] - GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control.
CVPR 25[Paper] [Project] - [UMGen] Generating Multimodal Driving Scenes via Next-Scene Prediction.
CVPR 25[Paper] [Project] [Code] - WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving.
arXiv 25.12[Paper] [Code] - DIO: Decomposable Implicit 4D Occupancy-Flow World Model.
CVPR 25[Paper] - SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model.
CVPR 25[Paper] - DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes
ICLR 25[Paper] [Code] - AdaWM: Adaptive World Model based Planning for Autonomous Driving.
ICLR 25[Paper] - OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework.
ICLR 25[Paper] [Code] - [PreWorld] Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving.
ICLR 25[Paper] [Code] - [SSR] Does End-to-End Autonomous Driving Really Need Perception Tasks?
ICLR 25[Paper] [Code] - Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models.
ICRA 25[Paper] - STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation.
IROS 25[Paper] [Project] - Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models.
IROS 25[Paper] - Learning to Generate 4D LiDAR Sequences.
ICCVW 25[Paper] - World model-based end-to-end scene generation for accident anticipation in autonomous driving.
Communications Engineering 25[Paper] - World Models for Autonomous Navigation of Terrestrial Robots from LIDAR Observations.
JIFS 25[Paper] - GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation.
arXiv 25.12[Paper] [Code] - DriveLaW: Unifying Planning and Video Generation in a Latent Driving World.
arXiv 25.12[Paper] - InDRiVE: Reward-Free World-Model Pretraining for Autonomous Driving via Latent Disagreement.
arXiv 25.12[Paper] - Latent Chain-of-Thought World Modeling for End-to-End Driving.
arXiv 25.12[Paper] - GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation.
arXiv 25.12[Paper] [Project] - WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World.
arXiv 25.12[Paper] [Project] - UniUGP: Unifying Understanding, Generation, and Planning For End-to-end Autonomous Driving.
arXiv 25.12[Paper] [Project] - MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving.
arXiv 25.12[Paper] - U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences.
arXiv 25.12[Paper] - RadarGen: Automotive Radar Point Cloud Generation from Cameras.
arXiv 25.12[Paper] [Project] - Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles.
arXiv 25.12[Paper] - Vehicle Dynamics Embedded World Models for Autonomous Driving.
arXiv 25.12[Paper] - LiSTAR: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving.
arXiv 25.11[Paper] [Project] - OpenTwinMap: An Open-Source Digital Twin Generator for Urban Autonomous Driving.
arXiv 25.11[Paper] - SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model.
arXiv 25.11[Paper] - LaGen: Towards Autoregressive LiDAR Scene Generation.
arXiv 25.11[Paper] - AD-R1: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models.
arXiv 25.11[Paper] - CorrectAD: A Self-Correcting Agentic System to Improve End-to-end Planning in Autonomous Driving.
arXiv 25.11[Paper] - [UniScenev2] Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method.
arXiv 25.10[Paper] - Vision-Centric 4D Occupancy Forecasting and Planning via Implicit Residual World Models.
arXiv 25.10[Paper] - SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries.
arXiv 25.10[Paper] [Code] - OmniNWM: Omniscient Driving Navigation World Models.
arXiv 25.10[Paper] [Project] - [ORAD-3D] Advancing Off-Road Autonomous Driving: The Large-Scale ORAD-3D Dataset and Comprehensive Benchmarks.
arXiv 25.10[Paper] [Code] - [Dream4Drive] Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks.
arXiv 25.10[Paper] [Project] - DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving.
arXiv 25.10[Paper] - CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving.
arXiv 25.10[Paper] - CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving.
arXiv 25.10[Paper] - [PhiGensis] 4D Driving Scene Generation With Stereo Forcing.
arXiv 25.9[Paper] [Project] - TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving.
arXiv 25.9[Paper] - OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction.
arXiv 25.9[Paper] - [G^2Editor] Realistic and Controllable 3D Gaussian-Guided Object Editing for Driving Video Generation.
arXiv 25.8[Paper] - LSD-3D: Large-Scale 3D Driving Scene Generation with Geometry Grounding.
arXiv 25.8[Paper] [Project] - Seeing Clearly, Forgetting Deeply: Revisiting Fine-Tuned Video Generators for Driving Simulation.
arXiv 25.8[Paper] - MoVieDrive: Multi-Modal Multi-View Urban Scene Video Generation.
arXiv 25.8[Paper] - ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving.
arXiv 25.8[Paper] [Code] - LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences.
arXiv 25.8[Paper] [Project] - FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models.
arXiv 25.7[Paper] - Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models.
arXiv 25.7[Paper] [Code] - I2-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting.
arXiv 25.7[Paper] [Code] - NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models.
arXiv 25.7[Paper] [Code] - Towards foundational LiDAR world models with efficient latent flow matching.
arXiv 25.6[Paper] - ReSim: Reliable World Simulation for Autonomous Driving.
arXiv 25.6[Paper] [Project] - Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models.
arXiv 25.6NVIDIA[Paper] [Project] - Dreamland: Controllable World Creation with Simulator and Generative Models.
arXiv 25.6[Paper] [Project] - LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model.
arXiv 25.6[Paper] [Code] - ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos.
arXiv 25.5[Paper] - GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control.
arXiv 25.5[Paper] [Code] - DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving.
arXiv 25.5[Paper] - VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving.
arXiv 25.5[Paper] [Project] - Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2).
arXiv 25.5[Paper] - [RAMBLE] From Imitation to Exploration: End-to-end Autonomous Driving based on World Model.
arXiv 25.4[Paper] [Code] - DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer.
arXiv 25.4[Paper] - [WoTE] End-to-End Driving with Online Trajectory Evaluation via BEV World Model.
arXiv 25.4[Paper] [Code] - MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control.
arXiv 25.3[Paper] [Project] - CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving.
arXiv 25.3[Paper] - GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving.
arXiv 25.3[Paper] - Semi-SD: Semi-Supervised Metric Depth Estimation via Surrounding Cameras for Autonomous Driving.
arXiv 25.3[Paper] [Code] - MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving.
arXiv 25.3[Paper] [Project] - SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model
arXiv 25.3[Paper] [Code] - [EOT-WM] Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space.
arXiv 25.3[Paper] - [T^3Former] Temporal Triplane Transformers as Occupancy World Models.
arXiv 25.3[Paper] - AVD2: Accident Video Diffusion for Accident Video Description.
arXiv 25.3[Paper] [Project] - VaViM and VaVAM: Autonomous Driving through Video Generative Modeling.
arXiv 25.2[Paper] [Code] - Dream to Drive: Model-Based Vehicle Control Using Analytic World Models.
arXiv 25.2[Paper] - AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data.
arXiv 25.1[Paper] [Code]
- [SEM2] Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model.
TITS[Paper] - Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability.
NeurIPS 24[Paper] [Code] - SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout.
NeurIPS 24[Paper] - DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model.
NeurIPS 24[Paper] [Project] - Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving.
ECCV 24[Paper] - [MARL-CCE] Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model.
ECCV 24[Paper] [Code] - DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving.
ECCV 24[Paper] [Code] - OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving.
ECCV 24[Paper] [Code] - [NeMo] Neural Volumetric World Models for Autonomous Driving.
ECCV 24[Paper] - CarFormer: Self-Driving with Learned Object-Centric Representations.
ECCV 24[Paper] [Code] - [GUMP] Solving Motion Planning Tasks with a Scalable Generative Model.
ECCV 24[Paper] [Code] - WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation.
ECCV 24[Paper] [Code] - DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model.
ECCV 24[Paper] [Code] - 3D-VLA: A 3D Vision-Language-Action Generative World Model.
ICML 24[Paper] - [ViDAR] Visual Point Cloud Forecasting enables Scalable Autonomous Driving.
CVPR 24[Paper] [Code] - [GenAD] Generalized Predictive Model for Autonomous Driving.
CVPR 24[Paper] [Data] - Cam4DOCC: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications.
CVPR 24[Paper] [Code] - [Drive-WM] Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving.
CVPR 24[Paper] [Code] - DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving.
CVPR 24[Paper] - Panacea: Panoramic and Controllable Video Generation for Autonomous Driving.
CVPR 24[Paper] [Code] - UnO: Unsupervised Occupancy Fields for Perception and Forecasting.
CVPR 24[Paper] [Code] - MagicDrive: Street View Generation with Diverse 3D Geometry Control.
ICLR 24[Paper] [Code] - Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion.
ICLR 24[Paper] - SafeDreamer: Safe Reinforcement Learning with World Models.
ICLR 24[Paper] [Code] - DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT.
arXiv 24.12[Paper] [Code] - An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training.
arXiv 24.12[Paper] - Doe-1: Closed-Loop Autonomous Driving with Large World Model.
arXiv 24.12[Paper] [Code] - [DrivePhysica] Physical Informed Driving World Model.
arXiv 24.12[Paper] [Code] - [Terra] ACT-Bench: Towards Action Controllable World Models for Autonomous Driving.
arXiv 24.12[Paper] [Code] [Project] [Hugging Face] - UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving.
arXiv 24.12[Paper] [Project] [Code] - HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving.
arXiv 24.12[Paper] - InfinityDrive: Breaking Time Limits in Driving World Models.
arXiv 24.12[Paper] [Project Page] - Generating Out-Of-Distribution Scenarios Using Language Models.
arXiv 24.11[Paper] - Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles.
arXiv 24.11[Paper] [Project Page] - WorldSimBench: Towards Video Generation Models as World Simulator.
arXiv 24.10[Paper] [Project Page] - DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model.
arXiv 24.10[Paper] [Project Page] - OCCVAR: Scalable 4D Occupancy Prediction via Next-Scale Prediction.
OpenReview[Paper] - Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models.
arXiv 24.9[Paper] - [LatentDriver] Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving.
arXiv 24.9[Paper] [Code] - RenderWorld: World Model with Self-Supervised 3D Label.
arXiv 24.9[Paper] - OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving.
arXiv 24.9[Paper] - DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving.
arXiv 24.8[Paper] - [Drive-OccWorld] Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving.
arXiv 24.8[Paper] - BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space.
arXiv 24.7[Paper] [Code] - [TOKEN] Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving.
arXiv 24.7[Paper] - UMAD: Unsupervised Mask-Level Anomaly Detection for Autonomous Driving.
arXiv 24.6[Paper] - SimGen: Simulator-conditioned Driving Scene Generation.
arXiv 24.6[Paper] [Code] - [AdaptiveDriver] Planning with Adaptive World Models for Autonomous Driving.
arXiv 24.6[Paper] [Code] - [LAW] Enhancing End-to-End Autonomous Driving with Latent World Model.
arXiv 24.6[Paper] [Code] - [Delphi] Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation.
arXiv 24.6[Paper] [Code] - OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving.
arXiv 24.5[Paper] [Code] - MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes.
arXiv 24.5[Paper] [Code] - CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving.
arXiv 24.5[Paper] [Code] - [DriveSim] Probing Multimodal LLMs as World Models for Driving.
arXiv 24.5[Paper] [Code] - LidarDM: Generative LiDAR Simulation in a Generated World.
arXiv 24.4[Paper] [Code] - SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control.
arXiv 24.3[Paper] [Project] - DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation.
arXiv 24.3[Paper] [Code]
- TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction.
ICRA 23[Paper] [Code] - [CTT] Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent.
arXiv 23.11[Paper] - MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations.
arXiv 23.11[Paper] - GAIA-1: A Generative World Model for Autonomous Driving.
arXiv 23.9[Paper] - ADriver-I: A General World Model for Autonomous Driving.
arXiv 23.9[Paper] - UniWorld: Autonomous Driving Pre-training via World Models.
arXiv 23.8[Paper] [Code]
- [MILE] Model-Based Imitation Learning for Urban Driving.
NeurIPS 22[Paper] [Code] - Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models.
NeurIPS 22 Spotlight[Paper] [Code] - Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation.
ICRA 22[Paper] - Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving.
IROS 22[Paper] - [SEM2] Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model.
NeurIPS 22 workshop[Paper]
- [EAWM] From Observations to Events: Event-Aware World Model for Reinforcement Learning.
ICLR 26[Paper] [Code] - Value-guided action planning with JEPA world models.
World Modeling Workshop 26[Paper] - Explicit World Models for Reliable Human-Robot Collaboration.
AAAIW 26[Paper] - [Lingbot-World] Advancing Open-source World Models.
arXiv 26.1[Paper] [Code] - [Lingbot-VA] Causal World Modeling for Robot Control.
arXiv 26.1[Paper] [Code] - PathWise: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs.
arXiv 26.1[Paper] - WorldBench: Disambiguating Physics for Diagnostic Evaluation of World Models.
arXiv 26.1[Paper] [Project] - Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models.
arXiv 26.1[Paper] [Project] - PhysicsMind: Sim and Real Mechanics Benchmarking for Physical Reasoning and Prediction in Foundational VLMs and World Models.
arXiv 26.1[Paper] - Boltzmann-GPT: Bridging Energy-Based World Models and Language Generation.
arXiv 26.1[Paper] - MetaWorld: Skill Transfer and Composition in a Hierarchical World Model for Grounding High-Level Instructions.
arXiv 26.1[Paper] [Project] - Aligning Agentic World Models via Knowledgeable Experience Learning.
arXiv 26.1[Paper] - VJEPA: Variational Joint Embedding Predictive Architectures as Probabilistic World Models.
arXiv 26.1[Paper] - Walk through Paintings: Egocentric World Models from Internet Priors.
arXiv 26.1[Paper] - From Generative Engines to Actionable Simulators: The Imperative of Physical Grounding in World Models.
arXiv 26.1[Paper] - An Efficient and Multi-Modal Navigation System with One-Step World Model.
arXiv 26.1[Paper] - ReWorld: Multi-Dimensional Reward Modeling for Embodied World Models.
arXiv 26.1[Paper] - Action Shapley: A Training Data Selection Metric for World Model in Reinforcement Learning.
arXiv 26.1[Paper] - Inference-time Physics Alignment of Video Generative Models with Latent World Models.
arXiv 26.1[Paper] - Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models.
arXiv 26.1[Paper] - Semantic Belief-State World Model for 3D Human Motion Prediction.
arXiv 26.1[Paper] - PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation.
arXiv 26.1[Paper] [Project] - Current Agents Fail to Leverage World Model as Tool for Foresight.
arXiv 26.1[Paper] - MobileDreamer: Generative Sketch World Model for GUI Agent.
arXiv 26.1[Paper] - Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test.
arXiv 26.1[Paper] - VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control.
arXiv 26.1[Paper] [Project] - Learning Latent Action World Models In The Wild.
arXiv 26.1[Paper] - Object-Centric World Models Meet Monte Carlo Tree Search.
arXiv 26.1[Paper] - Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning.
arXiv 26.1[Paper] - A formal theory on problem space as a semantic world model in systems engineering.
arXiv 26.1[Paper] - Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments.
arXiv 26.1[Paper] [Project] - NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos.
arXiv 26.1[Paper] [Project] - What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?
arXiv 26.1[Paper] - AlignUSER: Human-Aligned LLM Agents via World Models for Recommender System Evaluation.
arXiv 26.1[Paper] - [DreamerV3] Mastering Diverse Domains through World Models.
Nature[Paper] [JAX Code] - 3D4D: An Interactive, Editable, 4D World Model via 3D Video Generation.
AAAI 25[Paper] - Object-Centric World Models for Causality-Aware Reinforcement Learning.
AAAI 25[Paper] - Foundation Models as World Models: A Foundational Study in Text-Based GridWorlds.
NeurIPSW 25[Paper] - Language-conditioned world model improves policy generalization by reading environmental descriptions.
NeurIPSW 25[Paper] - NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments.
ICCV 25[Paper] [Code] - GWM: Towards Scalable Gaussian World Models for Robotic Manipulation.
ICCV 25[Paper] [Project] - FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making.
ICML 25[Paper] [Project] - General agents need world models.
ICML 25[Paper] - What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models.
ICML 25[Paper] - Continual Reinforcement Learning by Planning with Online World Models.
ICML 25[Paper] - PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning.
ICML 25[Paper] - [NWM] Navigation World Models.
CVPR 25 Best Paper Honorable Mention, Yann LeCun[Paper] [Project] - [PrediCIR] Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval.
CVPR 25[Paper] [Code] - [MoSim] Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning.
CVPR 25[Paper] - CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models.
CVPR 25[Paper] [Project] - EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance.
CVPR 25[Paper] [Code] - DiWA: Diffusion Policy Adaptation with World Models.
CoRL 25[Paper] [Project] - Simulating Before Planning: Constructing Intrinsic User World Model for User-Tailored Dialogue Policy Planning.
SIGIR 25[Paper] - LS-Imagine: Open-World Reinforcement Learning over Long Short-Term Imagination.
ICLR 25 Oral[Paper] [Code] - DC-MPC: Discrete Codebook World Models for Continuous Control.
ICLR 25[Paper] [Code] - [SGF] Simple, Good, Fast: Self-Supervised World Models Free of Baggage.
ICLR 25[Paper] [Code] - ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model.
IROS 25[Paper] [Code] - SCMA: Self-Consistent Model-based Adaptation for Visual Reinforcement Learning.
IJCAI 25[Paper] - Surfer: A World Model-Based Framework for Vision-Language Robot Manipulation.
TNNLS 25[Paper] - Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling.
World Modeling Workshop 26[Paper] [Code] - On Memory: A comparison of memory mechanisms in world models.
World Modeling Workshop 26[Paper] - Zero-Splat TeleAssist: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation.
ICRAW 25[Paper] - Act2Goal: From World Model To General Goal-conditioned Policy.
arXiv 25.12[Paper] - Web World Models.
arXiv 25.12[Paper] - [LEWM] Large Emotional World Model.
arXiv 25.12[Paper] - World model inspired sarcasm reasoning with large language model agents.
arXiv 25.12[Paper] - TeleWorld: Towards Dynamic Multimodal Synthesis with a 4D World Model.
arXiv 25.12[Paper] - Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space.
arXiv 25.12[Paper] - Yume-1.5: A Text-Controlled Interactive World Generation Model.
arXiv 25.12[Paper] - [ORCA] Active Intelligence in Video Avatars via Closed-loop World Modeling.
arXiv 25.12[Paper] [Project] - From Word to World: Can Large Language Models be Implicit Text-based World Models?
arXiv 25.12[Paper] - A Unified Definition of Hallucination, Or: It's the World Model, Stupid.
arXiv 25.12[Paper] - AstraNav-World: World Model for Foresight Control and Consistency.
arXiv 25.12[Paper] - ChronoDreamer: Action-Conditioned World Model as an Online Simulator for Robotic Planning.
arXiv 25.12[Paper] - STORM: Search-Guided Generative World Models for Robotic Manipulation.
arXiv 25.12[Paper] - Dexterous World Models.
arXiv 25.12[Paper] [Project] - WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling.
arXiv 25.12[Paper] [Project] - Motus: A Unified Latent Action World Model.
arXiv 25.12[Paper] - LongVie 2: Multimodal Controllable Ultra-Long Video World Model.
arXiv 25.12[Paper] - World Models Can Leverage Human Videos for Dexterous Manipulation.
arXiv 25.12[Paper] - World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents.
arXiv 25.12[Paper] - VFMF: World Modeling by Forecasting Vision Foundation Model Features.
arXiv 25.12[Paper] [Code] - VDAWorld: World Modelling via VLM-Directed Abstraction and Simulation.
arXiv 25.12[Paper] [Project] - The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces.
arXiv 25.12[Paper] - KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models.
arXiv 25.12[Paper] - CLARITY: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space.
arXiv 25.12[Paper] - Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model.
arXiv 25.12[Paper] [Project] - Deterministic World Models for Verification of Closed-loop Vision-based Systems.
arXiv 25.12[Paper] - Closing the Train-Test Gap in World Models for Gradient-Based Planning.
arXiv 25.12[Paper] - Latent Action World Models for Control with Unlabeled Trajectories.
arXiv 25.12[Paper] - Evaluating Gemini Robotics Policies in a Veo World Simulator.
arXiv 25.12[Paper] - Astra: General Interactive World Model with Autoregressive Denoising.
arXiv 25.12[Paper] [Code] - Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform.
arXiv 25.12[Paper] - Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems.
arXiv 25.12[Paper] - Learning Robot Manipulation from Audio World Models.
arXiv 25.12[Paper] - FieldSeer I: Physics-Guided World Models for Long-Horizon Electromagnetic Dynamics under Partial Observability.
arXiv 25.12[Paper] - World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty.
arXiv 25.12[Paper] - Speech World Model: Causal State-Action Planning with Explicit Reasoning for Speech.
arXiv 25.12[Paper] - BiTAgent: A Task-Aware Modular Framework for Bidirectional Coupling between Multimodal Large Language Models and World Models.
arXiv 25.12[Paper] - AdaPower: Specializing World Foundation Models for Predictive Manipulation.
arXiv 25.12[Paper] - RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL.
arXiv 25.12[Paper] - RELIC: Interactive Video World Model with Long-Horizon Memory.
arXiv 25.12[Paper] - Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound.
arXiv 25.12[Paper] - Better World Models Can Lead to Better Post-Training Performance.
arXiv 25.12[Paper] - VCWorld: A Biological World Model for Virtual Cell Simulation.
arXiv 25.12[Paper] - NavForesee: A Unified Vision-Language World Model for Hierarchical Planning and Dual-Horizon Navigation Prediction.
arXiv 25.12[Paper] - GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment.
arXiv 25.12[Paper] - The brain-AI convergence: Predictive and generative world models for general-purpose computation.
arXiv 25.12[Paper] - WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling.
arXiv 25.12[Paper] - VISTAv2: World Imagination for Indoor Vision-and-Language Navigation.
arXiv 25.11[Paper] - Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model.
arXiv 25.11[Paper] [Project] - SmallWorlds: Assessing Dynamics Understanding of World Models in Isolated Environments.
arXiv 25.11[Paper] - Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction.
arXiv 25.11[Paper] - TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos.
arXiv 25.11[Paper] - GigaWorld-0: World Models as Data Engine to Empower Embodied AI.
arXiv 25.11[Paper] - 4DWorldBench: A Comprehensive Evaluation Framework for 3D/4D World Generation Models.
arXiv 25.11[Paper] - Thinking Ahead: Foresight Intelligence in MLLMs and World Models.
arXiv 25.11[Paper] - Counterfactual World Models via Digital Twin-conditioned Video Diffusion.
arXiv 25.11[Paper] - RynnVLA-002: A Unified Vision-Language-Action and World Model.
arXiv 25.11[Paper] - Beyond Generative AI: World Models for Clinical Prediction, Counterfactuals, and Planning.
arXiv 25.11[Paper] - X-WIN: Building Chest Radiograph World Model via Predictive Sensing.
arXiv 25.11[Paper] - IPR-1: Interactive Physical Reasoner.
arXiv 25.11[Paper] - NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards.
arXiv 25.11[Paper] [Code] - Towards High-Consistency Embodied World Model with Multi-View Trajectory Videos.
arXiv 25.11[Paper] - PragWorld: A Benchmark Evaluating LLMs' Local World Model under Minimal Linguistic Alterations and Conversational Dynamics.
arXiv 25.11[Paper] - Latent-Space Autoregressive World Model for Efficient and Robust Image-Goal Navigation.
arXiv 25.11[Paper] - Scalable Policy Evaluation with Video World Models.
arXiv 25.11[Paper] - WMPO: World Model-based Policy Optimization for Vision-Language-Action Models.
arXiv 25.11[Paper] - ViPRA: Video Prediction for Robot Actions.
arXiv 25.11[Paper] - Dynamic Sparsity: Challenging Common Sparsity Assumptions for Learning World Models in Robotic Reinforcement Learning Benchmarks.
arXiv 25.11[Paper] - LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems.
arXiv 25.11[Paper] - DR. WELL: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration.
arXiv 25.11[Paper] - WorldPlanner: Monte Carlo Tree Search and MPC with Action-Conditioned Visual World Models.
arXiv 25.11[Paper] - Natural Building Blocks for Structured World Models: Theory, Evidence, and Scaling.
arXiv 25.11[Paper] - Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning.
arXiv 25.11[Paper] - How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment.
arXiv 25.11[Paper] - Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model.
arXiv 25.11[Paper] - Co-Evolving Latent Action World Models.
arXiv 25.10[Paper] - Emu3.5: Native Multimodal Models are World Learners.
arXiv 25.10[Paper] - Clone Deterministic 3D Worlds with Geometrically-Regularized World Models.
arXiv 25.10[Paper] - Semantic Communications with World Models.
arXiv 25.10[Paper] - Dual-Mind World Models: A General Framework for Learning in Dynamic Wireless Networks.
arXiv 25.10[Paper] - Deductive Chain-of-Thought Augmented Socially-aware Robot Navigation World Model.
arXiv 25.10[Paper] - Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation.
arXiv 25.10[Paper] - Vector Quantization in the Brain: Grid-like Codes in World Models.
arXiv 25.10[Paper] - Zero-shot World Models via Search in Memory.
arXiv 25.10[Paper] - VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents.
arXiv 25.10[Paper] [Project] - World-in-World: World Models in a Closed-Loop World.
arXiv 25.10[Paper] [Code] - Higher Embedding Dimension Creates a Stronger World Model for a Simple Sorting Task.
arXiv 25.10[Paper] - Social World Model-Augmented Mechanism Design Policy Learning.
arXiv 25.10[Paper] - ProTerrain: Probabilistic Physics-Informed Rough Terrain World Modeling.
arXiv 25.10[Paper] - GigaBrain-0: A World Model-Powered Vision-Language-Action Model.
arXiv 25.10[Paper] [Project] - Benchmarking World-Model Learning.
arXiv 25.10[Paper] - Semantic World Models.
arXiv 25.10[Paper] [Project] - World Models Should Prioritize the Unification of Physical and Social Dynamics.
arXiv 25.10[Paper] - From Masks to Worlds: A Hitchhiker's Guide to World Models.
arXiv 25.10[Paper] - Rethinking the Simulation vs. Rendering Dichotomy: No Free Lunch in Spatial World Modelling.
NeurIPSW 25[Paper] - How Hard is it to Confuse a World Model?
arXiv 25.10[Paper] - DreamerV3-XP: Optimizing exploration through uncertainty estimation.
arXiv 25.10[Paper] - PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis.
arXiv 25.10[Paper] - Terra: Explorable Native 3D World Model with Point Latents.
arXiv 25.10[Paper] [Project] - R-WoM: Retrieval-augmented World Model For Computer-use Agents.
arXiv 25.10[Paper] - One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration.
arXiv 25.10[Paper] [Project] - Deep SPI: Safe Policy Improvement via World Models.
arXiv 25.10[Paper] - DREAMer-VXS: A Latent World Model for Sample-Efficient AGV Exploration in Stochastic, Unobserved Environments.
arXiv 25.10[Paper] - Ego-Vision World Model for Humanoid Contact Planning.
arXiv 25.10[Paper] [Project] - Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation.
arXiv 25.10[Paper] - What You Don't Know Can Hurt You: How Well do Latent Safety Filters Understand Partially Observable Safety Constraints?
arXiv 25.10[Paper] - Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report.
arXiv 25.10[Paper] - WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation.
arXiv 25.10[Paper] - Ctrl-World: A Controllable Generative World Model for Robot Manipulation.
arXiv 25.10[Paper] - Active Confusion Expression in Large Language Models: Leveraging World Models toward Better Social Reasoning.
arXiv 25.10[Paper] - VideoVerse: How Far is Your T2V Generator from a World Model?
arXiv 25.10[Paper] - Internal World Models as Imagination Networks in Cognitive Agents.
arXiv 25.10[Paper] - Code World Models for General Game Playing.
arXiv 25.10[Paper] - Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models.
arXiv 25.10[Paper] - MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator.
arXiv 25.10[Paper] [Code] - Bridging the Gap Between Multimodal Foundation Models and World Models.
arXiv 25.10[Paper] - Memory Forcing: Spatio-Temporal Memory for Consistent Scene Generation on Minecraft.
arXiv 25.10[Paper] [Project] - A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models.
arXiv 25.10[Paper] - CWM: An Open-Weights LLM for Research on Code Generation with World Models.
arXiv 25.9[Paper] - FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction.
arXiv 25.9[Paper] - LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE.
arXiv 25.9[Paper] [Code] - LongLive: Real-time Interactive Long Video Generation.
arXiv 25.9[Paper] [Code] - MoWM: Mixture-of-World-Models for Embodied Planning via Latent-to-Pixel Feature Modulation.
arXiv 25.9[Paper] - Context and Diversity Matter: The Emergence of In-Context Learning in World Models.
arXiv 25.9[Paper] - WoW: Towards a World omniscient World model Through Embodied Interaction.
arXiv 25.9[Paper] - KeyWorld: Key Frame Reasoning Enables Effective and Efficient World Models.
arXiv 25.9[Paper] [Code] - [Veo 3] Video models are zero-shot learners and reasoners.
arXiv 25.9[Paper] [Project] - World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation.
arXiv 25.9[Paper] [Project] - Remote Sensing-Oriented World Model.
arXiv 25.9[Paper] - SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models.
arXiv 25.9[Paper] - [PIWM] Enhancing Physical Consistency in Lightweight World Models.
arXiv 25.9[Paper] [Project] - LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures.
arXiv 25.9[Paper] [Code] - PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models.
arXiv 25.9[Paper] - OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling.
arXiv 25.9[Paper] [Project] - UnifoLM-WMA-0: A World-Model-Action (WMA) Framework under UnifoLM Family.
Unitree[Code] - One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning.
arXiv 25.9[Paper] - Language-Driven Hierarchical Task Structures as Explicit World Models for Multi-Agent Learning.
arXiv 25.9[Paper] - LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation.
arXiv 25.9[Paper] [Demo] - Design and Optimization of Reinforcement Learning-Based Agents in Text-Based Games.
arXiv 25.9[Paper] - CausalARC: Abstract Reasoning with Causal World Models.
arXiv 25.9[Paper] - Planning with Reasoning using Vision Language World Model.
arXiv 25.9[Paper] - Learning an Adversarial World Model for Automated Curriculum Generation in MARL.
arXiv 25.9[Paper] - World Model Implanting for Test-time Adaptation of Embodied Agents.
arXiv 25.9[Paper] - Social World Models.
arXiv 25.8[Paper] - [PEWM] Learning Primitive Embodied World Models: Towards Scalable Robotic Learning.
arXiv 25.8[Paper] - [DALI] Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization.
arXiv 25.8[Paper] - HERO: Hierarchical Extrapolation and Refresh for Efficient World Models.
arXiv 25.8[Paper] - Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model.
arXiv 25.8[Paper] [Code] - Visuomotor Grasping with World Models for Surgical Robots.
arXiv 25.8[Paper] - Genie 3: A new frontier for world models.
Google DeepMind[Blog] - SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model.
arXiv 25.7[Paper] - CoEx -- Co-evolving World-model and Exploration.
arXiv 25.7[Paper] - What Does it Mean for a Neural Network to Learn a "World Model"?
arXiv 25.7[Paper] - Back to the Features: DINO as a Foundation for Video World Models.
arXiv 25.7[Paper] - HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels.
arXiv 25.7[Paper] [Code] - Yume: An Interactive World Generation Model.
arXiv 25.7[Paper] [Code] - LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning.
arXiv 25.7[Paper] - MindJourney: Test-Time Scaling with World Models for Spatial Reasoning.
arXiv 25.7[Paper] [Project] - Latent Policy Steering with Embodiment-Agnostic Pretrained World Models.
arXiv 25.7[Paper] - MobiWorld: World Models for Mobile Wireless Network.
arXiv 25.7[Paper] - [GWM] Graph World Model.
arXiv 25.7[Paper] [Code] - From Curiosity to Competence: How World Models Interact with the Dynamics of Exploration.
arXiv 25.7[Paper] - Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions.
arXiv 25.7[Paper] [Project] - Sekai: A Video Dataset towards World Exploration.
arXiv 25.7[Paper] [Project] - Dyn-O: Building Structured World Models with Object-Centric Representations.
arXiv 25.7[Paper] - Critiques of World Models.
arXiv 25.7[Paper] - [PEVA] Whole-Body Conditioned Egocentric Video Prediction.
arXiv 25.6[Paper] [Project] - World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation.
arXiv 25.6[Paper] [Project] - ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation.
arXiv 25.6[Paper] [Project] - RoboScape: Physics-informed Embodied World Model.
arXiv 25.6[Paper] [Code] - Embodied AI Agents: Modeling the World.
arXiv 25.6[Paper] - A "Good" Regulator May Provide a World Model for Intelligent Systems.
arXiv 25.6[Paper] - WorldVLA: Towards Autoregressive Action World Model.
arXiv 25.6[Paper] [Code] - MinD: Unified Visual Imagination and Control via Hierarchical World Models.
arXiv 25.6[Paper] - Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning.
arXiv 25.6[Paper] - [UNIVERSE] Adapting Vision-Language Models for Evaluating World Models.
arXiv 25.6[Paper] - TransDreamerV3: Implanting Transformer In DreamerV3.
arXiv 25.6[Paper] - Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control.
arXiv 25.6[Paper] - Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework.
arXiv 25.6[Paper] - GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation.
arXiv 25.6[Paper] [Project] - [UniVLA] Unified Vision-Language-Action Model.
arXiv 25.6[Paper] - Xray2Xray: World Model from Chest X-rays with Volumetric Context.
arXiv 25.6[Paper] - PlayerOne: Egocentric World Simulator.
arXiv 25.6[Paper] [Project] - V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning.
arXiv 25.6, Yann LeCun[Paper] [Project] - [TAWM] Time-Aware World Model for Adaptive Prediction and Control.
arXiv 25.6[Paper] [Code] - [XPM-WM] Efficient Generation of Diverse Cooperative Agents with World Models.
arXiv 25.6[Paper] - Video World Models with Long-term Spatial Memory.
arXiv 25.6[Paper] [Project] - DSG-World: Learning a 3D Gaussian World Model from Dual State Videos.
arXiv 25.6[Paper] - Safe Planning and Policy Optimization via World Model Learning.
arXiv 25.6[Paper] - 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model.
arXiv 25.6[Paper] [Code] - Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models.
arXiv 25.6[Paper] - ORV: 4D Occupancy-centric Robot Video Generation.
arXiv 25.6[Paper] [Project] - DeepVerse: 4D Autoregressive Video Generation as a World Model.
arXiv 25.6[Paper] [Project] - Sparse Imagination for Efficient Visual World Model Planning.
arXiv 25.6[Paper] - Learning Abstract World Models with a Group-Structured Latent Space.
arXiv 25.6[Paper] - Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation.
arXiv 25.6[Paper] [Project] - EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models.
arXiv 25.5[Paper] [Code] - WoMAP: World Models For Embodied Open-Vocabulary Object Localization.
arXiv 25.6[Paper] - [LoopNav] Toward Memory-Aided World Models: Benchmarking via Spatial Consistency.
arXiv 25.5[Paper] [Code] [Data] - Long-Context State-Space Video World Models.
arXiv 25.5[Paper] - Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents.
arXiv 25.5[Paper] - [WPE] Evaluating Robot Policies in a World Model.
arXiv 25.5[Paper] [Demo] - StateSpaceDiffuser: Bringing Long Context to Diffusion World Models.
arXiv 25.5[Paper] - [VRAG] Learning World Models for Interactive Video Generation.
arXiv 25.5[Paper] - JEDI: Latent End-to-end Diffusion Mitigates Agent-Human Performance Asymmetry in Model-Based Reinforcement Learning.
arXiv 25.5[Paper] - [FPWC] Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach.
arXiv 25.5[Paper] - [ForeDiff] Consistent World Models via Foresight Diffusion.
arXiv 25.5[Paper] - FLARE: Robot Learning with Implicit World Modeling.
arXiv 25.5[Paper] [Project] - [RWM] World Models as Reference Trajectories for Rapid Motor Adaptation.
arXiv 25.5[Paper] - RLVR-World: Training World Models with Reinforcement Learning.
arXiv 25.5[Paper] [Project] - Vid2World: Crafting Video Diffusion Models to Interactive World Models.
arXiv 25.5[Paper] [Project] - Causal Cartographer: From Mapping to Reasoning Over Counterfactual Worlds.
arXiv 25.5[Paper] - EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models.
arXiv 25.5[Paper] [Data&Code] - FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation.
arXiv 25.5[Paper] [Project] - [RoboOccWorld] Occupancy World Model for Robots.
arXiv 25.5[Paper] - seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models.
arXiv 25.5[Paper] - TesserAct: Learning 4D Embodied World Models.
arXiv 25.4[Paper] [Project] - ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance.
arXiv 25.4[Paper] - [RWM-O] Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator.
arXiv 25.4[Paper] - PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation.
arXiv 25.4[Paper] - Adapting a World Model for Trajectory Following in a 3D Game.
arXiv 25.4[Paper] - Embodied World Models Emerge from Navigational Task in Open-Ended Environments.
arXiv 25.4[Paper] - MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft.
arXiv 25.4[Paper] [Code] - [UWM] Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets.
arXiv 25.4[Paper] [Code] - Synthesizing world models for bilevel planning.
arXiv 25.3[Paper] - Aether: Geometric-Aware Unified World Modeling.
arXiv 25.3[Paper] [Project] - [MaaG] Model as a Game: On Numerical and Spatial Consistency for Generative Games.
arXiv 25.3[Paper] - DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation.
arXiv 25.3[Paper] [Project] - Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control.
arXiv 25.3[Paper] [Code] - Meta-Reinforcement Learning with Discrete World Models for Adaptive Load Balancing.
ACMSE 25[Paper] - [FAR] Long-Context Autoregressive Video Modeling with Next-Frame Prediction.
arXiv 25.3[Paper] [Project] [Code] - LUMOS: Language-Conditioned Imitation Learning with World Models.
arXiv 25.3[Paper] [Project] - World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning.
arXiv 25.3[Paper] - [WLA] Inter-environmental world modeling for continuous and compositional dynamics.
arXiv 25.3[Paper] - Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning.
arXiv 25.3[Paper] - WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation.
arXiv 25.3[Paper] [Code] - Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments.
arXiv 25.3[Paper] - WorldModelBench: Judging Video Generation Models As World Models.
arXiv 25.2[Paper] [Project] - Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning.
arXiv 25.2[Paper] - Learning To Explore With Predictive World Model Via Self-Supervised Learning.
arXiv 25.2[Paper] - Text2World: Benchmarking Large Language Models for Symbolic World Model Generation.
arXiv 25.2[Paper] [Project] - M^3: A Modular World Model over Streams of Tokens.
arXiv 25.2[Paper] [Code] - When do Neural Networks Learn World Models?
arXiv 25.2[Paper] - [DWS] Pre-Trained Video Generative Models as World Simulators.
arXiv 25.2[Paper] - DMWM: Dual-Mind World Model with Long-Term Imagination.
arXiv 25.2[Paper] - EvoAgent: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks.
arXiv 25.2[Paper] - Generating Symbolic World Models via Test-time Scaling of Large Language Models.
arXiv 25.2[Paper] - [HMA] Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression.
arXiv 25.2[Paper] [Code] [Project] - UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent.
arXiv 25.1[Paper] - GLAM: Global-Local Variation Awareness in Mamba-based World Model.
arXiv 25.1[Paper] [Code] - Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics.
arXiv 25.1[Paper] - GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning.
arXiv 25.1[Paper] - RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation.
arXiv 25.1[Paper] - EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation.
AgiBot, arXiv 25.1[Paper] [Website] - Cosmos World Foundation Model Platform for Physical AI.
NVIDIA, arXiv 25.1[Paper] [Code] - [SMAC] Grounded Answers for Multi-agent Decision-making Problem through Generative World Model.
NeurIPS 24[Paper] - [CoWorld] Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning.
NeurIPS 24[Paper] [Website] [Torch Code] - [DIAMOND] Diffusion for World Modeling: Visual Details Matter in Atari.
NeurIPS 24[Paper] [Code] - PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation.
NeurIPS 24[Paper] - [MUN] Learning World Models for Unconstrained Goal Navigation.
NeurIPS 24[Paper] [Code] - VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation.
NeurIPS 24[Paper] - Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity.
NeurIPSW 24[Paper] - Emergence of Implicit World Models from Mortal Agents.
NeurIPSW 24[Paper] - Causal World Representation in the GPT Model.
NeurIPSW 24[Paper] - PreLAR: World Model Pre-training with Learnable Action Representation.
ECCV 24[Paper] [Code] - [CWM] Understanding Physical Dynamics with Counterfactual World Modeling.
ECCV 24[Paper] [Code] - ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation.
ECCV 24[Paper] [Code] - [DWL] Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning.
RSS 24 (Best Paper Award Finalist)[Paper] - [LLM-Sim] Can Language Models Serve as Text-Based World Simulators?
ACL[Paper] [Code] - RoboDreamer: Learning Compositional World Models for Robot Imagination.
ICML 24[Paper] [Code] - [Δ-IRIS] Efficient World Models with Context-Aware Tokenization.
ICML 24[Paper] [Code] - AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors.
ICML 24[Paper] - Hieros: Hierarchical Imagination on Structured State Space Sequence World Models.
ICML 24[Paper] - [HRSSM] Learning Latent Dynamic Robust Representations for World Models.
ICML 24[Paper] [Code] - HarmonyDream: Task Harmonization Inside World Models.
ICML 24[Paper] [Code] - [REM] Improving Token-Based World Models with Parallel Observation Prediction.
ICML 24[Paper] [Code] - Do Transformer World Models Give Better Policy Gradients?
ICML 24[Paper] - TD-MPC2: Scalable, Robust World Models for Continuous Control.
ICLR 24[Paper] [Torch Code] - DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing.
ICLR 24[Paper] - [R2I] Mastering Memory Tasks with World Models.
ICLR 24[Paper] [JAX Code] - MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning.
ICLR 24[Paper] [Code] - Multi-Task Interactive Robot Fleet Learning with Visual World Models.
CoRL 24[Paper] [Code] - Generative Emergent Communication: Large Language Model is a Collective World Model.
arXiv 24.12[Paper] - Towards Unraveling and Improving Generalization in World Models.
arXiv 24.12[Paper] - Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction.
arXiv 24.12[Paper] - Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination.
arXiv 24.12[Paper] [Project] - Transformers Use Causal World Models in Maze-Solving Tasks.
arXiv 24.12[Paper] - Owl-1: Omni World Model for Consistent Long Video Generation.
arXiv 24.12[Paper] [Code] - StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization.
arXiv 24.12[Paper] [Code] - SimuDICE: Offline Policy Optimization Through World Model Updates and DICE Estimation.
BNAIC 24[Paper] - Bounded Exploration with World Model Uncertainty in Soft Actor-Critic Reinforcement Learning Algorithm.
arXiv 24.12[Paper] - Genie 2: A large-scale foundation world model.
24.12Google DeepMind[Blog] - The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control.
arXiv 24.12[Paper] [Project] - Motion Prompting: Controlling Video Generation with Motion Trajectories.
arXiv 24.12[Paper] [Project] - Generative World Explorer.
arXiv 24.11[Paper] [Project] - [WebDreamer] Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents.
arXiv 24.11[Paper] [Code] - WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making.
arXiv 24.11[Paper] - DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning.
arXiv 24.11Yann LeCun[Paper] - Scaling Laws for Pre-training Agents and World Models.
arXiv 24.11[Paper] - [Phyworld] How Far is Video Generation from World Model: A Physical Law Perspective.
arXiv 24.11[Paper] [Project] - IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI.
arXiv 24.10[Paper] [Project] - EVA: An Embodied World Model for Future Video Anticipation.
arXiv 24.10[Paper] - VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning.
arXiv 24.10[Paper] - [LLMCWM] Language Agents Meet Causality -- Bridging LLMs and Causal World Models.
arXiv 24.10[Paper] [Code] - Reward-free World Models for Online Imitation Learning.
arXiv 24.10[Paper] - Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation.
arXiv 24.10[Paper] - [GLIMO] Grounding Large Language Models In Embodied Environment With Imperfect World Models.
arXiv 24.10[Paper] - AVID: Adapting Video Diffusion Models to World Models.
arXiv 24.10[Paper] [Code] - [WMP] World Model-based Perception for Visual Legged Locomotion.
arXiv 24.9[Paper] [Project] - [OSWM] One-shot World Models Using a Transformer Trained on a Synthetic Prior.
arXiv 24.9[Paper] - R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models.
arXiv 24.9[Paper] - Representing Positional Information in Generative World Models for Object Manipulation.
arXiv 24.9[Paper] - Making Large Language Models into World Models with Precondition and Effect Knowledge.
arXiv 24.9[Paper] - DexSim2Real$^2$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation.
arXiv 24.9[Paper] - Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction.
arXiv 24.8[Paper] - [MoReFree] World Models Increase Autonomy in Reinforcement Learning.
arXiv 24.8[Paper] [Project] - UrbanWorld: An Urban World Model for 3D City Generation.
arXiv 24.7[Paper] - PWM: Policy Learning with Large World Models.
arXiv 24.7[Paper] [Code] - Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling.
arXiv 24.7[Paper] - [GenRL] Multimodal foundation world models for generalist embodied agents.
arXiv 24.6[Paper] [Code] - [DLLM] World Models with Hints of Large Language Models for Goal Achieving.
arXiv 24.6[Paper] - Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model.
arXiv 24.6[Paper] - CityBench: Evaluating the Capabilities of Large Language Model as World Model.
arXiv 24.6[Paper] [Code] - CoDreamer: Communication-Based Decentralised World Models.
arXiv 24.6[Paper] - [EBWM] Cognitively Inspired Energy-Based World Models.
arXiv 24.6[Paper] - Evaluating the World Model Implicit in a Generative Model.
arXiv 24.6[Paper] [Code] - Transformers and Slot Encoding for Sample Efficient Physical World Modelling.
arXiv 24.5[Paper] [Code] - [Puppeteer] Hierarchical World Models as Visual Whole-Body Humanoid Controllers.
arXiv 24.5Yann LeCun[Paper] [Code] - BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation.
arXiv 24.5[Paper] - Pandora: Towards General World Model with Natural Language Actions and Video States. [Paper] [Code]
- [WKM] Agent Planning with World Knowledge Model.
arXiv 24.5[Paper] [Code] - Newton™ – a first-of-its-kind foundation model for understanding the physical world.
Archetype AI[Blog] - Compete and Compose: Learning Independent Mechanisms for Modular World Models.
arXiv 24.4[Paper] - MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators.
arXiv 24.4[Paper] [Code] - Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization.
arXiv 24.3[Paper] [Code] - ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation.
arXiv 24.3[Paper] [Code] - V-JEPA: Video Joint Embedding Predictive Architecture.
Meta AIYann LeCun[Blog] [Paper] [Code] - [IWM] Learning and Leveraging World Models in Visual Representation Learning.
Meta AI[Paper] - Genie: Generative Interactive Environments.
DeepMind[Paper] [Blog] - [Sora] Video generation models as world simulators.
OpenAI[Technical report] - [LWM] World Model on Million-Length Video And Language With RingAttention.
arXiv 24.2[Paper] [Code] - Planning with an Ensemble of World Models.
OpenReview[Paper] - WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens.
arXiv 24.1[Paper] [Code]
- [IRIS] Transformers are Sample Efficient World Models.
ICLR 23 Oral[Paper] [Torch Code] - STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning.
NIPS 23[Paper] [Torch Code] - [TWM] Transformer-based World Models Are Happy with 100k Interactions.
ICLR 23[Paper] [Torch Code] - FOCUS: Object-Centric World Models for Robotics Manipulation
arXiv 23.7[Paper] [Code] - [Dynalang] Learning to Model the World with Language.
arXiv 23.8[Paper] [Code] - [TAD] Task Aware Dreamer for Task Generalization in Reinforcement Learning.
arXiv 23.3[Paper] - [DreamerV3] Mastering Atari with Discrete World Models.
arXiv 23.1[Paper] [Code] [Torch Code]
- [TD-MPC] Temporal Difference Learning for Model Predictive Control.
ICML 22[Paper][Code] - DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations.
ICML 22[Paper] [Code] - DayDreamer: World Models for Physical Robot Learning.
CoRL 22[Paper] [Code] - Deep Hierarchical Planning from Pixels.
NIPS 22[Paper] [Code] - Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models.
NIPS 22 Spotlight[Paper] [Code] - DreamingV2: Reinforcement Learning with Discrete World Models without Reconstruction.
arXiv 22.3[Paper]
- [DreamerV2] Mastering Atari with Discrete World Models.
ICLR 21[Paper] [TF Code] [Torch Code] - Dreaming: Model-based Reinforcement Learning by Latent Imagination without Reconstruction.
ICRA 21[Paper]
- [DreamerV1] Dream to Control: Learning Behaviors by Latent Imagination.
ICLR 20[Paper] [TF Code] [Torch Code] - [Plan2Explore] Planning to Explore via Self-Supervised World Models.
ICML 20[Paper] [TF Code] [Torch Code]
- World Models.
NIPS 2018 Oral[Paper]