tsinghua-fib-lab/World-Model

Awesome-World-Model

A curated list of awesome resources on World Models, based on the comprehensive survey "Understanding World or Predicting Future? A Comprehensive Survey of World Models".

News🔥

  • [2024/11/21] Initial release of our survey is available on arXiv.
  • [2025/06/13] Our survey paper "Understanding World or Predicting Future? A Comprehensive Survey of World Models" has been accepted by ACM Computing Surveys.
  • [2025/06/25] Second version of our survey is available on arXiv.
  • [2025/07/18] Initial release of the Awesome-World-Model GitHub repository.

Contact

If you have any suggestions or find our work helpful, feel free to contact us.
Email: [email protected]

If this list helps your research, please ⭐ and cite:

@article{ding2025worldmodels,
  title={Understanding World or Predicting Future? A Comprehensive Survey of World Models},
  author={Ding, Jingtao and Zhang, Yunke and Shang, Yu and Zhang, Yuheng and Zong, Zefang and Feng, Jie and Yuan, Yuan and Su, Hongyu and Li, Nian and Sukiennik, Nicholas and Xu, Fengli and Li, Yong},
  journal={ACM Computing Surveys},
  year={2025}
}

Table of Contents 🍃

1 Introduction & 2 Background

Title Pub. & Date Code / Project URL
Sora: Creating video from text OpenAI 2024
A path towards autonomous machine intelligence version 0.9.2, 2022-06-27 OpenReview 2022
A framework for representing knowledge 1974
Recurrent world models facilitate policy evolution 2018 Website
World models 2018 Website
Mental models: Towards a cognitive science of language, inference, and consciousness
Sora as an AGI world model? A complete survey on text-to-video generation arXiv 2024
Is Sora a world simulator? A comprehensive survey on general world models and beyond arXiv 2024
World models for autonomous driving: An initial survey IEEE T-IV 2024
Data-centric evolution in autonomous driving: A comprehensive survey of big data system, data mining, and closed-loop technologies arXiv 2024
Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities arXiv 2024
From Efficient Multimodal Models to World Models: A Survey arXiv 2024
MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving arXiv 2018
YOLOP: You Only Look Once for Panoptic Driving Perception Image and Vision Computing 2022 Star
Scene transformer: A unified multi-task model for behavior prediction and planning
Motion transformer with global intention localization and local movement refinement NeurIPS 2022 Star
Query-centric trajectory prediction CVPR 2023
GNM: A general navigation model to drive any robot IEEE Trans. on Robotics 2023 Website
RepViT: Revisiting mobile CNN from ViT perspective CVPR 2024 Star
Learning latent dynamics for planning from pixels ICML 2019 Website
S3: Social-network Simulation System with Large Language Model-Empowered Agents arXiv 2023
Generative agents: Interactive simulacra of human behavior UIST 2023 Star

3 Implicit Representation of the External World

3.1 World Model in Decision Making

Title Pub. & Date Code/Project URL
Deep reinforcement learning in a handful of trials using probabilistic dynamics models NeurIPS 2018 Star
PWM: Policy Learning with Multi-Task World Models OpenReview Website
Recurrent world models facilitate policy evolution NeurIPS 2018 Website
Dream to control: Learning behaviors by latent imagination arXiv 2019 Website
Leveraging pre-trained large language models to construct and utilize world models for model-based task planning NeurIPS 2023 Star
Mastering Atari with discrete world models arXiv 2020 Website
Mastering diverse control tasks through world models Nature 2024 Website
TD-MPC2: Scalable, Robust World Models for Continuous Control OpenReview Website
When to trust your model: Model-based policy optimization NeurIPS 2019
Offline reinforcement learning as one big sequence modeling problem NeurIPS 2021 Star
Model predictive control Springer
Algorithmic framework for model-based deep reinforcement learning with theoretical guarantees arXiv 2018
Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning IEEE 2018
A game theoretic framework for model based reinforcement learning PMLR 2020 Website
General agents need world models arXiv 2025
Mastering memory tasks with world models arXiv 2024
A generalist dynamics model for control arXiv 2023
Exploring model-based planning with policy networks arXiv 2019
Derivative-free optimization via classification AAAI
Mastering the game of Go with deep neural networks and tree search Nature 2016
Mastering the game of Go without human knowledge Nature 2017
A0C: Alpha Zero in continuous action space arXiv 2018
Value prediction network NeurIPS 2017
Probabilistic adaptation of text-to-video models arXiv 2023 Website
RoboDreamer: Learning Compositional World Models for Robot Imagination arXiv 2024 Website
Discuss before moving: Visual language navigation via multi-expert discussions IEEE 2024 Website
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and Structured Representation CVPR 2024 Star
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation CVPR 2024 Website
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models arXiv 2025
Position: LLMs can't plan, but can help planning in LLM-modulo frameworks arXiv 2024
Language models meet world models: Embodied experiences enhance language models NeurIPS 2023 Star
VirtualHome: Simulating household activities via programs CVPR 2018 Website
Learning to Model the World with Language arXiv 2023 Website
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency arXiv 2023
ALFWorld: Aligning text and embodied environments for interactive learning ICLR 2021 Star
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents EMNLP 2024 Star
Agent Planning with World Knowledge Model NeurIPS 2024 Star
WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment NeurIPS 2024 Star
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation ICLR 2025 Star

3.2 World Knowledge Learned by Models

Title Pub. & Date Code / Project URL
Does the chimpanzee have a theory of mind? Behav. & Brain Sci. 1978
GPT4GEO: How a Language Model Sees the World’s Geography FMDM@NeurIPS 2023
LLMs achieve adult human performance on higher-order theory of mind tasks arXiv 2024
COKE: A cognitive knowledge graph for machine theory of mind ACL 2024 Star
Think Twice: Perspective-Taking Improves LLM Theory-of-Mind ACL 2024 Star
Language Models Represent Space and Time ICLR 2024 Star
GeoLLM: Extracting Geospatial Knowledge from Large Language Models ICLR 2024 Star
Large language models are geographically biased ICML 2024 Star
Emergent Representations of Program Semantics in Language Models Trained on Programs ICML 2024 Star
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages NeurIPS 2024 Star
SafeWorld: Geo-Diverse Safety Alignment NeurIPS 2024 Star
EAI: Emotional Decision-Making of LLMs in Strategic Games and Ethical Dilemmas NeurIPS 2024 Star
Testing theory of mind in large language models and humans Nature Human Behaviour 2024 Website
Automated construction of cognitive maps with visual predictive coding Nature Machine Intelligence 2024 Star
Evaluating Large Language Models in Theory of Mind Tasks PNAS 2024 Website
Elements of World Knowledge (EWOK) Transactions of the ACL 2025 Website
The Geometry of Concepts: Sparse Autoencoder Feature Structure Entropy 2025 Star
AgentMove: A large language model based agentic framework for zero-shot next location prediction NAACL 2025 Star
CityGPT: Empowering Urban Spatial Cognition of Large Language Models KDD 2025 Star
CityBench: Evaluating the Capabilities of Large Language Model as World Model KDD 2025 Star
LocalGPT: Benchmarking and Advancing Large Language Models for Local Life Services KDD 2025 Star
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence ICCV 2025 Star
Open-Set Living Need Prediction with Large Language Models ACL 2025 Findings Star
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation ACL 2025 Findings Website
Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning EMNLP 2025 Findings Star
GPS as a Control Signal for Image Generation CVPR 2025 Star
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages CVPR 2025 Website
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models CVPR 2025 Star
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces CVPR 2025 Website
A Survey of Large Language Model-Powered Spatial Intelligence Across Scales arXiv 2025
AI's Blind Spots: Geographic Knowledge and Diversity Deficit in Generated Urban Scenario arXiv 2025
Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models arXiv 2025

4 Future Prediction of the Physical World

4.1 World Model as Video Generation

Title Pub. & Date Code / Project URL
Video generation models as world simulators OpenAI Blog 2024
Sora: Creating video from text OpenAI 2024
Is Sora a world simulator? A comprehensive survey on general world models and beyond arXiv 2024
Sora as an AGI world model? A complete survey on text-to-video generation arXiv 2024
How Far is Video Generation from World Model: A Physical Law Perspective arXiv 2024
Do generative video models learn physical principles from watching videos? arXiv 2025 Website
Genesis: A Generative and Universal Physics Engine for Robotics and Beyond arXiv 2024 GitHub
PhysGen: Rigid-body physics-grounded image-to-video generation ECCV 2024 Website
NUWA-XL: Diffusion over Diffusion for Extremely Long Video Generation arXiv 2023
Learning a 3D occupancy world model for autonomous driving arXiv 2023 Website
OccSora: 4D Occupancy Generation Models as World Simulators arXiv 2024 Website
World model on million-length video and language with ring-attention arXiv 2024 Website
GAIA-1: A generative world model for autonomous driving arXiv 2023 Website
DriveDreamer: Towards real-world-driven world models arXiv 2023 GitHub
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation arXiv 2024 GitHub
Driving into the Future: Multiview Visual Forecasting and Planning with World Model arXiv 2023 GitHub
Vista: A Generalizable Driving World Model with High Fidelity arXiv 2024
WorldDreamer: Towards general world models for video generation arXiv 2024 Website
WorldGPT: a Sora-inspired video AI agent arXiv 2024

4.2 World Model as Embodied Environment

Title Pub. & Date Code / Project URL
Holodeck: Language guided generation of 3D embodied AI environments CVPR 2024 Star
GRUtopia: Dream General Robots in a City at Scale arXiv 2024 Star
AnyHome: Open-vocabulary generation of structured and textured 3D homes ECCV 2024 Star
LEGENT: Open Platform for Embodied Agents arXiv 2024 Star
UrbanWorld: An Urban World Model for 3D City Generation arXiv 2024 Star
MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility ICLR 2025 Star
MineDojo: Building open-ended embodied agents with internet-scale knowledge NeurIPS 2022 Star
UniSim: Learning Interactive Real-World Simulators ICLR 2024 -
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment arXiv 2024 Star
Empowering World Models with Reflection for Embodied Video Prediction ICML 2025 -
Streetscapes: Large-scale consistent street view generation using autoregressive video diffusion SIGGRAPH 2024 -
AVID: Adapting Video Diffusion Models to World Models arXiv 2024 Star
Pandora: Towards General World Model with Natural Language Actions and Video States arXiv 2024 Star
RoboScape: Physics-informed Embodied World Model arXiv 2025 Star
TesserAct: Learning 4D Embodied World Models arXiv 2025 Star

5 Applications of World Models

5.1 Game Intelligence

Title Pub. & Date Code / Project URL
World and Human Action Models towards Gameplay Ideation Nature 2025 -
GameFactory: Creating New Games with Generative Interactive Videos ICCV 2025 Star
Unbounded: A Generative Infinite Game of Character Life Simulation CVPR 2025 Website
GameGen-𝕏: Interactive Open-world Game Video Generation ICLR 2025 Star
Diffusion Models Are Real-Time Game Engines ICLR 2025 Website
Exploration-Driven Generative Interactive Environments ICLR 2025 Star
Matrix-Game: Interactive World Foundation Model arXiv 2025 Star
MineWorld: a Real-time and Open-source Interactive World Model on Minecraft arXiv 2025 Star
Model as a Game: On Numerical and Spatial Consistency for Generative Games arXiv 2025 -

5.2 Embodied Intelligence

Title Pub. & Date Code / Project URL
OpenEQA: Embodied Question Answering in the Era of Foundation Models CVPR 2024 Star
iVideoGPT: Interactive VideoGPTs are Scalable World Models NeurIPS 2024 Star
IRASim: A Fine-Grained World Model for Robot Manipulation ICCV 2025 Star
RoboScape: Physics-informed Embodied World Model arXiv 2025 Star
TesserAct: Learning 4D Embodied World Models arXiv 2025 Star
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning arXiv 2025 Star
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations ICML 2025 Star
DreamGen: Unlocking Generalization in Robot Learning through Video World Models arXiv 2025 Star
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation arXiv 2025 -
EnerVerse-AC: Envisioning Embodied Environments with Action Condition arXiv 2025 Star
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation arXiv 2025 Star
Vidar: Embodied Video Diffusion Model for Generalist Bimanual Manipulation arXiv 2025 -
WorldVLA: Towards Autoregressive Action World Model arXiv 2025 Star
ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model arXiv 2025 Star
ORV: 4D Occupancy-centric Robot Video Generation arXiv 2025 Star
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation ICCV 2025 -
WorldEval: World Model as Real-World Robot Policies Evaluator arXiv 2025 Star

5.3 Urban Intelligence

Autonomous Driving

Title Pub. & Date Code / Project URL
Video generation models as world simulators OpenAI Research 2024 website
GPT-4 technical report arXiv 2023 website
Visual Instruction Tuning arXiv 2023 website
World models for autonomous driving: An initial survey IEEE T-IV 2024 website
Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research arXiv 2023 GitHub
Model-free deep reinforcement learning for urban autonomous driving arXiv 2019 website
Planning-oriented autonomous driving arXiv 2022 GitHub
A survey on trajectory-prediction methods for autonomous driving IEEE T-IV 2022 website
BEVFormer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers arXiv 2022 GitHub
TransFusion: Robust LiDAR-camera fusion for 3D object detection with transformers CVPR 2022 GitHub
OmniDet: Surround-view cameras based multi-task visual perception network for autonomous driving arXiv 2021 website
YOLOP: You only look once for panoptic driving perception MIR 2022 GitHub
Wayformer: Motion forecasting via simple & efficient attention networks arXiv 2022 website
MTR: Motion transformer with global intention localization and local movement refinement NeurIPS 2022 GitHub
QCNet: Query-centric trajectory prediction CVPR 2023 website
HPTR: Real-time motion prediction via heterogeneous polyline transformer with relative pose encoding arXiv 2023 website
MotionDiffuser: Controllable multi-agent motion prediction using diffusion CVPR 2023 website
Tokenize the world into object-level knowledge to address long-tail events in autonomous driving arXiv 2024 website
OmniDrive: A generalist driving model for perception and planning arXiv 2024 website
GAIA-1: A generative world model for autonomous driving arXiv 2023 website
DriveDreamer: Towards real-world-driven world models for autonomous driving arXiv 2023 GitHub
Drive-WM: Multiview visual forecasting and planning with world model for autonomous driving arXiv 2023 GitHub
OccWorld: Learning a 3D occupancy world model for autonomous driving arXiv 2023 website
OccSora: 4D occupancy generation models as world simulators for autonomous driving arXiv 2024 website
Vista: A generalizable driving world model with high fidelity and versatile controllability arXiv 2024 website
Copilot4D: Learning unsupervised world models for autonomous driving via discrete diffusion arXiv 2023 website
MUVO: A multimodal generative world model for autonomous driving with geometric representations arXiv 2023 GitHub
UniWorld: Autonomous driving pre-training via world models arXiv 2023 website
MetaUrban: A simulation platform for embodied AI in urban spaces arXiv 2024 GitHub
UrbanWorld: An urban world model for 3D city generation arXiv 2024 website
Streetscapes: Large-scale consistent street view generation using autoregressive video diffusion arXiv 2024 website
Recurrent world models facilitate policy evolution NeurIPS 2018 website

Autonomous Logistics & Urban Analytics

Title Pub. & Date Code / Project URL
Navigation World Models CVPR 2025 Star
Towards Autonomous Micromobility through Scalable Urban Simulation CVPR 2025 Star
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation CVPR 2025 Star
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos CVPR 2025 Star
AirScape: An Aerial Generative World Model with Motion Controllability ACM MM 2025
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory ACL 2025 Star
CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space EMNLP 2025 Star
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces ACL 2025 Star
GeoLLM: Extracting Geospatial Knowledge from Large Language Models ICLR 2024 Star
CityGPT: Empowering Urban Spatial Cognition of Large Language Models KDD 2025 Star
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence ICCV 2025 Star
GPS as a Control Signal for Image Generation CVPR 2025 Star
AI's Blind Spots: Geographic Knowledge and Diversity Deficit in Generated Urban Scenario arXiv 2025
AgentMove: A Large Language Model based Agentic Framework for Zero-shot Next Location Prediction NAACL 2025 Star
CAMS: A CityGPT-Powered Agentic Framework for Urban Human Mobility Simulation arXiv 2025 Star
Open-Set Living Need Prediction with Large Language Models ACL 2025 Findings Star

5.4 Societal Intelligence

Title Pub. & Date Code / Project URL
AgentSociety: Large-scale simulation of LLM-driven generative agents ACL 2025, COLM 2025 Open Platform, Main Report, ACL 2025, COLM 2025, arXiv 2025, Star
GenSim: A General Social Simulation Platform with Large Language Model based Agents NAACL 2025 Star
Simulating Human-like Daily Activities with Desire-driven Autonomy ICLR 2025 Star
EconAgent: Large language model-empowered agents for simulating macroeconomic activities ACL 2024 Star
Agent-Pro: Learning to evolve via policy-level reflection and optimization ACL 2024 Star
Exploring collaboration mechanisms for LLM agents: A social psychology view ACL 2024 Star
Cooperate or Collapse: Emergence of sustainability behaviors in a society of LLM agents NeurIPS 2024 Star
SocioDojo: Building Lifelong Analytical Agents with Real-world Text and Time Series ICLR 2024 Star
SRAP-Agent: Simulating and optimizing scarce resource allocation policy with LLM-based agent EMNLP 2024 Star
Generative agents: Interactive simulacra of human behavior UIST 2023 Star
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users arXiv 2025 Star
YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models arXiv 2025 Star
OASIS: Open Agent Social Interaction Simulations with One Million Agents arXiv 2024 Star
Project Sid: Many-agent simulations toward AI civilization arXiv 2024 Star
Network Formation and Dynamics Among Multi-LLMs arXiv 2024 Star
S3: Social-network Simulation System with Large Language Model-Empowered Agents arXiv 2023 website
Exploring large language models for communication games: An empirical study on werewolf arXiv 2023 Star
