An awesome list of papers and code repositories accompanying the survey "A Comprehensive Survey of Embodied World Models".
- Genie: Generative Interactive Environments. ICML 2024. [Paper]
- Sora: Creating Video from Text. OpenAI 2024. [Website]
- Open-Sora: Democratizing Efficient Video Production for All. arXiv 2024. [Paper] [Code]
- Genie 2: A Large-Scale Foundation World Model. DeepMind 2024. [Blog]
- iVideoGPT: Interactive VideoGPTs Are Scalable World Models. NeurIPS 2024. [Paper] [Code]
- NOVA: Autoregressive Video Generation without Vector Quantization. ICLR 2025. [Paper] [Code]
- Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective. arXiv 2025. [Paper]
- MAGI-1: Autoregressive Video Generation at Scale. arXiv 2025. [Paper]
- Video-GPT: Video-GPT via Next Clip Diffusion. arXiv 2025. [Paper] [Code]
- CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer. ICLR 2025. [Paper] [Code]
- Vid2World: Crafting Video Diffusion Models to Interactive World Models. arXiv 2025. [Paper]
- Wan: Open and Advanced Large-Scale Video Generative Models. arXiv 2025. [Paper] [Code]
- Cosmos: World Foundation Model Platform for Physical AI. arXiv 2025. [Paper] [Code]
- Spmem: Video World Models with Long-term Spatial Memory. arXiv 2025. [Paper] [Project Page]
- GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control. arXiv 2025. [Paper] [Code]
- DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation. CVPR 2025. [Paper] [Code]
- ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration. CVPR 2025. [Paper]
- VGGT: Visual Geometry Grounded Transformer. CVPR 2025. [Paper]
- DeepVerse: 4D Autoregressive Video Generation as a World Model. arXiv 2025. [Paper]
- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling. arXiv 2025. [Paper]
- UniFuture: Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception. arXiv 2025. [Paper] [Code]
- Aether: Geometric-Aware Unified World Modeling. ICCV 2025. [Paper] [Code]
- Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction. ICCV 2025 Highlight. [Paper] [Code]
- PosePilot: Steering Camera Pose for Generative World Models with Self-Supervised Depth. IROS 2025. [Paper]
- UniScene: Unified Occupancy-Centric Driving Scene Generation. CVPR 2025. [Paper] [Code]
- WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration. arXiv 2025. [Paper]
- GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction. CVPR 2025. [Paper]
- DriveWorld: 4D Pre-Trained Scene Understanding via World Models for Autonomous Driving. CVPR 2024. [Paper]
- DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation. ICCV 2025. [Paper] [Code]
- TesserAct: Learning 4D Embodied World Models. ICCV 2025. [Paper] [Code]
- FlowDreamer: A RGB-D World Model with Flow-Based Motion Representations for Robot Manipulation. arXiv 2025. [Paper] [Project Page]
- Geometry-Aware 4D Video Generation for Robot Manipulation. arXiv 2025. [Paper] [Code]
- ORV: 4D Occupancy-Centric Robot Video Generation. arXiv 2025. [Paper] [Code]
- Learning 3D Persistent Embodied World Models. arXiv 2025. [Paper]
- HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels. arXiv 2025. [Paper] [Code] [Project Page]
- PlaNet: Learning Latent Dynamics for Planning from Pixels. ICML 2019. [Paper] [Code] [Blog]
- Dreamer: Dream to Control: Learning Behaviors by Latent Imagination. ICLR 2020. [Paper] [Code]
- DreamerV2: Mastering Atari with Discrete World Models. ICLR 2021. [Paper] [Code]
- DreamerV3: Mastering Diverse Domains through World Models. Nature 2025. [Paper] [Code]
- I-JEPA: Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture. ICCV 2023. [Paper] [Code]
- V-JEPA: Revisiting Feature Prediction for Learning Visual Representations from Video. TMLR 2024. [Paper] [Code]
- V-JEPA 2, V-JEPA 2-AC: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning. Meta 2025. [Paper] [Code] [Website] [Blog]
- TD-MPC: Temporal Difference Learning for Model Predictive Control. ICML 2022. [Paper] [Code] [Website]
- TD-MPC-offline: Finetuning Offline World Models in the Real World. CoRL 2023 Oral. [Paper] [Code] [Website]
- TD-MPC2: Scalable, Robust World Models for Continuous Control. ICLR 2024 Spotlight. [Paper] [Code] [Website]
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models. arXiv 2024. [Paper] [Code]
- RoboDreamer: Learning Compositional World Models for Robot Imagination. ICML 2024. [Paper] [Code] [Website]
- Pandora: Towards General World Model with Natural Language Actions and Video States. arXiv 2024. [Paper] [Code] [Website]
- Cosmos: World Foundation Model Platform for Physical AI. arXiv 2025. [Paper] [Code] [Website]
- Vid2World: Crafting Video Diffusion Models to Interactive World Models. arXiv 2025. [Paper] [Website]
- UWM: Unified World Models: Coupling Video and Action Diffusion for Robot Learning. ICML 2025. [Paper] [Code] [Website]
- EnerVerse-AC: Envisioning Embodied Environments with Action Condition. ICML 2025. [Paper] [Code] [Website]
- FLARE: Robot Learning with Implicit World Modeling. arXiv 2025. [Paper] [Website]
- RoboScape: Physics-Informed Embodied World Model. arXiv 2025. [Paper] [Code]
- TesserAct: Learning 4D Embodied World Models. arXiv 2025. [Paper] [Code] [Website]

- HMA: Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression. arXiv 2025. [Paper] [Code] [Website]
- UVA: Unified Video Action Model. RSS 2025. [Paper] [Code] [Website]
- WorldVLA: Towards Autoregressive Action World Model. DAMO 2025. [Paper] [Code]
- RLVR-World: Training World Models with Reinforcement Learning. arXiv 2025. [Paper] [Code] [Website]

- DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories. arXiv 2025. [Paper] [Code] [Website]
- RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer. arXiv 2025. [Paper] [Code] [Website]
- EnerVerse-AC: Envisioning Embodied Environments with Action Condition. arXiv 2025. [Paper] [Code] [Website]
- GenRL: Multimodal-Foundation World Models for Generalization in Embodied Agents. NeurIPS 2024. [Paper] [Code] [Website]
- iVideoGPT: Interactive VideoGPTs Are Scalable World Models. NeurIPS 2024. [Paper] [Code] [Website]
- DreamerV3: Mastering Diverse Domains through World Models. Nature 2025. [Paper] [Code]

- WorldEval: World Model as Real-World Robot Policies Evaluator. arXiv 2025. [Paper] [Code] [Website]
- EnerVerse-AC: Envisioning Embodied Environments with Action Condition. arXiv 2025. [Paper] [Code] [Website]
- RoboScape: Physics-Informed Embodied World Model. arXiv 2025. [Paper] [Code]
- GPC: Strengthening Generative Robot Policies through Predictive World Modeling. arXiv 2025. [Paper] [Website]
- VPP: Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations. ICML 2025 Spotlight. [Paper] [Code] [Website]
- V-JEPA 2-AC: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning. Meta 2025. [Paper] [Code] [Website] [Blog]
- VBench: Comprehensive Benchmark Suite for Video Generative Models. CVPR 2024 Highlight. [Paper] [Code] [Website]
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-Video Generation. CVPR 2025. [Paper] [Code] [Website]
- VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness. arXiv 2025. [Paper] [Code]
- VideoPhy: Evaluating Physical Commonsense for Video Generation. ICLR 2025 Poster. [Paper] [Code]
- VideoPhy 2: Challenging Action-Centric Physical Commonsense Evaluation of Video Generation. arXiv 2025. [Paper] [Code] [Website]
- PhyGenBench: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation. ICML 2025. [Paper] [Code] [Website]
- WorldModelBench: Judging Video Generation Models as World Models. arXiv 2025. [Paper] [Code] [Website]
- EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models. arXiv 2025. [Paper] [Code]
- DreamerV3: Mastering Diverse Domains through World Models. Nature 2025. [Paper] [Code]
- V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning. Meta 2025. [Paper] [Code] [Website] [Blog]
- WorldSimBench: Towards Video Generation Models as World Simulators. ICML 2025. [Paper] [Website]
- EWM: Evaluating Robot Policies in a World Model. arXiv 2025. [Paper]
- WorldEval: World Model as Real-World Robot Policies Evaluator. arXiv 2025. [Paper] [Code] [Website]
- RoboScape: Physics-Informed Embodied World Model. arXiv 2025. [Paper] [Code]
- DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories. arXiv 2025. [Paper] [Code] [Website]
- RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer. arXiv 2025. [Paper] [Code] [Website]
- GenSim: Generating Robotic Simulation Tasks via Large Language Models. ICLR 2024 Spotlight. [Paper] [Code]
- WorldGPT: Empowering LLM as Multimodal World Model. ACM MM 2024. [Paper] [Code]
- Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-Trained Large Language Models. ICLR 2025 Poster. [Paper] [Code]
- RoboScape: Physics-Informed Embodied World Model. arXiv 2025. [Paper] [Code]