
Awesome-Embodied-World-Model

A curated list of papers and code repositories accompanying the survey "A comprehensive survey of embodied world models".

Architectures of Embodied World Models

Video Generation Models

  • Genie: Generative Interactive Environments. ICML 2024 [Paper]

  • Sora: Creating video from text. OpenAI 2024 [Website]

  • Open-Sora: Democratizing efficient video production for all. arXiv 2024 [Paper] [Code]

  • Genie 2: A large-scale foundation world model. DeepMind 2024 [Blog]

  • iVideoGPT: Interactive VideoGPTs are Scalable World Models. NeurIPS 2024 [Paper] [Code]

  • NOVA: Autoregressive video generation without vector quantization. ICLR 2025 [Paper] [Code]

  • Lumos-1: On autoregressive video generation from a unified model perspective. arXiv 2025 [Paper]

  • MAGI-1: Autoregressive Video Generation at Scale. arXiv 2025 [Paper]

  • Video-GPT: Video-GPT via Next Clip Diffusion. arXiv 2025 [Paper] [Code]

  • CogVideoX: Text-to-video diffusion models with an expert transformer. ICLR 2025 [Paper] [Code]

  • Vid2World: Crafting Video Diffusion Models to Interactive World Models. arXiv 2025 [Paper]

  • Wan: Open and Advanced Large-Scale Video Generative Models. arXiv 2025 [Paper] [Code]

  • Cosmos: World foundation model platform for physical AI. arXiv 2025 [Paper] [Code]

3D Reconstruction-enhanced Models

  • Spmem: Video World Models with Long-term Spatial Memory. arXiv 2025 [Paper] [Project Page]
  • GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control. arXiv 2025 [Paper] [Code]
  • DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation. CVPR 2025 [Paper] [Code]
  • ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration. CVPR 2025 [Paper]
  • VGGT: Visual Geometry Grounded Transformer. CVPR 2025 [Paper]
  • DeepVerse: 4D Autoregressive Video Generation as a World Model. arXiv 2025 [Paper]
  • Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling. arXiv 2025 [Paper]
  • UniFuture: Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception. arXiv 2025 [Paper] [Code]
  • Aether: Geometric-Aware Unified World Modeling. ICCV 2025 [Paper] [Code]
  • Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction. ICCV 2025 Highlight [Paper] [Code]
  • PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth. IEEE/RSJ IROS 2025 [Paper]
  • UniScene: Unified Occupancy-Centric Driving Scene Generation. CVPR 2025 [Paper] [Code]
  • WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration. arXiv 2025 [Paper]
  • GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction. CVPR 2025 [Paper]
  • DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving. CVPR 2024 [Paper]
  • DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation. ICCV 2025 [Paper] [Code]
  • TesserAct: Learning 4D Embodied World Models. ICCV 2025 [Paper] [Code]
  • FlowDreamer: An RGB-D World Model with Flow-based Motion Representations for Robot Manipulation. arXiv 2025 [Paper] [Project Page]
  • Geometry-aware 4D Video Generation for Robot Manipulation. arXiv 2025 [Paper] [Code]
  • ORV: 4D Occupancy-centric Robot Video Generation. arXiv 2025 [Paper] [Code]
  • Learning 3D Persistent Embodied World Models. arXiv 2025 [Paper]
  • HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels. arXiv 2025 [Paper] [Code] [Project Page]

Latent Space World Models

  • PlaNet: Learning Latent Dynamics for Planning from Pixels. ICML 2019 [Paper] [Code] [Blog]

  • Dreamer: Dream to Control: Learning Behaviors by Latent Imagination. ICLR 2020 [Paper] [Code]

  • DreamerV2: Mastering Atari with Discrete World Models. ICLR 2021 [Paper] [Code]

  • DreamerV3: Mastering Diverse Domains through World Models. Nature 2025 [Paper] [Code]

  • I-JEPA: Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture. ICCV 2023 [Paper] [Code]

  • V-JEPA: Revisiting Feature Prediction for Learning Visual Representations from Video. TMLR 2024 [Paper] [Code]

  • V-JEPA 2, V-JEPA 2-AC: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning. Meta 2025 [Paper] [Code] [Website] [Blog]

  • TD-MPC: Temporal Difference Learning for Model Predictive Control. ICML 2022 [Paper] [Code] [Website]

  • TD-MPC-offline: Finetuning Offline World Models in the Real World. CoRL 2023 Oral [Paper] [Code] [Website]

  • TD-MPC2: TD-MPC2: Scalable, Robust World Models for Continuous Control. ICLR 2024 Spotlight [Paper] [Code] [Website]
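The latent-space models above (PlaNet, the Dreamer line, JEPA, TD-MPC) share a common recipe: encode observations into a compact latent state, learn a dynamics model that steps that state forward given an action, and plan or learn behaviors by "imagining" rollouts entirely in latent space. A minimal NumPy sketch of that loop is given below; the random linear maps stand in for learned networks, and all names and dimensions are illustrative, not taken from any of the listed papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (hypothetical, not from any listed paper).
OBS_DIM, LATENT_DIM, ACTION_DIM = 16, 8, 2

# Random linear maps stand in for learned networks.
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) * 0.1
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM + ACTION_DIM)) * 0.1
W_dec = rng.normal(size=(OBS_DIM, LATENT_DIM)) * 0.1


def encode(obs):
    """Map a raw observation to a compact latent state."""
    return np.tanh(W_enc @ obs)


def dynamics(z, action):
    """Predict the next latent state from (latent, action).

    This is the step that lets an agent 'imagine' futures without
    ever rendering pixels.
    """
    return np.tanh(W_dyn @ np.concatenate([z, action]))


def decode(z):
    """Optionally reconstruct an observation from the latent
    (JEPA-style models skip this and predict in latent space only)."""
    return W_dec @ z


def imagine_rollout(obs0, actions):
    """Roll the latent dynamics forward for a candidate action sequence,
    as a planner (e.g. CEM or MPC) would when scoring trajectories."""
    z = encode(obs0)
    trajectory = []
    for a in actions:
        z = dynamics(z, a)
        trajectory.append(z)
    return np.stack(trajectory)


traj = imagine_rollout(rng.normal(size=OBS_DIM),
                       [rng.normal(size=ACTION_DIM) for _ in range(5)])
print(traj.shape)  # (5, 8): 5 imagined steps of an 8-dim latent state
```

In the actual papers the encoder and dynamics are deep networks trained by reconstruction, contrastive, or latent-prediction losses, and the rollout is scored by a learned reward or value head; the sketch only shows the data flow they have in common.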

Training Paradigm of Embodied World Models

Instruction-conditioned Training

  • Sora: A review on background, technology, limitations, and opportunities of large vision models. arXiv 2024 [Paper] [Code]
  • RoboDreamer: Learning compositional world models for robot imagination. ICML 2024 [Paper] [Code] [Website]
  • Pandora: Towards general world model with natural language actions and video states. arXiv 2024 [Paper] [Code] [Website]
  • Cosmos: World foundation model platform for physical AI. arXiv 2025 [Paper] [Code] [Website]

Action-conditioned Training

  • Vid2World: Crafting Video Diffusion Models to Interactive World Models. arXiv 2025 [Paper] [Website]
  • UWM: Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets. ICML 2025 [Paper] [Code] [Website]
  • EnerVerse-AC: Envisioning Embodied Environments with Action Condition. ICML 2025 [Paper] [Code] [Website]
  • FLARE: Robot Learning with Implicit World Modeling. arXiv 2025 [Paper] [Website]

Physics-informed Training

  • RoboScape: RoboScape: Physics-informed Embodied World Model. arXiv 2025 [Paper] [Code]
  • TesserAct: TesserAct: learning 4D embodied world models. arXiv 2025 [Paper] [Code] [Website]

Video-action Joint Training

  • HMA: Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression. arXiv 2025 [Paper] [Code] [Website]
  • UVA: Unified video action model. RSS 2025 [Paper] [Code] [Website]
  • WorldVLA: Towards Autoregressive Action World Model. arXiv 2025 [Paper] [Code]

RL-based Training

  • RLVR-World: Training World Models with Reinforcement Learning. arXiv 2025 [Paper] [Code] [Website]

Applications of Embodied World Models

Offline Robotic Data Generation Engine

  • DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories. arXiv 2025 [Paper] [Code] [Website]
  • RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer. arXiv 2025 [Paper] [Code] [Website]
  • EnerVerse-AC: Envisioning Embodied Environments with Action Condition. arXiv 2025 [Paper] [Code] [Website]

Environment Substitute for Reinforcement Learning

  • GenRL: Multimodal-Foundation World Models for Generalization in Embodied Agents. NeurIPS 2024 [Paper] [Code] [Website]
  • iVideoGPT: Interactive VideoGPTs are Scalable World Models. NeurIPS 2024 [Paper] [Code] [Website]
  • DreamerV3: Mastering Diverse Domains through World Models. Nature 2025 [Paper] [Code]

Robotic Policy Evaluator

  • WorldEval: WorldEval: World Model as Real-World Robot Policies Evaluator. arXiv 2025 [Paper] [Code] [Website]
  • EnerVerse-AC: Envisioning Embodied Environments with Action Condition. arXiv 2025 [Paper] [Code] [Website]
  • RoboScape: Physics-informed Embodied World Model. arXiv 2025 [Paper] [Code]

Action Planner as Embodied Agents

  • GPC: Strengthening Generative Robot Policies through Predictive World Modeling. arXiv 2025 [Paper] [Website]
  • VPP: Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations. ICML 2025 Spotlight [Paper] [Code] [Website]
  • V-JEPA 2-AC: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning. Meta 2025 [Paper] [Code] [Website] [Blog]

Benchmarks of Embodied World Models

Generated Data Quality

  • VBench: Comprehensive Benchmark Suite for Video Generative Models. CVPR 2024 Highlight [Paper] [Code] [Website]
  • T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-Video Generation. CVPR 2025 [Paper] [Code] [Website]
  • VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness. arXiv 2025 [Paper] [Code]
  • VideoPhy: Evaluating Physical Commonsense for Video Generation. ICLR 2025 Poster [Paper] [Code]
  • VideoPhy 2: Challenging Action-Centric Physical Commonsense Evaluation of Video Generation. arXiv 2025 [Paper] [Code] [Website]
  • PhyGenBench: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation. ICML 2025 [Paper] [Code] [Website]
  • WorldModelBench: Judging Video Generation Models as World Models. arXiv 2025 [Paper] [Code] [Website]
  • EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models. arXiv 2025 [Paper] [Code]

End-to-end Manipulation Evaluation

  • DreamerV3: Mastering Diverse Domains through World Models. Nature 2025 [Paper] [Code]
  • V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning. Meta 2025 [Paper] [Code] [Website] [Blog]
  • WorldSimBench: Towards Video Generation Models as World Simulators. ICML 2025 [Paper] [Website]

Evaluation Reliability towards Policy Model

  • EWM: Evaluating Robot Policies in a World Model. arXiv 2025 [Paper]
  • WorldEval: World Model as Real-World Robot Policies Evaluator. arXiv 2025 [Paper] [Code] [Website]
  • RoboScape: Physics-informed Embodied World Model. arXiv 2025 [Paper] [Code]

Data Scaling in Downstream Policy Model

  • DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories. arXiv 2025 [Paper] [Code] [Website]
  • RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer. arXiv 2025 [Paper] [Code] [Website]
  • GenSim: Generating Robotic Simulation Tasks via Large Language Models. ICLR 2024 Spotlight [Paper] [Code]
  • WorldGPT: Empowering LLM as Multimodal World Model. MM 2024 [Paper] [Code]
  • Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models. ICLR 2025 Poster [Paper] [Code]
  • RoboScape: Physics-informed Embodied World Model. arXiv 2025 [Paper] [Code]
