| 2023 | arXiv | GAIA-1 | GAIA-1: A Generative World Model for Autonomous Driving | | |
| 2023 | arXiv | ADriver-I | ADriver-I: A General World Model for Autonomous Driving | | |
| 2024 | ICLR | MagicDrive | MagicDrive: Street View Generation with Diverse 3D Geometry Control | | |
| 2024 | CVPR | Panacea | Panacea: Panoramic and Controllable Video Generation for Autonomous Driving | | |
| 2024 | CVPR | Drive-WM | Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving | | |
| 2024 | CVPR | 360DVD | 360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model | | |
| 2024 | ECCV | DriveDreamer | DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving | | |
| 2024 | ECCV | DrivingDiffusion | DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model | | |
| 2024 | ECCV | WoVoGen | WoVoGen: World Volume-Aware Diffusion for Controllable Multi-camera Driving Scene Generation | | |
| 2024 | NeurIPS | Vista | Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | | |
| 2024 | NeurIPS | DIAMOND | Diffusion for World Modeling: Visual Details Matter in Atari | | |
| 2024 | arXiv | MagicDrive3D | MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes | | |
| 2024 | arXiv | Delphi | Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation | | |
| 2024 | arXiv | BEVWorld | BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space | | |
| 2024 | arXiv | DriveArena | DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving | | |
| 2024 | arXiv | DiVE | DiVE: DiT-based Video Generation with Enhanced Control | | |
| 2024 | arXiv | DreamForge | DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes | | |
| 2024 | arXiv | SyntheOcc | SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs | | |
| 2024 | arXiv | HoloDrive | HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving | | |
| 2024 | arXiv | CogDriving | Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention | | |
| 2024 | arXiv | Imagine360 | Imagine360: Immersive 360 Video Generation from Perspective Anchor | | |
| 2024 | arXiv | DrivingWorld | DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT | | |
| 2024 | arXiv | ViewCrafter | ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis | | |
| 2024 | arXiv | ViewExtrapolator | Novel View Extrapolation with Video Diffusion Priors | | |
| 2025 | AAAI | DriveDreamer-2 | DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | | |
| 2025 | ICLR | 4K4DGen | 4K4DGen: Panoramic 4D Generation at 4K Resolution | | |
| 2025 | ICLR | GameGen-X | GameGen-X: Interactive Open-world Game Video Generation | | |
| 2025 | ICLR | GameNGen | Diffusion Models Are Real-Time Game Engines | | |
| 2025 | ICLR | Genex | Generative World Explorer | | |
| 2025 | ICLR | GLAD | Glad: A Streaming Scene Generator for Autonomous Driving | | |
| 2025 | CVPR | DrivingSphere | DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation | | |
| 2025 | CVPR | StreetCrafter | StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models | | |
| 2025 | CVPR | DriveScape | DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation | | |
| 2025 | CVPR | UniScene | UniScene: Unified Occupancy-centric Driving Scene Generation | | |
| 2025 | CVPR | GEM | GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control | | |
| 2025 | CVPR | UMGen | Generating Multimodal Driving Scenes via Next-Scene Prediction | | |
| 2025 | CVPR | CAT4D | CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models | | |
| 2025 | CVPR | Wonderland | Wonderland: Navigating 3D Scenes from a Single Image | | |
| 2025 | CVPR | VideoScene | VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step | | |
| 2025 | CVPR | Scene Splatter | Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model | | |
| 2025 | CVPR | DynamicScaler | DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes | | |
| 2025 | ICML | AdaWorld | AdaWorld: Learning Adaptable World Models with Latent Actions | | |
| 2025 | Nature | WHAM | World and Human Action Models towards gameplay ideation | | |
| 2025 | arXiv | DreamDrive | DreamDrive: Generative 4D Scene Modeling from Street View Images | | |
| 2025 | arXiv | MaskGWM | MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction | | |
| 2025 | arXiv | UniFuture | Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception | | |
| 2025 | arXiv | SimWorld | SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model | | |
| 2025 | arXiv | DiST-4D | DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation | | |
| 2025 | arXiv | GAIA-2 | GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | | |
| 2025 | arXiv | SteerX | SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering | | |
| 2025 | arXiv | WonderVerse | WonderVerse: Extendable 3D Scene Generation with Video Generative Models | | |
| 2025 | arXiv | FlexWorld | FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis | | |
| 2025 | arXiv | GaussVideoDreamer | GaussVideoDreamer: 3D Scene Generation with Video Diffusion and Inconsistency-Aware Gaussian Splatting | | |
| 2025 | arXiv | WORLDMEM | WORLDMEM: Long-term Consistent World Simulation with Memory | | |
| 2025 | arXiv | HoloTime | HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation | | |
| 2025 | arXiv | MineWorld | MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft | | |
| 2025 | arXiv | GameFactory | GameFactory: Creating New Games with Generative Interactive Videos | | |
| 2025 | arXiv | CoGen | CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving | | |
| 2025 | arXiv | Dreamland | Dreamland: Controllable World Creation with Simulator and Generative Models | | |
| 2025 | arXiv | Voyager | Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation | | |
| 2025 | arXiv | Matrix-Game | Matrix-Game: Interactive World Foundation Model | | |
| 2025 | arXiv | Matrix-Game 2.0 | Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model | | |
| 2025 | arXiv | Hunyuan-GameCraft | Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition | | |
| 2025 | arXiv | CoCo4D | CoCo4D: Comprehensive and Complex 4D Scene Generation | | |
| 2025 | arXiv | WonderFree | WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration | | |
| 2025 | arXiv | 4DVD | 4DVD: Cascaded Dense-view Video Diffusion Model for High-quality 4D Content Generation | | |
| 2025 | arXiv | IDCNet | IDCNet: Guided Video Diffusion for Metric-Consistent RGBD Scene Generation with Precise Camera Control | | |
| 2025 | arXiv | 4DNeX | 4DNeX: Feed-Forward 4D Generative Modeling Made Easy | | |
| 2025 | ICCV | WonderPlay | WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions | | |
| 2025 | ICCV | MagicDrive-V2 | MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control | | |
| 2025 | ICCV | DynamicVoyager | Voyaging into Unbounded Dynamic Scenes from a Single View | | |
| 2025 | ICCV | InfiniCube | InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models | | |
| 2025 | ICCV | VMem | VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory | | |
| 2025 | SIGGRAPH Asia | VideoFrom3D | VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models | | |
| 2025 | SIGGRAPH Asia | WorldExplorer | WorldExplorer: Towards Generating Fully Navigable 3D Scenes | | |
| 2025 | arXiv | WorldForge | WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance | | |
| 2025 | arXiv | | From Virtual Games to Real-World Play | | |
| 2025 | arXiv | FantasyWorld | FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction | | |
| 2025 | arXiv | EvoWorld | EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory | | |
| 2025 | arXiv | Captain Safari | Captain Safari: A World Engine | | |
| 2025 | arXiv | MagicWorld | MagicWorld: Interactive Geometry-driven Video World Exploration | | |
| 2025 | arXiv | One4D | One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control | | |