A curated list of the latest research papers, projects and resources related to Video Diffusion Models and Video Generation. Content is automatically updated daily.
Last Update: 2026-03-05 02:07:49
[2026-02] Project Launched: v1.0
- Adapted from the awesome-gaussians framework for tracking video diffusion research
- Unified CLI: Single entry point `python main.py` with subcommands: `init`, `search`, `suggest`, `export-bib`, `readme`
- Interactive Configuration Wizard: Run `python main.py init` to set up keywords, domains, time range, and API keys step by step
- Custom Time Range Filtering: Support relative periods (`6m`, `1y`, `2y`) and absolute date ranges
- Smart Link Extraction: Automatically extracts and classifies GitHub, project page, dataset, video, demo, and HuggingFace links from paper abstracts
- BibTeX Export: Fetch BibTeX from arXiv and export to `.bib` files with category/date filters
- LLM Keyword Suggestion: Paste a few paper titles or arXiv IDs, and an LLM automatically generates optimized search keywords
- arXiv Domain Filtering: Restrict searches to specific arXiv categories (e.g., `cs.CV`, `cs.AI`, `cs.MM`)
- 16 Research Categories: Comprehensive taxonomy covering T2V, I2V, video editing, controllable generation, world models, and more
View detailed updates: News.md
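The time-range and domain filters listed above boil down to computing a date cutoff and building an arXiv query. This README does not show how `main.py` implements either, so what follows is only a minimal Python sketch under stated assumptions: `parse_relative_period`, `build_arxiv_query` and `build_request_url` are hypothetical helper names, a month is approximated as 30 days and a year as 365 days, and the query targets the public arXiv API endpoint (`export.arxiv.org/api/query`) using its `all:` and `cat:` field prefixes.

```python
import re
from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical helpers illustrating "Custom Time Range Filtering" and
# "arXiv Domain Filtering"; this is not the project's actual code.

ARXIV_API = "http://export.arxiv.org/api/query"


def parse_relative_period(period: str, today: date) -> date:
    """Convert a relative period such as '6m', '1y' or '2y' into a cutoff date.

    Assumption: a month counts as 30 days and a year as 365 days.
    """
    match = re.fullmatch(r"(\d+)([my])", period.strip().lower())
    if match is None:
        raise ValueError(f"unsupported period: {period!r}")
    count, unit = int(match.group(1)), match.group(2)
    days_per_unit = 30 if unit == "m" else 365
    return today - timedelta(days=count * days_per_unit)


def build_arxiv_query(keywords: list[str], categories: list[str]) -> str:
    """Match any keyword phrase AND any of the allowed arXiv categories."""
    keyword_clause = " OR ".join(f'all:"{kw}"' for kw in keywords)
    category_clause = " OR ".join(f"cat:{cat}" for cat in categories)
    return f"({keyword_clause}) AND ({category_clause})"


def build_request_url(keywords: list[str], categories: list[str], max_results: int = 50) -> str:
    """Request newest submissions first so results can be cut off at a start date."""
    params = {
        "search_query": build_arxiv_query(keywords, categories),
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    cutoff = parse_relative_period("6m", today=date(2026, 3, 5))
    print(cutoff)  # papers submitted before this date would be dropped client-side
    print(build_arxiv_query(["video diffusion", "video generation"], ["cs.CV", "cs.AI", "cs.MM"]))
    print(build_request_url(["video diffusion"], ["cs.CV"]))
```

Sorting by `submittedDate` in descending order lets a client stop paging once entries fall before the cutoff; whether the real tool filters dates this way is not stated in this README.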
- 3D-aware Video Generation (26 papers) - Video generation with 3D awareness, multi-view consistency, and 4D content creation
- Applications (53 papers) - Domain-specific applications of video diffusion models
- Architecture & Efficiency (362 papers) - Architectural innovations (DiT, UNet), flow matching, and training/inference efficiency
- Audio & Multi-modal (34 papers) - Audio-driven and multi-modal conditioned video generation
- Controllable Generation (113 papers) - Controllable video generation with motion, camera, pose, or layout guidance
- Human & Character Animation (35 papers) - Human-centric video generation including talking heads, dance, and character animation
- Image-to-Video Generation (43 papers) - Methods for animating still images into videos
- Long Video Generation (128 papers) - Generating temporally consistent long-form videos beyond short clips
- Personalization & Customization (95 papers) - Personalized video generation with custom subjects, identities, or styles
- Physical Understanding (124 papers) - Physics-aware video generation and dynamics modeling
- Surveys & Benchmarks (195 papers) - Survey papers, benchmarks, and evaluation metrics for video generation
- Text-to-Video Generation (75 papers) - Foundation models and methods for generating videos from text prompts
- Video Editing (36 papers) - Diffusion-based video editing, style transfer, and manipulation
- Video Inpainting & Completion (6 papers) - Video inpainting, completion, outpainting, and temporal prediction
- Video Super-Resolution & Enhancement (57 papers) - Video quality improvement, upscaling, restoration, and frame interpolation
- World Models & Simulation (102 papers) - Video generation as world simulators and interactive environment generation
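Each paper entry below has a Links: field, presumably filled by the Smart Link Extraction feature. The project's classification rules are not documented here, so the following Python sketch uses assumed host and path heuristics (`extract_links` and `classify_link` are hypothetical names) to show how URLs found in an abstract could be sorted into the GitHub, project page, dataset, video, demo, and HuggingFace buckets the feature list mentions.

```python
import re
from urllib.parse import urlparse

# Assumed heuristics for illustration only; not the project's actual rules.
URL_PATTERN = re.compile(r"https?://[^\s)\]}>,]+")
VIDEO_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be", "vimeo.com"}


def classify_link(url: str) -> str:
    """Assign a URL to one of the link types named in the feature list."""
    parsed = urlparse(url)
    host, path = parsed.netloc.lower(), parsed.path.lower()
    if host in {"github.com", "www.github.com"}:
        return "github"
    if host in {"huggingface.co", "www.huggingface.co"}:
        # Hugging Face serves datasets under /datasets/ and demos as Spaces.
        if path.startswith("/datasets/"):
            return "dataset"
        if path.startswith("/spaces/"):
            return "demo"
        return "huggingface"
    if host in VIDEO_HOSTS:
        return "video"
    if "demo" in host or "demo" in path:
        return "demo"
    if "dataset" in host or "dataset" in path:
        return "dataset"
    # Fallback: anything else (e.g. *.github.io pages) is treated as a project page.
    return "project_page"


def extract_links(abstract: str) -> dict[str, list[str]]:
    """Find URLs in an abstract and group them by link type."""
    links: dict[str, list[str]] = {}
    for raw in URL_PATTERN.findall(abstract):
        url = raw.rstrip(".;:")  # drop sentence punctuation glued to the URL
        links.setdefault(classify_link(url), []).append(url)
    return links


if __name__ == "__main__":
    abstract = (
        "Code is available at https://github.com/example/repo. "
        "See https://example.github.io/project/ and "
        "https://huggingface.co/datasets/example/videos."
    )
    for kind, urls in extract_links(abstract).items():
        print(kind, urls)
```

Host checks run before the keyword checks, so a GitHub repository named `demo` is still reported as a GitHub link; a fuller version would likely need a longer host list.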
- ShareVerse: Multi-Agent Consistent Video Generation for Shared World Modeling
Authors: Jiayi Zhu, Jianing Zhang, Yiying Yang, Wei Cheng, Xiaoyun Yuan
Links:
Keywords: interactive, world model, multi-view video, dit, video generation, simulation
- HumanOrbit: 3D Human Reconstruction as 360° Orbit Generation
Authors: Keito Suzuki, Kunyao Chen, Lei Wang, Bang Du, Runfa Blark Li, Peng Liu, Ning Bi, Truong Nguyen
Links:
Keywords: identity, novel view, diffusion model, video diffusion
- BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model
Authors: Yuci Han, Charles Toth, John E. Anderson, William J. Shuart, Alper Yilmaz
Links:
Keywords: dit, novel view, diffusion model, video diffusion
- Geometry-as-context: Modulating Explicit 3D in Scene-consistent Video Generation to Geometry Context
Authors: JiaKui Hu, Jialun Liu, Liying Yang, Xinliang Zhang, Kaiwen Li, Shuang Zeng, Yuanwei Li, Haibin Huang, Chi Zhang, Yanye Lu
Links:
Keywords: trajectory, camera control, novel view, video generation, autoregressive
- RAYNOVA: Scale-Temporal Autoregressive World Modeling in Ray Space
Authors: Yichen Xie, Chensheng Peng, Mazen Abdelfattah, Yihan Hu, Jiezhi Yang, Eric Higgins, Ryan Brigden, Masayoshi Tomizuka, Wei Zhan
Links:
Keywords: world model, novel view, multi-view video, dit, video generation, physical, autoregressive
- A Single Image and Multimodality Is All You Need for Novel View Synthesis
Authors: Amirhosein Javadi, Chi-Shiang Gau, Konstantinos D. Polyzos, Tara Javidi
Links:
Keywords: video generation, dit, efficient, novel view
- VideoNeuMat: Neural Material Extraction from Generative Video Models
Authors: Bowen Xue, Saeed Hadadan, Zheng Zeng, Fabrice Rousselle, Zahra Montazeri, Milos Hasan
Links:
Keywords: dit, novel view, diffusion model, video diffusion
- CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation
Authors: Kaiyi Huang, Yukun Huang, Yu Li, Jianhong Bai, Xintao Wang, Zinan Lin, Xuefei Ning, Jiwen Yu, Pengfei Wan, Yu Wang, Xihui Liu
Links:
Keywords: trajectory, video synthesis, dit, 3d-aware, video generation, physical, text-to-video
- SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization
Authors: Lifan Wu, Ruijie Zhu, Yubo Ai, Tianzhu Zhang
Links:
Keywords: 4d generation, dit
- 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
Authors: Zhixue Fang, Xu He, Songlin Tang, Haoxian Zhang, Qingfeng Li, Xiaoqiang Liu, Pengfei Wan, Kun Gai
Links:
Keywords: dynamics, motion control, camera control, dit, 3d-aware, video generation, human motion
Showing the latest 50 out of 53 papers
- EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education
Authors: Baoliang Chen, Xinlong Bu, Lingyu Zhu, Hanwei Zhu, Xiangjie Sui
Links:
Keywords: evaluation, t2v, benchmark, concept, education, text-to-video
- MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
Authors: Rongsheng Wang, Minghao Wu, Hongru Zhou, Zhihan Yu, Zhenyang Cai, Junying Chen, Benyou Wang
Links:
Keywords: interactive, evaluation, dynamics, benchmark, medical, concept, education, video generation, simulation, physical
- HorizonForge: Driving Scene Editing with Any Trajectories and Any Vehicles
Authors: Yifan Wang, Francesco Pittaluga, Zaid Tasneem, Chenyu You, Manmohan Chandraker, Ziyu Jiang
Links:
Keywords: trajectory, evaluation, benchmark, dit, autonomous driving, video diffusion, simulation, controllable, temporal consistency
- GA-Drive: Geometry-Appearance Decoupled Modeling for Free-viewpoint Driving Scene Generation
Authors: Hao Zhang, Lue Fan, Qitai Wang, Wenbo Li, Zehuan Wu, Lewei Lu, Zhaoxiang Zhang, Hongsheng Li
Links:
Keywords: video-to-video, trajectory, video editing, dit, autonomous driving, video diffusion, simulation, diffusion model
- 3DSPA: A 3D Semantic Point Autoencoder for Evaluating Video Realism
Authors: Bhavik Chandna, Kelsey R. Allen
Links:
Keywords: trajectory, evaluation, benchmark, robotics, film, video generation, physical, temporal consistency
- PedaCo-Gen: Scaffolding Pedagogical Agency in Human-AI Collaborative Video Authoring
Authors: Injun Baek, Yearim Kim, Nojun Kwak
Links:
Keywords: interactive, t2v, dit, education, text-to-video
- VideoSketcher: Video Models Prior Enable Versatile Sequential Sketch Generation
Authors: Hui Ren, Yuval Alaluf, Omer Bar Tal, Alexander Schwing, Antonio Torralba, Yael Vinker
Links:
Keywords: interactive, diffusion model, dit, video diffusion, style, creative, efficient, autoregressive, text-to-video
- Beyond End-to-End Video Models: An LLM-Based Multi-Agent System for Educational Video Generation
Authors: Lingyong Yan, Jiulong Wu, Dong Xie, Weixian Shi, Deguo Xia, Jizhou Huang
Links:
Keywords: video generation, dit, education
- Ctrl&Shift: High-Quality Geometry-Aware Object Manipulation in Visual Generation
Authors: Penghui Ruan, Bojia Zi, Xianbiao Qi, Youze Huang, Rong Xiao, Pichao Wang, Jiannong Cao, Yuhui Shi
Links:
Keywords: dit, film, identity, creative, controllable
- VideoWorld 2: Learning Transferable Knowledge from Real-world Videos
Authors: Zhongwei Ren, Yunchao Wei, Xiao Yu, Guixun Luo, Yao Zhao, Bingyi Kang, Jiashi Feng, Xiaojie Jin
Links:
Keywords: dynamics, robotics, video diffusion, autoregressive, video generation, diffusion model
Showing the latest 50 out of 362 papers
- EvoPrune: Early-Stage Visual Token Pruning for Efficient MLLMs
Authors: Yuhao Chen, Bin Shan, Xin Ye, Cheng Chen
Links:
Keywords: efficient, benchmark
- PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation
Authors: Shang Wu, Chenwei Xu, Zhuofan Xia, Weijian Li, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Han Liu
Links:
Keywords: t2v, physics-aware, text-to-video, video generation, physical, physics, architecture
- TC-Padé: Trajectory-Consistent Padé Approximation for Diffusion Acceleration
Authors: Benlei Cui, Shaoxuan He, Bukun Huang, Zhizeng Ye, Yunyun Sun, Longtao Huang, Hui Xue, Yang Yang, Jingqun Tang, Zhou Zhao, Haiwen Hong
Links:
Keywords: denoising, trajectory, dynamics, acceleration, dit, video generation, efficient, diffusion model
- Interpretable Motion-Attentive Maps: Spatio-Temporally Localizing Concepts in Video Diffusion Transformers
Authors: Youngjun Jun, Seil Kang, Woojung Han, Seong Jae Hwang
Links:
Keywords: dit, diffusion transformer, concept, video diffusion
- SemanticDialect: Semantic-Aware Mixed-Format Quantization for Video Diffusion Transformers
Authors: Wonsuk Jang, Thierry Tambe
Links:
Keywords: diffusion transformer, dit, video diffusion, video generation, efficient
- ShareVerse: Multi-Agent Consistent Video Generation for Shared World Modeling
Authors: Jiayi Zhu, Jianing Zhang, Yiying Yang, Wei Cheng, Xiaoyun Yuan
Links:
Keywords: interactive, world model, multi-view video, dit, video generation, simulation
- Direct Reward Fine-Tuning on Poses for Single Image to 3D Human in the Wild
Authors: Seunguk Do, Minwoo Huh, Joonghyuk Shin, Jaesik Park
Links:
Keywords: evaluation, benchmark, dit, human motion, diffusion model
- Deep Learning Based Wildfire Detection for Peatland Fires Using Transfer Learning
Authors: Emadeldeen Hamdan, Ahmad Faiz Tharima, Mohd Zahirasri Mohd Tohir, Dayang Nur Sakinah Musa, Erdem Koyuncu, Adam J. Watts, Ahmet Enis Cetin
Links:
Keywords: dit, physical
- LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation
Authors: Hualiang Wei, Shunran Jia, Jialun Liu, Wenhui Li
Links:
Keywords: diffusion transformer, avatar, dit, distillation, video diffusion, efficient, controllable
- Non-verbal Real-time Human-AI Interaction in Constrained Robotic Environments
Authors: Dragos Costea, Alina Marcu, Cristina Lazar, Marius Leordeanu
Links:
Keywords: architecture, body motion, text-to-video
- UniTalking: A Unified Audio-Video Framework for Talking Portrait Generation
Authors: Hebeizi Li, Zihao Liang, Benyuan Sun, Zihao Yin, Xiao Sha, Chenliang Wang, Yi Yang
Links:
Keywords: multi-modal, style, video generation, efficient, architecture
- FlowPortrait: Reinforcement Learning for Audio-Driven Portrait Video Generation
Authors: Weiting Tan, Andy T. Liu, Ming Tu, Xinghua Qu, Philipp Koehn, Lu Lu
Links:
Keywords: evaluation, audio-to-video, audio-driven, video generation, autoregressive, temporal consistency
- SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model
Authors: Guibin Chen, Dixuan Lin, Jiangping Yang, Youqiang Zhang, Zhengcong Fei, Debang Li, Sheng Chen, Chaofeng Ao, Nuo Pang, Yiming Wang, Yikun Dou, Zheng Chen, Mingyuan Fan, Tuanhui Li, Mingshan Chang, Hao Zhang, Xiaopeng Sun, Jingtao Xu, Yuqiang Xie, Jiahua Wang, Zhiheng Xu, Weiming Xiong, Yuzhe Jin, Baoxuan Gu, Binjie Mao, Yunjie Yu, Jujie He, Yuhao Feng, Shiwen Tu, Chaojie Wang, Rui Yan, Wei Shen, Jingchen Wu, Peng Zhao, Xuanyue Zhong, Zhuangzhuang Liu, Kaifei Wang, Fuxiang Zhang, Weikai Xu, Wenyan Liu, Binglu Zhang, Yu Shen, Tianhui Xiong, Bin Peng, Liang Zeng, Xuchen Song, Haoxiang Guo, Peiyu Wang, Max W. Y. Lam, Chien-Hung Liu, Yahui Zhou
Links:
Keywords: sound, image to video, frame interpolation, diffusion transformer, video editing, super-resolution, dit, multi-modal, style, video generation, architecture
- JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation
Authors: Kai Liu, Yanhao Zheng, Kai Wang, Shengqiong Wu, Rongjunchen Zhang, Jiebo Luo, Dimitrios Hatzinakos, Ziwei Liu, Hao Fei, Tat-Seng Chua
Links:
Keywords: evaluation, t2v, sound, dit, video generation
- EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation
Authors: Rang Meng, Yingjie Yin, Yuming Li, Chenguang Ma
Links:
Keywords: streaming, multi-modal, identity, temporal consistency, video generation, autoregressive, architecture
- DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation
Authors: Xu Guo, Fulong Ye, Qichao Sun, Liyang Chen, Bingchuan Li, Pengze Zhang, Jiawei Liu, Songtao Zhao, Qian He, Xiangwang Hou
Links:
Keywords: diffusion transformer, video editing, dit, identity, audio-driven, video generation, controllable
- TiFRe: Text-guided Video Frame Reduction for Efficient Video Multi-modal Large Language Models
Authors: Xiangtian Zheng, Zishuo Wang, Yuxin Peng
Links:
Keywords: efficient, multi-modal, style
- VideoVeritas: AI-Generated Video Detection via Perception Pretext Reinforcement Learning
Authors: Hao Tan, Jun Lan, Senyuan Shi, Zichang Tan, Zijian Yu, Huijia Zhu, Weiqiang Wang, Jun Wan, Zhen Lei
Links:
Keywords: video generation, multi-modal, evaluation, benchmark
- MOVA: Towards Scalable and Synchronized Video-Audio Generation
Authors: SII-OpenMOSS Team, Donghua Yu, Mingshu Chen, Qi Chen, Qi Luo, Qianyi Wu, Qinyuan Cheng, Ruixiao Li, Tianyi Liang, Wenbo Zhang, Wenming Tu, Xiangyu Peng, Yang Gao, Yanru Huo, Ying Zhu, Yinze Luo, Yiyang Zhang, Yuerong Song, Zhe Xu, Zhiyu Zhang, Chenchen Yang, Cheng Chang, Chushu Zhou, Hanfu Chen, Hongnan Ma, Jiaxi Li, Jingqi Tong, Junxi Liu, Ke Chen, Shimin Li, Shiqi Jiang, Songlin Wang, Wei Jiang, Zhaoye Fei, Zhiyuan Ning, Chunguo Li, Chenhui Li, Ziwei He, Zengfeng Huang, Xie Chen, Xipeng Qiu
Links:
Keywords: t2v, sound, efficient, text to video, architecture
- T2VTree: User-Centered Visual Analytics for Agent-Assisted Thought-to-Video Authoring
Authors: Zhuoyun Zheng, Yu Dong, Gaorong Liang, Guan Li, Guihua Shan, Shiyu Cheng, Dong Tian, Jianlong Zhou, Jie Liang
Links:
Keywords: video generation, dit, multi-modal, t2v
Showing the latest 50 out of 113 papers
- TC-Padé: Trajectory-Consistent Padé Approximation for Diffusion Acceleration
Authors: Benlei Cui, Shaoxuan He, Bukun Huang, Zhizeng Ye, Yunyun Sun, Longtao Huang, Hui Xue, Yang Yang, Jingqun Tang, Zhou Zhao, Haiwen Hong
Links:
Keywords: denoising, trajectory, dynamics, acceleration, dit, video generation, efficient, diffusion model
- LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation
Authors: Hualiang Wei, Shunran Jia, Jialun Liu, Wenhui Li
Links:
Keywords: diffusion transformer, avatar, dit, distillation, video diffusion, efficient, controllable
- WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories
Authors: Yisu Zhang, Chenjie Cao, Tengfei Wang, Xuhui Zuo, Junta Wu, Jianke Zhu, Chunchao Guo
Links:
Keywords: camera control, world model, benchmark, video diffusion, video generation, diffusion model
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models
Authors: Keli Liu, Zhendong Wang, Wengang Zhou, Houqiang Li
Links:
Keywords: architecture, evaluation, acceleration, layout, efficient, autoregressive, text-to-video
- FREE-Edit: Using Editing-aware Injection in Rectified Flow Models for Zero-shot Image-Driven Video Editing
Authors: Maomao Li, Yunfei Liu, Yu Li
Links:
Keywords: denoising, image-to-video, video editing, layout, dit, rectified flow, image-driven, i2v
- Let Your Image Move with Your Motion! -- Implicit Multi-Object Multi-Motion Transfer
Authors: Yuze Li, Dong Gong, Xiao Cao, Junchao Yuan, Dongsheng Li, Lei Zhou, Yun Sing Koh, Cheng Yan, Xinyu Zhang
Links:
Keywords: image-to-video, video generation, efficient, controllable, i2v
- COMBAT: Conditional World Models for Behavioral Agent Training
Authors: Anmol Agarwal, Pranay Meshram, Sumer Singh, Saurav Suman, Andrew Lapp, Shahbuland Matiana, Louis Castricato, Spencer Frazier
Links:
Keywords: interactive, evaluation, diffusion transformer, world model, benchmark, dit, distillation, video generation, controllable, diffusion model
- CamDirector: Towards Long-Term Coherent Video Trajectory Editing
Authors: Zhihao Shi, Kejia Yin, Weilin Wan, Yuhongze Zhou, Yuanhao Yu, Xinxin Zuo, Qiang Sun, Juwei Lu
Links:
Keywords: trajectory, style, benchmark, dit, video diffusion, autoregressive, camera control, diffusion model
- ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation
Authors: Junhu Fu, Shuyu Liang, Wutong Li, Chen Ma, Peng Huang, Kehao Wang, Ke Chen, Shengli Lin, Pinghong Zhou, Zeju Li, Yuanyuan Wang, Yi Guo
Links:
Keywords: dynamics, dit, video generation, controllable, diffusion model, temporal consistency
- The Trinity of Consistency as a Defining Principle for General World Models
Authors: Jingxuan Wei, Siyuan Li, Yuhang Xu, Zheng Sun, Junjie Jiang, Hexuan Jin, Caijun Jia, Honghao He, Xinglong Xu, Xi bai, Chang Yu, Yumou Liu, Junnan Zhu, Xuanhe Zhou, Jintao Chen, Xiaobin Hu, Shancheng Pang, Bihui Yu, Ran He, Zhen Lei, Stan Z. Li, Conghui He, Shuicheng Yan, Cheng Tan
Links:
Keywords: architecture, trajectory, evaluation, dynamics, world model, benchmark, world simulator, concept, video generation, physical, temporal consistency
- Direct Reward Fine-Tuning on Poses for Single Image to 3D Human in the Wild
Authors: Seunguk Do, Minwoo Huh, Joonghyuk Shin, Jaesik Park
Links:
Keywords: evaluation, benchmark, dit, human motion, diffusion model
- LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation
Authors: Hualiang Wei, Shunran Jia, Jialun Liu, Wenhui Li
Links:
Keywords: diffusion transformer, avatar, dit, distillation, video diffusion, efficient, controllable
- Non-verbal Real-time Human-AI Interaction in Constrained Robotic Environments
Authors: Dragos Costea, Alina Marcu, Cristina Lazar, Marius Leordeanu
Links:
Keywords: architecture, body motion, text-to-video
- GeoDiff4D: Geometry-Aware Diffusion for 4D Head Avatar Reconstruction
Authors: Chao Xu, Xiaochen Zhao, Xiang Deng, Jingxiang Sun, Zhuo Su, Donglin Di, Yebin Liu
Links:
Keywords: video generation, identity, avatar, diffusion model
- Spatio-Temporal Garment Reconstruction Using Diffusion Mapping via Pattern Coordinates
Authors: Yingxuan You, Ren Li, Corentin Dumery, Cong Cao, Hao Li, Pascal Fua
Links:
Keywords: diffusion model, avatar, dit, virtual try-on, temporal consistency
- BigMaQ: A Big Macaque Motion and Animation Dataset Bridging Image and 3D Pose Representations
Authors: Lucas Martini, Alexander Lappe, Anna Bognár, Rufin Vogels, Martin A. Giese
Links:
Keywords: dynamics, avatar, benchmark
- Ani3DHuman: Photorealistic 3D Human Animation with Self-guided Stochastic Sampling
Authors: Qi Sun, Can Wang, Jiaxiang Shang, Yingchun Liu, Jing Liao
Links:
Keywords: human animation, dynamics, identity, video diffusion, diffusion model
- HairWeaver: Few-Shot Photorealistic Hair Motion Synthesis with Sim-to-Real Guided Video Diffusion
Authors: Di Chang, Ji Hou, Aljaz Bozic, Assaf Neuberger, Felix Juefei-Xu, Olivier Maury, Gene Wei-Chin Lin, Tuur Stuyck, Doug Roble, Mohammad Soleymani, Stephane Grabli
Links:
Keywords: evaluation, dynamics, dit, video diffusion, human motion
- Hand2World: Autoregressive Egocentric Interaction Generation via Free-Space Hand Gestures
Authors: Yuxi Wang, Wenqi Ouyang, Tianyi Wei, Yi Dong, Zhiqi Shen, Xingang Pan
Links:
Keywords: interactive, dynamics, diffusion model, camera control, world model, gesture, benchmark, dit, video generation, autoregressive
- AUHead: Realistic Emotional Talking Head Generation via Action Units Control
Authors: Jiayi Lyu, Leigang Qu, Wenjing Zhang, Hanyu Jiang, Kai Liu, Zhenglin Zhou, Xiaobo Xia, Jian Xue, Tat-Seng Chua
Links:
Keywords: interactive, motion control, benchmark, avatar, dit, film, identity, video generation, controllable, diffusion model, talking head
- FastLightGen: Fast and Light Video Generation with Fewer Steps and Parameters
Authors: Shao Shitong, Gu Yufei, Xie Zeke
Links:
Keywords: video generation, efficient, distillation, i2v
- FREE-Edit: Using Editing-aware Injection in Rectified Flow Models for Zero-shot Image-Driven Video Editing
Authors: Maomao Li, Yunfei Liu, Yu Li
Links:
Keywords: denoising, image-to-video, video editing, layout, dit, rectified flow, image-driven, i2v
- Let Your Image Move with Your Motion! -- Implicit Multi-Object Multi-Motion Transfer
Authors: Yuze Li, Dong Gong, Xiao Cao, Junchao Yuan, Dongsheng Li, Lei Zhou, Yun Sing Koh, Cheng Yan, Xinyu Zhang
Links:
Keywords: image-to-video, video generation, efficient, controllable, i2v
- SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model
Authors: Guibin Chen, Dixuan Lin, Jiangping Yang, Youqiang Zhang, Zhengcong Fei, Debang Li, Sheng Chen, Chaofeng Ao, Nuo Pang, Yiming Wang, Yikun Dou, Zheng Chen, Mingyuan Fan, Tuanhui Li, Mingshan Chang, Hao Zhang, Xiaopeng Sun, Jingtao Xu, Yuqiang Xie, Jiahua Wang, Zhiheng Xu, Weiming Xiong, Yuzhe Jin, Baoxuan Gu, Binjie Mao, Yunjie Yu, Jujie He, Yuhao Feng, Shiwen Tu, Chaojie Wang, Rui Yan, Wei Shen, Jingchen Wu, Peng Zhao, Xuanyue Zhong, Zhuangzhuang Liu, Kaifei Wang, Fuxiang Zhang, Weikai Xu, Wenyan Liu, Binglu Zhang, Yu Shen, Tianhui Xiong, Bin Peng, Liang Zeng, Xuchen Song, Haoxiang Guo, Peiyu Wang, Max W. Y. Lam, Chien-Hung Liu, Yahui Zhou
Links:
Keywords: sound, image to video, frame interpolation, diffusion transformer, video editing, super-resolution, dit, multi-modal, style, video generation, architecture
- MultiAnimate: Pose-Guided Image Animation Made Extensible
Authors: Yingcheng Hu, Haowen Gong, Chuanguang Yang, Zhulin An, Yongjun Xu, Songhua Liu
Links:
Keywords: image animation, diffusion transformer, dit, identity, pose-guided, video generation
- Human Video Generation from a Single Image with 3D Pose and View Control
Authors: Tiantian Wang, Chun-Han Yao, Tao Hu, Mallikarjun Byrasandra Ramalinga Reddy, Ming-Hsuan Yang, Varun Jampani
Links:
Keywords: image-to-video, latent video, video synthesis, video diffusion, video generation, diffusion model
- VII: Visual Instruction Injection for Jailbreaking Image-to-Video Generation Models
Authors: Bowen Zheng, Yongli Xiang, Ziming Hong, Zerong Lin, Chaojian Yu, Tongliang Liu, Xinge You
Links:
Keywords: video generation, dit, image-to-video, i2v
- AnimeAgent: Is the Multi-Agent via Image-to-Video models a Good Disney Storytelling Artist?
Authors: Hailong Yan, Shice Liu, Tao Wang, Xiangtao Zhang, Yijie Zhong, Jinwei Chen, Le Zhang, Bo Li
Links:
Keywords: benchmark, image-to-video, diffusion model, i2v
- Frame2Freq: Spectral Adapters for Fine-Grained Video Understanding
Authors: Thinesh Thiyakesan Ponbagavathi, Constantin Seibold, Alina Roitberg
Links:
Keywords: image-to-video, dynamics
- FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control
Authors: Mingzhi Sheng, Zekai Gu, Peng Li, Cheng Lin, Hao-Xiang Guo, Ying-Cong Chen, Yuan Liu
Links:
Keywords: dynamics, camera control, dit, video generation, i2v
Showing the latest 50 out of 128 papers
- InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions
Authors: Mohamed Elmoghany, Liangbing Zhao, Xiaoqian Shen, Subhojyoti Mukherjee, Yang Zhou, Gang Wu, Viet Dac Lai, Seunghyun Yoon, Ryan Rossi, Abdullah Rashwan, Puneet Mathur, Varun Manjunatha, Daksh Dangi, Chien Nguyen, Nedim Lipka, Trung Bui, Krishna Kumar Singh, Ruiyi Zhang, Xiaolei Huang, Jaemin Cho, Yu Wang, Namyong Park, Zhengzhong Tu, Hongjie Chen, Hoda Eldardiry, Nesreen Ahmed, Thien Nguyen, Dinesh Manocha, Mohamed Elhoseiny, Franck Dernoncourt
Links:
Keywords: video generation, identity, long-form, video synthesis
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models
Authors: Keli Liu, Zhendong Wang, Wengang Zhou, Houqiang Li
Links:
Keywords: architecture, evaluation, acceleration, layout, efficient, autoregressive, text-to-video
- EraseAnything++: Enabling Concept Erasure in Rectified Flow Transformers Leveraging Multi-Object Optimization
Authors: Zhaoxin Fan, Nanxiang Jiang, Daiheng Gao, Shiji Zhou, Wenjun Wu
Links:
Keywords: architecture, t2v, benchmark, rectified flow, concept, video diffusion, temporal consistency, video generation, efficient, diffusion model, text-to-video
- DreamWorld: Unified World Modeling in Video Generation
Authors: Boming Tan, Xiangdong Zhang, Ning Liao, Yuqing Zhang, Shaofeng Zhang, Xue Yang, Qi Fan, Yanyong Zhang
Links:
Keywords: evaluation, dynamics, world model, dit, video generation, physical, temporal consistency
- CamDirector: Towards Long-Term Coherent Video Trajectory Editing
Authors: Zhihao Shi, Kejia Yin, Weilin Wan, Yuhongze Zhou, Yuanhao Yu, Xinxin Zuo, Qiang Sun, Juwei Lu
Links:
Keywords: trajectory, style, benchmark, dit, video diffusion, autoregressive, camera control, diffusion model
- Mode Seeking meets Mean Seeking for Fast Long Video Generation
Authors: Shengqu Cai, Weili Nie, Chao Liu, Julius Berner, Lvmin Zhang, Nanye Ma, Hansheng Chen, Maneesh Agrawala, Leonidas Guibas, Gordon Wetzstein, Arash Vahdat
Links:
Keywords: evaluation, long-form, diffusion transformer, long video, flow matching, video generation
- Spatio-Temporal Garment Reconstruction Using Diffusion Mapping via Pattern Coordinates
Authors: Yingxuan You, Ren Li, Corentin Dumery, Cong Cao, Hao Li, Pascal Fua
Links:
Keywords: diffusion model, avatar, dit, virtual try-on, temporal consistency
- MSVBench: Towards Human-Level Evaluation of Multi-Shot Video Generation
Authors: Haoyuan Shi, Yunxin Li, Nanhao Deng, Zhenran Xu, Xinyu Chen, Longyue Wang, Baotian Hu, Min Zhang
Links:
Keywords: evaluation, long-form, world model, benchmark, video generation
- SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls
Authors: Qianxun Xu, Chenxi Song, Yujun Cai, Chi Zhang
Links:
Keywords: text-to-video, video diffusion, video generation, diffusion model, temporal consistency
- ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation
Authors: Junhu Fu, Shuyu Liang, Wutong Li, Chen Ma, Peng Huang, Kehao Wang, Ke Chen, Shengli Lin, Pinghong Zhou, Zeju Li, Yuanyuan Wang, Yi Guo
Links:
Keywords: dynamics, dit, video generation, controllable, diffusion model, temporal consistency
Showing the latest 50 out of 95 papers
- InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions
Authors: Mohamed Elmoghany, Liangbing Zhao, Xiaoqian Shen, Subhojyoti Mukherjee, Yang Zhou, Gang Wu, Viet Dac Lai, Seunghyun Yoon, Ryan Rossi, Abdullah Rashwan, Puneet Mathur, Varun Manjunatha, Daksh Dangi, Chien Nguyen, Nedim Lipka, Trung Bui, Krishna Kumar Singh, Ruiyi Zhang, Xiaolei Huang, Jaemin Cho, Yu Wang, Namyong Park, Zhengzhong Tu, Hongjie Chen, Hoda Eldardiry, Nesreen Ahmed, Thien Nguyen, Dinesh Manocha, Mohamed Elhoseiny, Franck Dernoncourt
Links:
Keywords: video generation, identity, long-form, video synthesis
- EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education
Authors: Baoliang Chen, Xinlong Bu, Lingyu Zhu, Hanwei Zhu, Xiangjie Sui
Links:
Keywords: evaluation, t2v, benchmark, concept, education, text-to-video
- Interpretable Motion-Attentive Maps: Spatio-Temporally Localizing Concepts in Video Diffusion Transformers
Authors: Youngjun Jun, Seil Kang, Woojung Han, Seong Jae Hwang
Links:
Keywords: dit, diffusion transformer, concept, video diffusion
- UniTalking: A Unified Audio-Video Framework for Talking Portrait Generation
Authors: Hebeizi Li, Zihao Liang, Benyuan Sun, Zihao Yin, Xiao Sha, Chenliang Wang, Yi Yang
Links:
Keywords: multi-modal, style, video generation, efficient, architecture
- Unified Vision-Language Modeling via Concept Space Alignment
Authors: Yifu Qiu, Paul-Ambroise Duquenne, Holger Schwenk
Links:
Keywords: concept, text-to-video
- EraseAnything++: Enabling Concept Erasure in Rectified Flow Transformers Leveraging Multi-Object Optimization
Authors: Zhaoxin Fan, Nanxiang Jiang, Daiheng Gao, Shiji Zhou, Wenjun Wu
Links:
Keywords: architecture, t2v, benchmark, rectified flow, concept, video diffusion, temporal consistency, video generation, efficient, diffusion model, text-to-video
- WildActor: Unconstrained Identity-Preserving Video Generation
Authors: Qin Guo, Tianyu Yang, Xuanhua He, Fei Shen, Yong Zhang, Zhuoliang Kang, Xiaoming Wei, Dan Xu
Links:
Keywords: video generation, dit, identity
- MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
Authors: Rongsheng Wang, Minghao Wu, Hongru Zhou, Zhihan Yu, Zhenyang Cai, Junying Chen, Benyou Wang
Links:
Keywords: interactive, evaluation, dynamics, benchmark, medical, concept, education, video generation, simulation, physical
- CamDirector: Towards Long-Term Coherent Video Trajectory Editing
Authors: Zhihao Shi, Kejia Yin, Weilin Wan, Yuhongze Zhou, Yuanhao Yu, Xinxin Zuo, Qiang Sun, Juwei Lu
Links:
Keywords: trajectory, style, benchmark, dit, video diffusion, autoregressive, camera control, diffusion model
- GeoDiff4D: Geometry-Aware Diffusion for 4D Head Avatar Reconstruction
Authors: Chao Xu, Xiaochen Zhao, Xiang Deng, Jingxiang Sun, Zhuo Su, Donglin Di, Yebin Liu
Links:
Keywords: video generation, identity, avatar, diffusion model
Showing the latest 50 out of 124 papers
- PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation
Authors: Shang Wu, Chenwei Xu, Zhuofan Xia, Weijian Li, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Han Liu
Links:
Keywords: t2v, physics-aware, text-to-video, video generation, physical, physics, architecture
- Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion
Authors: Haoran Lu, Shang Wu, Jianshu Zhang, Maojiang Su, Guo Ye, Chenwei Xu, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Zhaoran Wang, Han Liu
Links:
Keywords: evaluation, dynamics, world model, video diffusion, simulation, physical, diffusion model, physics
- TC-Padé: Trajectory-Consistent Padé Approximation for Diffusion Acceleration
Authors: Benlei Cui, Shaoxuan He, Bukun Huang, Zhizeng Ye, Yunyun Sun, Longtao Huang, Hui Xue, Yang Yang, Jingqun Tang, Zhou Zhao, Haiwen Hong
Links:
Keywords: denoising, trajectory, dynamics, acceleration, dit, video generation, efficient, diffusion model
- Deep Learning Based Wildfire Detection for Peatland Fires Using Transfer Learning
Authors: Emadeldeen Hamdan, Ahmad Faiz Tharima, Mohd Zahirasri Mohd Tohir, Dayang Nur Sakinah Musa, Erdem Koyuncu, Adam J. Watts, Ahmet Enis Cetin
Links:
Keywords: dit, physical
- MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
Authors: Rongsheng Wang, Minghao Wu, Hongru Zhou, Zhihan Yu, Zhenyang Cai, Junying Chen, Benyou Wang
Links:
Keywords: interactive, evaluation, dynamics, benchmark, medical, concept, education, video generation, simulation, physical
- DreamWorld: Unified World Modeling in Video Generation
Authors: Boming Tan, Xiangdong Zhang, Ning Liao, Yuqing Zhang, Shaofeng Zhang, Xue Yang, Qi Fan, Yanyong Zhang
Links:
Keywords: evaluation, dynamics, world model, dit, video generation, physical, temporal consistency
- ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation
Authors: Junhu Fu, Shuyu Liang, Wutong Li, Chen Ma, Peng Huang, Kehao Wang, Ke Chen, Shengli Lin, Pinghong Zhou, Zeju Li, Yuanyuan Wang, Yi Guo
Links:
Keywords: dynamics, dit, video generation, controllable, diffusion model, temporal consistency
- Uni-Animator: Towards Unified Visual Colorization
Authors: Xinyuan Chen, Yao Xu, Shaowen Wang, Pengjie Song, Bowen Deng
Links:
Keywords: dit, diffusion transformer, physical, temporal consistency
- The Trinity of Consistency as a Defining Principle for General World Models
Authors: Jingxuan Wei, Siyuan Li, Yuhang Xu, Zheng Sun, Junjie Jiang, Hexuan Jin, Caijun Jia, Honghao He, Xinglong Xu, Xi bai, Chang Yu, Yumou Liu, Junnan Zhu, Xuanhe Zhou, Jintao Chen, Xiaobin Hu, Shancheng Pang, Bihui Yu, Ran He, Zhen Lei, Stan Z. Li, Conghui He, Shuicheng Yan, Cheng Tan
Links:
Keywords: architecture, trajectory, evaluation, dynamics, world model, benchmark, world simulator, concept, video generation, physical, temporal consistency
- Hierarchic-EEG2Text: Assessing EEG-To-Text Decoding across Hierarchical Abstraction Levels
Authors: Anupam Sharma, Harish Katti, Prajwal Singh, Shanmuganathan Raman, Krishna Miyapuram
Links:
Keywords: concept, dynamics, architecture
Showing the latest 50 out of 195 papers
- EvoPrune: Early-Stage Visual Token Pruning for Efficient MLLMs
Authors: Yuhao Chen, Bin Shan, Xin Ye, Cheng Chen
Links:
Keywords: efficient, benchmark
- Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion
Authors: Haoran Lu, Shang Wu, Jianshu Zhang, Maojiang Su, Guo Ye, Chenwei Xu, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Zhaoran Wang, Han Liu
Links:
Keywords: evaluation, dynamics, world model, video diffusion, simulation, physical, diffusion model, physics
- EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education
Authors: Baoliang Chen, Xinlong Bu, Lingyu Zhu, Hanwei Zhu, Xiangjie Sui
Links:
Keywords: evaluation, t2v, benchmark, concept, education, text-to-video
- BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation
Authors: Zihao Zhu, Ruotong Wang, Siwei Lyu, Min Zhang, Baoyuan Wu
Links:
Keywords: video generation, evaluation, t2v, text-to-video
- Direct Reward Fine-Tuning on Poses for Single Image to 3D Human in the Wild
Authors: Seunguk Do, Minwoo Huh, Joonghyuk Shin, Jaesik Park
Links:
Keywords: evaluation, benchmark, dit, human motion, diffusion model
- MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models
Authors: Zhongxi Wang, Yueqian Lin, Jingyang Zhang, Hai Helen Li, Yiran Chen
Links:
Keywords: evaluation
- HAMMER: Harnessing MLLM via Cross-Modal Integration for Intention-Driven 3D Affordance Grounding
Authors: Lei Yao, Yong Chen, Yuejiao Su, Yi Wang, Moyun Liu, Lap-Pui Chau
Links:
Keywords: benchmark
- WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories
Authors: Yisu Zhang, Chenjie Cao, Tengfei Wang, Xuhui Zuo, Junta Wu, Jianke Zhu, Chunchao Guo
Links:
Keywords: camera control, world model, benchmark, video diffusion, video generation, diffusion model
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models
Authors: Keli Liu, Zhendong Wang, Wengang Zhou, Houqiang Li
Links:
Keywords: architecture, evaluation, acceleration, layout, efficient, autoregressive, text-to-video
- Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration
Authors: Jiaqi Han, Juntong Shi, Puheng Li, Haotian Ye, Qiushan Guo, Stefano Ermon
Links:
Keywords: evaluation, acceleration, diffusion transformer, video diffusion, video generation, efficient, diffusion model
Showing the latest 50 out of 75 papers
- PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation
Authors: Shang Wu, Chenwei Xu, Zhuofan Xia, Weijian Li, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Han Liu
Links:
Keywords: t2v, physics-aware, text-to-video, video generation, physical, physics, architecture
- EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education
Authors: Baoliang Chen, Xinlong Bu, Lingyu Zhu, Hanwei Zhu, Xiangjie Sui
Links:
Keywords: evaluation, t2v, benchmark, concept, education, text-to-video
- BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation
Authors: Zihao Zhu, Ruotong Wang, Siwei Lyu, Min Zhang, Baoyuan Wu
Links:
Keywords: video generation, evaluation, t2v, text-to-video
- Non-verbal Real-time Human-AI Interaction in Constrained Robotic Environments
Authors: Dragos Costea, Alina Marcu, Cristina Lazar, Marius Leordeanu
Links:
Keywords: architecture, body motion, text-to-video
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models
Authors: Keli Liu, Zhendong Wang, Wengang Zhou, Houqiang Li
Links:
Keywords: architecture, evaluation, acceleration, layout, efficient, autoregressive, text-to-video
- Retrieval, Refinement, and Ranking for Text-to-Video Generation via Prompt Optimization and Test-Time Scaling
Authors: Zillur Rahman, Alex Sheng, Cristian Meo
Links:
Keywords: t2v, frame interpolation, dit, video generation, efficient, diffusion model, text-to-video
- Unified Vision-Language Modeling via Concept Space Alignment
Authors: Yifu Qiu, Paul-Ambroise Duquenne, Holger Schwenk
Links:
Keywords: concept, text-to-video
- EraseAnything++: Enabling Concept Erasure in Rectified Flow Transformers Leveraging Multi-Object Optimization
Authors: Zhaoxin Fan, Nanxiang Jiang, Daiheng Gao, Shiji Zhou, Wenjun Wu
Links:
Keywords: architecture, t2v, benchmark, rectified flow, concept, video diffusion, temporal consistency, video generation, efficient, diffusion model, text-to-video
- SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls
Authors: Qianxun Xu, Chenxi Song, Yujun Cai, Chi Zhang
Links:
Keywords: text-to-video, video diffusion, video generation, diffusion model, temporal consistency
- SKeDA: A Generative Watermarking Framework for Text-to-video Diffusion Models
Authors: Yang Yang, Xinze Zou, Zehua Ma, Han Fang, Weiming Zhang
Links:
Keywords: video generation, diffusion model, video diffusion, text-to-video
- FREE-Edit: Using Editing-aware Injection in Rectified Flow Models for Zero-shot Image-Driven Video Editing
Authors: Maomao Li, Yunfei Liu, Yu Li
Links:
Keywords: denoising, image-to-video, video editing, layout, dit, rectified flow, image-driven, i2v
- UniVBench: Towards Unified Evaluation for Video Foundation Models
Authors: Jianhui Wei, Xiaotian Zhang, Yichen Li, Yuan Wang, Yan Zhang, Ziyi Chen, Zhihang Tang, Wei Xu, Zuozhu Liu
Links:
Keywords: evaluation, video editing, benchmark, dit, video generation
- SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model
Authors: Guibin Chen, Dixuan Lin, Jiangping Yang, Youqiang Zhang, Zhengcong Fei, Debang Li, Sheng Chen, Chaofeng Ao, Nuo Pang, Yiming Wang, Yikun Dou, Zheng Chen, Mingyuan Fan, Tuanhui Li, Mingshan Chang, Hao Zhang, Xiaopeng Sun, Jingtao Xu, Yuqiang Xie, Jiahua Wang, Zhiheng Xu, Weiming Xiong, Yuzhe Jin, Baoxuan Gu, Binjie Mao, Yunjie Yu, Jujie He, Yuhao Feng, Shiwen Tu, Chaojie Wang, Rui Yan, Wei Shen, Jingchen Wu, Peng Zhao, Xuanyue Zhong, Zhuangzhuang Liu, Kaifei Wang, Fuxiang Zhang, Weikai Xu, Wenyan Liu, Binglu Zhang, Yu Shen, Tianhui Xiong, Bin Peng, Liang Zeng, Xuchen Song, Haoxiang Guo, Peiyu Wang, Max W. Y. Lam, Chien-Hung Liu, Yahui Zhou
Links:
Keywords: sound, image to video, frame interpolation, diffusion transformer, video editing, super-resolution, dit, multi-modal, style, video generation, architecture
- GA-Drive: Geometry-Appearance Decoupled Modeling for Free-viewpoint Driving Scene Generation
Authors: Hao Zhang, Lue Fan, Qitai Wang, Wenbo Li, Zehuan Wu, Lewei Lu, Zhaoxiang Zhang, Hongsheng Li
Links:
Keywords: video-to-video, trajectory, video editing, dit, autonomous driving, video diffusion, simulation, diffusion model
- PropFly: Learning to Propagate via On-the-Fly Supervision from Pre-trained Video Diffusion Models
Authors: Wonyong Seo, Jaeho Moon, Jaehyup Lee, Soo Ye Kim, Munchurl Kim
Links:
Keywords: video editing, dit, video diffusion, flow matching, diffusion model
- RoboCurate: Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning
Authors: Seungku Kim, Suhyeok Jang, Byungjun Yoon, Dongyoung Kim, John Won, Jinwoo Shin
Links:
Keywords: video-to-video, trajectory, dit, simulation, physical
- When Test-Time Guidance Is Enough: Fast Image and Video Editing with Diffusion Guidance
Authors: Ahmed Ghorbel, Badr Moufad, Navid Bagheri Shouraki, Alain Oliviero Durmus, Thomas Hirtz, Eric Moulines, Jimmy Olsson, Yazid Janati
Links:
Keywords: video editing, dit, evaluation, benchmark
- DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation
Authors: Xu Guo, Fulong Ye, Qichao Sun, Liyang Chen, Bingchuan Li, Pengze Zhang, Jiawei Liu, Songtao Zhao, Qian He, Xiangwang Hou
Links:
Keywords: diffusion transformer, video editing, dit, identity, audio-driven, video generation, controllable
- Tele-Omni: a Unified Multimodal Framework for Video Generation and Editing
Authors: Jialun Liu, Tian Li, Xiao Cao, Yukuo Ma, Gonghu Shang, Haibin Huang, Chi Zhang, Xiangzhen Chang, Zhiyong Huang, Jiakui Hu, Zuoxin Li, Yuanzhi Liang, Cong Liu, Junqi Liu, Robby T. Tan, Haitong Tang, Qizhen Weng, Yifan Xu, Liying Yang, Xiaoyan Yang, Peng Yu, Shiwen Zhang, Xuelong Li
Links:
Keywords: image-to-video, video editing, video synthesis, dit, video generation, text-to-video
- Omni-Video 2: Scaling MLLM-Conditioned Diffusion for Unified Video Generation and Editing
Authors: Hao Yang, Zhiyu Tan, Jia Gong, Luozheng Qin, Hesen Chen, Xiaomeng Yang, Yuqing Sun, Yuetan Lin, Mengping Yang, Hao Li
Links:
Keywords: video editing, benchmark, dit, video diffusion, video generation, efficient, diffusion model, text-to-video
- Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields
Authors: Shiqian Li, Ruihong Shen, Junfeng Ni, Chang Pan, Chi Zhang, Yixin Zhu
Links:
Keywords: interactive, evaluation, dynamics, video prediction, world model, video generation, simulation, physical, physics
- DriveLaW: Unifying Planning and Video Generation in a Latent Driving World
Authors: Tianze Xia, Yongkang Li, Lijun Zhou, Jingfeng Yao, Kaixin Xiong, Haiyang Sun, Bing Wang, Kun Ma, Guang Chen, Hangjun Ye, Wenyu Liu, Xinggang Wang
Links:
Keywords: trajectory, video prediction, world model, benchmark, autonomous driving, video generation, architecture
- Autoregressive Flow Matching for Motion Prediction
Authors: Johnathan Xie, Stefan Stojanov, Cristobal Eyzaguirre, Daniel L. K. Yamins, Jiajun Wu
Links:
Keywords: video prediction, benchmark, robotics, dit, flow matching, video generation, human motion, autoregressive
- Over++: Generative Video Compositing for Layer Interaction Effects
Authors: Luchao Qi, Jiaye Wu, Jun Myeong Choi, Cary Phillips, Roni Sengupta, Dan B Goldman
Links:
Keywords: dit, video inpainting
- Vidarc: Embodied Video Diffusion Model for Closed-loop Control
Authors: Yao Feng, Chendong Xiang, Xinyi Mao, Hengkai Tan, Zuyue Zhang, Shuhe Huang, Kaiwen Zheng, Haitian Liu, Hang Su, Jun Zhu
Links:
Keywords: dynamics, video prediction, video diffusion, autoregressive, physical, diffusion model
- Flowception: Temporally Expansive Flow Matching for Video Generation
Authors: Tariq Berrada Ifriqi, John Nguyen, Karteek Alahari, Jakob Verbeek, Ricky T. Q. Chen
Links:
Keywords: denoising, video interpolation, image-to-video, flow matching, video generation, efficient, autoregressive
Showing the latest 50 out of 57 papers
- TC-Padé: Trajectory-Consistent Padé Approximation for Diffusion Acceleration
Authors: Benlei Cui, Shaoxuan He, Bukun Huang, Zhizeng Ye, Yunyun Sun, Longtao Huang, Hui Xue, Yang Yang, Jingqun Tang, Zhou Zhao, Haiwen Hong
Links:
Keywords: denoising, trajectory, dynamics, acceleration, dit, video generation, efficient, diffusion model
- Retrieval, Refinement, and Ranking for Text-to-Video Generation via Prompt Optimization and Test-Time Scaling
Authors: Zillur Rahman, Alex Sheng, Cristian Meo
Links:
Keywords: t2v, frame interpolation, dit, video generation, efficient, diffusion model, text-to-video
- FREE-Edit: Using Editing-aware Injection in Rectified Flow Models for Zero-shot Image-Driven Video Editing
Authors: Maomao Li, Yunfei Liu, Yu Li
Links:
Keywords: denoising, image-to-video, video editing, layout, dit, rectified flow, image-driven, i2v
- SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching
Authors: Yasaman Haghighi, Alexandre Alahi
Links:
Keywords: video generation, denoising, diffusion model, acceleration
- Denoising as Path Planning: Training-Free Acceleration of Diffusion Models with DPCache
Authors: Bowen Cui, Yuanbin Wang, Huajiang Xu, Biaolong Chen, Aixi Zhang, Hao Jiang, Zhengzheng Jin, Xu Liu, Pipei Huang
Links:
Keywords: denoising, trajectory, acceleration, dit, video generation, efficient, diffusion model
- SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model
Authors: Guibin Chen, Dixuan Lin, Jiangping Yang, Youqiang Zhang, Zhengcong Fei, Debang Li, Sheng Chen, Chaofeng Ao, Nuo Pang, Yiming Wang, Yikun Dou, Zheng Chen, Mingyuan Fan, Tuanhui Li, Mingshan Chang, Hao Zhang, Xiaopeng Sun, Jingtao Xu, Yuqiang Xie, Jiahua Wang, Zhiheng Xu, Weiming Xiong, Yuzhe Jin, Baoxuan Gu, Binjie Mao, Yunjie Yu, Jujie He, Yuhao Feng, Shiwen Tu, Chaojie Wang, Rui Yan, Wei Shen, Jingchen Wu, Peng Zhao, Xuanyue Zhong, Zhuangzhuang Liu, Kaifei Wang, Fuxiang Zhang, Weikai Xu, Wenyan Liu, Binglu Zhang, Yu Shen, Tianhui Xiong, Bin Peng, Liang Zeng, Xuchen Song, Haoxiang Guo, Peiyu Wang, Max W. Y. Lam, Chien-Hung Liu, Yahui Zhou
Links:
Keywords: sound, image to video, frame interpolation, diffusion transformer, video editing, super-resolution, dit, multi-modal, style, video generation, architecture
- LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration
Authors: Peiliang Cai, Jiacheng Liu, Haowen Xu, Xinyu Wang, Chang Zou, Linfeng Zhang
Links:
Keywords: denoising, architecture, dynamics, acceleration, diffusion transformer, video synthesis, dit, video generation, diffusion model, text-to-video
- UniE2F: A Unified Diffusion Framework for Event-to-Frame Reconstruction with Video Foundation Models
Authors: Gang Xu, Zhiyu Zhu, Junhui Hou
Links:
Keywords: frame interpolation, dit, video diffusion, physical, diffusion model
- Predict to Skip: Linear Multistep Feature Forecasting for Efficient Diffusion Transformers
Authors: Hanshuai Cui, Zhiqing Tang, Qianli Ma, Zhi Yao, Weijia Jia
Links:
Keywords: denoising, trajectory, dynamics, acceleration, diffusion transformer, dit, video generation, efficient
- DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers
Authors: Dahye Kim, Deepti Ghadiyaram, Raghudeep Gadde
Links:
Keywords: denoising, diffusion transformer, dit, video generation, efficient
Showing the latest 50 out of 102 papers
- Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion
Authors: Haoran Lu, Shang Wu, Jianshu Zhang, Maojiang Su, Guo Ye, Chenwei Xu, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Zhaoran Wang, Han Liu
Links:
Keywords: evaluation, dynamics, world model, video diffusion, simulation, physical, diffusion model, physics
- ShareVerse: Multi-Agent Consistent Video Generation for Shared World Modeling
Authors: Jiayi Zhu, Jianing Zhang, Yiying Yang, Wei Cheng, Xiaoyun Yuan
Links:
Keywords: interactive, world model, multi-view video, dit, video generation, simulation
- WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories
Authors: Yisu Zhang, Chenjie Cao, Tengfei Wang, Xuhui Zuo, Junta Wu, Jianke Zhu, Chunchao Guo
Links:
Keywords: camera control, world model, benchmark, video diffusion, video generation, diffusion model
- COMBAT: Conditional World Models for Behavioral Agent Training
Authors: Anmol Agarwal, Pranay Meshram, Sumer Singh, Saurav Suman, Andrew Lapp, Shahbuland Matiana, Louis Castricato, Spencer Frazier
Links:
Keywords: interactive, evaluation, diffusion transformer, world model, benchmark, dit, distillation, video generation, controllable, diffusion model
- MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
Authors: Rongsheng Wang, Minghao Wu, Hongru Zhou, Zhihan Yu, Zhenyang Cai, Junying Chen, Benyou Wang
Links:
Keywords: interactive, evaluation, dynamics, benchmark, medical, concept, education, video generation, simulation, physical
- DreamWorld: Unified World Modeling in Video Generation
Authors: Boming Tan, Xiangdong Zhang, Ning Liao, Yuqing Zhang, Shaofeng Zhang, Xue Yang, Qi Fan, Yanyong Zhang
Links:
Keywords: evaluation, dynamics, world model, dit, video generation, physical, temporal consistency
- MSVBench: Towards Human-Level Evaluation of Multi-Shot Video Generation
Authors: Haoyuan Shi, Yunxin Li, Nanhao Deng, Zhenran Xu, Xinyu Chen, Longyue Wang, Baotian Hu, Min Zhang
Links:
Keywords: evaluation, long-form, world model, benchmark, video generation
- U-Mind: A Unified Framework for Real-Time Multimodal Interaction with Audiovisual Generation
Authors: Xiang Deng, Feng Gao, Yong Zhang, Youxin Pang, Xu Xiaoming, Zhuoliang Kang, Xiaoming Wei, Yebin Liu
Links:
Keywords: video synthesis, interactive, dit
- The Trinity of Consistency as a Defining Principle for General World Models
Authors: Jingxuan Wei, Siyuan Li, Yuhang Xu, Zheng Sun, Junjie Jiang, Hexuan Jin, Caijun Jia, Honghao He, Xinglong Xu, Xi bai, Chang Yu, Yumou Liu, Junnan Zhu, Xuanhe Zhou, Jintao Chen, Xiaobin Hu, Shancheng Pang, Bihui Yu, Ran He, Zhen Lei, Stan Z. Li, Conghui He, Shuicheng Yan, Cheng Tan
Links:
Keywords: architecture, trajectory, evaluation, dynamics, world model, benchmark, world simulator, concept, video generation, physical, temporal consistency
- UCM: Unifying Camera Control and Memory with Time-aware Positional Encoding Warping for World Models
Authors: Tianxing Xu, Zixuan Wang, Guangyuan Wang, Li Hu, Zhongyi Zhang, Peng Zhang, Bang Zhang, Song-Hai Zhang
Links:
Keywords: interactive, camera control, diffusion transformer, world model, benchmark, video generation, efficient
- Video Diffusion Models (NeurIPS 2022)
Authors: Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet
Keywords: Video Diffusion, Generative Model, Unconditional Video Generation
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (CVPR 2023)
Authors: Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
Keywords: Latent Video Diffusion, Text-to-Video, High-Resolution
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (2023)
Authors: Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, et al.
Code: GitHub
Keywords: Image-to-Video, Latent Video Diffusion, Large-Scale Training
- Sora: Video Generation Models as World Simulators (OpenAI, 2024)
Authors: OpenAI
Keywords: Text-to-Video, World Simulator, Diffusion Transformer, Long Video
- CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer (2024)
Authors: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, et al.
Code: GitHub
Keywords: Text-to-Video, Diffusion Transformer, Expert Transformer
- CogVideo - Text-to-video generation with CogVideoX series models (Tsinghua & Zhipu AI)
- Open-Sora - Open-source Sora-like video generation framework
- Open-Sora-Plan - Reproducing Sora with an open-source plan
- HunyuanVideo - Tencent's large-scale video generation model
- Wan2.1 - Alibaba's open-source video generation model
- AnimateDiff - Animate personalized text-to-image models without specific tuning
- Stable Video Diffusion - Stability AI's video generation models
- ModelScope Text-to-Video - ModelScope text-to-video synthesis
- Video Generation Models as World Simulators - OpenAI's Sora technical report
- A Survey on Video Diffusion Models - Comprehensive survey on video diffusion
- Diffusion Models: A Comprehensive Survey - Foundation knowledge on diffusion models
- Unified CLI (`main.py`): Single entry point with `init`, `search`, `suggest`, `export-bib`, and `readme` subcommands
- Interactive Config Wizard: Guided setup for keywords, domains, time range, and API keys via `python main.py init`
- Custom Search Keywords: Configure keywords for title, abstract, or both, with arXiv domain filtering (`cs.CV`, `cs.AI`, `cs.MM`, etc.)
- Time Range Filtering: Relative periods (`30d`, `6m`, `1y`, `2y`) or absolute date ranges (`YYYY-MM-DD` to `YYYY-MM-DD`)
- Smart Link Extraction: Auto-classifies URLs from abstracts into GitHub, project page, dataset, video, demo, and HuggingFace links
- BibTeX Export: Fetch BibTeX from the official arXiv API; export to `.bib` files with category and date filters
- LLM Keyword Suggestion: Input paper titles or arXiv IDs to auto-generate optimized search keywords via an OpenAI-compatible API
- Automated Paper Collection: Daily automatic crawling with GitHub Actions
- Intelligent Classification: Auto-categorize papers into 16 topics (T2V, I2V, Video Editing, Controllable Generation, World Models, etc.)
- Robust Error Handling: Multi-layer retry and fallback strategies ensure stable operation
- GitHub Actions Integration: Automated CI/CD workflows for daily updates
- Multi-type Link Badges: README entries display PDF, GitHub (with stars), Project, Dataset, Video, Demo, HuggingFace, and Citation badges
- Detailed Logging: Comprehensive logging for debugging and monitoring
- Cross-Platform: Support for Windows/Linux/macOS
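The Smart Link Extraction step can be approximated with a regex pass over the abstract plus host/path rules. This is a minimal sketch, not the project's implementation: the URL pattern, the set of video hosts, the classification rules, and the names `classify_url` and `extract_links` are all assumptions made for illustration (Python 3.9+).

```python
import re
from urllib.parse import urlparse

# Hypothetical URL pattern: stop at whitespace, brackets, quotes, commas, and semicolons.
URL_RE = re.compile(r"https?://[^\s<>()\[\]{},;\"']+")

# Assumed set of video-hosting domains.
VIDEO_HOSTS = {"youtube.com", "youtu.be", "vimeo.com", "bilibili.com"}


def classify_url(url: str) -> str:
    """Assign a link type using simple host/path heuristics (assumed rules)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower().removeprefix("www.")
    path = parsed.path.lower()
    if host == "github.com":
        return "github"
    if host == "huggingface.co":
        # Hugging Face separates datasets and Spaces (demos) by path prefix.
        if path.startswith("/datasets/"):
            return "dataset"
        if path.startswith("/spaces/"):
            return "demo"
        return "huggingface"
    if host in VIDEO_HOSTS:
        return "video"
    if "dataset" in path:
        return "dataset"
    if "demo" in path:
        return "demo"
    # Anything else (e.g. a *.github.io page) is treated as a project page.
    return "project"


def extract_links(abstract: str) -> dict[str, list[str]]:
    """Find URLs in an abstract and group them by link type, without duplicates."""
    links: dict[str, list[str]] = {}
    for raw in URL_RE.findall(abstract):
        url = raw.rstrip(".")  # drop a sentence-final period glued to the URL
        bucket = links.setdefault(classify_url(url), [])
        if url not in bucket:
            bucket.append(url)
    return links
```

Ordering the checks from most specific (known hosts) to least specific (path keywords, then a project-page fallback) keeps a GitHub or Hugging Face URL from being misfiled just because its path happens to contain "demo" or "dataset".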
- Paper JSON files (`data/papers_YYYY-MM-DD.json`): Full paper metadata with title, authors, abstract, links, keywords, and BibTeX
- BibTeX files (`output/*.bib`): Ready-to-use bibliography files for LaTeX
- Auto-generated README: Categorized and formatted paper listings
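Producing a `.bib` file from the paper JSON with category and date filters could look like the sketch below. It is hypothetical: the record fields `categories` (a list, since one paper can appear under several topics), `date` (ISO `YYYY-MM-DD`), and `bibtex`, as well as the inclusive date bounds, are assumptions about the data layout rather than a documented schema.

```python
from datetime import date
from pathlib import Path
from typing import Iterable, Iterator, Optional


def select_papers(
    papers: Iterable[dict],
    category: Optional[str] = None,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
) -> Iterator[dict]:
    """Yield papers in the category whose date falls inside [date_from, date_to]."""
    lo = date.fromisoformat(date_from) if date_from else None
    hi = date.fromisoformat(date_to) if date_to else None
    for paper in papers:
        if category is not None and category not in paper.get("categories", []):
            continue
        published = date.fromisoformat(paper["date"])
        if (lo and published < lo) or (hi and published > hi):
            continue
        if not paper.get("bibtex"):
            continue  # skip entries whose BibTeX has not been fetched
        yield paper


def render_bib(papers: Iterable[dict], **filters) -> str:
    """Join the BibTeX of the selected papers, one blank line between entries."""
    entries = [p["bibtex"].strip() for p in select_papers(papers, **filters)]
    return "\n\n".join(entries) + "\n" if entries else ""


def write_bib(path: str, papers: Iterable[dict], **filters) -> None:
    """Write the filtered bibliography, creating parent folders such as output/."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(render_bib(papers, **filters), encoding="utf-8")
```

For example, `write_bib("output/t2v.bib", papers, category="Text-to-Video Generation")` would mirror the `export-bib --category` command shown below, under the assumed schema.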
```bash
pip install -r requirements.txt
python main.py init
```

This wizard walks you through:
- Setting search keywords (for title, abstract, or both)
- Selecting arXiv domains (e.g., `cs.CV`, `cs.AI`, `cs.MM`)
- Configuring the time range (relative, like `6m`/`1y`, or absolute dates)
- Setting the maximum number of results
- Optionally configuring an OpenAI-compatible API key for keyword suggestion
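The wizard's settings (keywords, domains, time range, max results) map naturally onto an arXiv API request. The sketch below is a hypothetical illustration, not what `main.py` does internally. It assumes `both_abstract_and_title` means "match in the title or the abstract", that absolute mode stores `date_from`/`date_to` keys, and it uses the arXiv API's `ti:`, `abs:`, `cat:` and `submittedDate:[... TO ...]` query syntax (Python 3.9+).

```python
import calendar
import re
from datetime import date, timedelta
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"


def resolve_relative(period: str, today: date) -> date:
    """Turn '30d', '6m', '1y', ... into the start date of that window."""
    match = re.fullmatch(r"(\d+)([dmy])", period.strip().lower())
    if not match:
        raise ValueError(f"unsupported period: {period!r}")
    n, unit = int(match.group(1)), match.group(2)
    if unit == "d":
        return today - timedelta(days=n)
    # Months and years: step back whole calendar months, clamping the day
    # (e.g. 2024-02-29 minus 1y gives 2023-02-28).
    months = n if unit == "m" else 12 * n
    year, month0 = divmod(today.year * 12 + today.month - 1 - months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(today.day, last_day))


def resolve_time_range(time_range: dict, today: date) -> tuple[date, date]:
    """Return (start, end); 'date_from'/'date_to' keys are assumed for absolute mode."""
    if time_range.get("mode") == "absolute":
        return (date.fromisoformat(time_range["date_from"]),
                date.fromisoformat(time_range["date_to"]))
    return resolve_relative(time_range["relative"], today), today


def build_search_query(search: dict, today: date) -> str:
    """Combine keyword, category, and date clauses with AND."""
    kw = search["keywords"]
    terms = []
    for phrase in kw.get("both_abstract_and_title", []):
        terms += [f'ti:"{phrase}"', f'abs:"{phrase}"']
    terms += [f'ti:"{phrase}"' for phrase in kw.get("title_only", [])]
    terms += [f'abs:"{phrase}"' for phrase in kw.get("abstract_only", [])]
    terms = list(dict.fromkeys(terms))  # drop duplicates, keep order
    if not terms:
        raise ValueError("no search keywords configured")
    clauses = ["(" + " OR ".join(terms) + ")"]
    if search.get("domains"):
        clauses.append("(" + " OR ".join(f"cat:{d}" for d in search["domains"]) + ")")
    if search.get("time_range"):
        start, end = resolve_time_range(search["time_range"], today)
        clauses.append(f"submittedDate:[{start:%Y%m%d}0000 TO {end:%Y%m%d}2359]")
    return " AND ".join(clauses)


def build_query_url(search: dict, today: date) -> str:
    """Build an arXiv API request URL, newest submissions first."""
    params = {
        "search_query": build_search_query(search, today),
        "start": 0,
        "max_results": search.get("max_results", 100),
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"
```

Putting the date window into the query itself, rather than filtering afterwards, means `max_results` only counts papers inside the window; whether the project's search does the same is not stated here.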
```bash
# Search with settings from user_config.json
python main.py search

# Override: fetch 200 papers from the last 6 months, include BibTeX
python main.py search --max-results 200 --recent 6m --bibtex

# Search with absolute date range
python main.py search --date-from 2024-01-01 --date-to 2025-01-01

# Include citation counts from Semantic Scholar
python main.py search --citations
```

```bash
# Export all papers from the latest data file
python main.py export-bib --output output/references.bib

# Export only "Text-to-Video Generation" papers
python main.py export-bib --category "Text-to-Video Generation" --output output/t2v.bib

# Export papers from a specific date range
python main.py export-bib --date-from 2024-06-01 --date-to 2025-01-01 --output output/recent.bib
```

```bash
# Generate keywords from paper titles
python main.py suggest --titles "Video Diffusion Models" "Stable Video Diffusion"

# Generate from arXiv IDs (auto-fetches titles)
python main.py suggest --arxiv-ids 2204.03458 2311.15127

# Auto-write suggested keywords to config
python main.py suggest --titles "Sora" "CogVideoX" --apply

# Use a custom API endpoint (e.g., DeepSeek)
python main.py suggest --titles "Paper Title" --base-url https://api.deepseek.com/v1 --api-key sk-xxx --model deepseek-chat
```

```bash
# Basic README
python main.py readme

# Include latest papers section and abstracts
python main.py readme --show-latest --show-abstracts
```

All settings are stored in `data/user_config.json`:
```json
{
  "search": {
    "keywords": {
      "both_abstract_and_title": ["video diffusion", "video generation", "text-to-video"],
      "abstract_only": ["diffusion model video generation"],
      "title_only": ["video generation", "video diffusion"]
    },
    "domains": ["cs.CV", "cs.AI", "cs.MM"],
    "time_range": {
      "mode": "relative",
      "relative": "1y"
    },
    "max_results": 500
  },
  "api_keys": {
    "openai_api_key": "",
    "openai_base_url": "https://api.openai.com/v1",
    "openai_model": "gpt-4o-mini"
  }
}
```

Feel free to submit Pull Requests to improve this list! Please follow these formats:
- Paper entry format: `**[Paper Title](link)** - Brief description`
- Project entry format: `[Project Name](link) - Project description`
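Before opening a Pull Request, a line can be checked against the two entry formats above. This is a hypothetical helper, not a check the project runs; the regexes are an assumed reading of the formats (an `http(s)` link, a " - " separator, a non-empty description, and an optional leading "- " list marker).

```python
import re
from typing import Optional

# Assumed patterns for the paper and project entry formats.
PAPER_RE = re.compile(r"^(?:- )?\*\*\[[^\]]+\]\(https?://[^\s)]+\)\*\* - \S.*$")
PROJECT_RE = re.compile(r"^(?:- )?\[[^\]]+\]\(https?://[^\s)]+\) - \S.*$")


def entry_kind(line: str) -> Optional[str]:
    """Return 'paper' or 'project' if the line matches a format, else None."""
    line = line.strip()
    if PAPER_RE.match(line):
        return "paper"
    if PROJECT_RE.match(line):
        return "project"
    return None
```

A line like `[Open-Sora](https://example.com/open-sora)` with no description returns `None`, which flags it for fixing before submission.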
