
[CVPR2025] VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide

This repository is the official implementation of VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide, led by

Dohun Lee*, Bryan Sangwoo Kim*, Geon Yeong Park, Jong Chul Ye

[Main figure]

Project Website | arXiv


🔥 Summary

VideoGuide 🚀 enhances the temporal quality of video diffusion models without any additional training or fine-tuning by leveraging a pretrained model as a guide. During inference, the guiding model provides a temporally consistent sample, which is interpolated with the sampling model's output to improve consistency; a rough sketch of this step follows the list below. VideoGuide offers the following advantages:

  1. Improved temporal consistency while preserving imaging quality and motion smoothness
  2. Fast inference, since applying guidance only to the early sampling steps proves sufficient
  3. Distillation of the guiding model's prior
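
A minimal sketch of this guided step, in PyTorch-style code, is shown below. It is an illustration under assumptions, not the repository's actual implementation: the sample_denoise and guide_denoise callables, the interp_ratio parameter, and the cutoff_step threshold are hypothetical stand-ins for the two diffusion models and their schedule.

import torch

@torch.no_grad()
def videoguide_step(x_t, t, sample_denoise, guide_denoise,
                    interp_ratio=0.5, cutoff_step=800):
    # sample_denoise / guide_denoise are assumed to map the noisy latent
    # x_t at timestep t to each model's denoised estimate x0.
    x0 = sample_denoise(x_t, t)
    if t > cutoff_step:  # guide only during the early, high-noise steps
        x0_guide = guide_denoise(x_t, t)
        # Blend the guiding model's temporally consistent estimate with
        # the sampling model's estimate.
        x0 = interp_ratio * x0_guide + (1.0 - interp_ratio) * x0
    return x0

Restricting the guidance to large t reflects advantage 2 above: later steps run with the base sampler alone, so the extra cost is limited to a few early steps.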

🗓️ News

  • [8 Oct 2024] Code and paper are uploaded.

🛠️ Setup

First, create your environment. We recommend using the following commands.

git clone https://github.com/DoHunLee1/VideoGuide.git
cd VideoGuide

conda create -n videoguide python=3.10
conda activate videoguide
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
pip install xformers==0.0.22.post4 --index-url https://download.pytorch.org/whl/cu118
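
To sanity-check the environment (an optional verification step, not part of the official setup), confirm that PyTorch imports and sees your GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"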

⏳ Models

Model                   Checkpoint
VideoCrafter2           Hugging Face
AnimateDiff             Hugging Face
RealisticVision         Hugging Face
Stable Diffusion v1.5   Hugging Face

Please refer to the official repositories of AnimateDiff and VideoCrafter for detailed explanations and setup guides for each model. We thank them for sharing their impressive work!

🌄 Example

An example of using VideoGuide is provided in the inference.sh script.
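
To run it from the repository root (the prompts, model paths, and sampling options it uses are defined inside the script itself):

bash inference.sh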

📝 Citation

If you find our method useful, please cite as below or leave a star on this repository.

@article{lee2024videoguide,
  title={VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide},
  author={Lee, Dohun and Kim, Bryan S and Park, Geon Yeong and Ye, Jong Chul},
  journal={arXiv preprint arXiv:2410.04364},
  year={2024}
}

🤗 Acknowledgements

We thank the authors of AnimateDiff, VideoCrafter, and Stable Diffusion for sharing their awesome work. We also thank the CivitAI community for sharing their impressive T2I models!

Note

This work is currently in the preprint stage, and there may be some changes to the code.
