
Commit a7361dc

[Pipeline] animatediff + vid2vid + controlnet (#9337)
* add animatediff + vid2vid + controlnet
* post tests fixes
* PR discussion fixes
* update docs
* change input video to links on HF + update an example
* make quality fix
* fix ip adapter test
* fix ip adapter test input
* update ip adapter test
1 parent 485b8bb commit a7361dc

7 files changed: +1995 -0 lines

docs/source/en/api/pipelines/animatediff.md

Lines changed: 98 additions & 0 deletions
@@ -29,6 +29,7 @@ The abstract of the paper is the following:
| [AnimateDiffSparseControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_sparsectrl.py) | *Controlled Video-to-Video Generation with AnimateDiff using SparseCtrl* |
| [AnimateDiffSDXLPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_sdxl.py) | *Video-to-Video Generation with AnimateDiff* |
| [AnimateDiffVideoToVideoPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py) | *Video-to-Video Generation with AnimateDiff* |
| [AnimateDiffVideoToVideoControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video_controlnet.py) | *Video-to-Video Generation with AnimateDiff using ControlNet* |

## Available checkpoints

@@ -518,6 +519,97 @@ Here are some sample outputs:
</tr>
</table>

### AnimateDiffVideoToVideoControlNetPipeline

AnimateDiff can be used together with ControlNets to enhance video-to-video generation by allowing for precise control over the output. ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala, and allows you to condition Stable Diffusion with an additional control image to ensure that the spatial information is preserved throughout the video.

This pipeline allows you to condition your generation both on the original video and on a sequence of control images.

```python
import torch
from PIL import Image
from tqdm.auto import tqdm

from controlnet_aux.processor import OpenposeDetector
from diffusers import AnimateDiffVideoToVideoControlNetPipeline
from diffusers.utils import export_to_gif, load_video
from diffusers import AutoencoderKL, ControlNetModel, MotionAdapter, LCMScheduler

# Load the ControlNet
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
# Load the motion adapter
motion_adapter = MotionAdapter.from_pretrained("wangfuyun/AnimateLCM")
# Load SD 1.5 based finetuned model
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = AnimateDiffVideoToVideoControlNetPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    motion_adapter=motion_adapter,
    controlnet=controlnet,
    vae=vae,
).to(device="cuda", dtype=torch.float16)

# Enable LCM to speed up inference
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config, beta_schedule="linear")
pipe.load_lora_weights("wangfuyun/AnimateLCM", weight_name="AnimateLCM_sd15_t2v_lora.safetensors", adapter_name="lcm-lora")
pipe.set_adapters(["lcm-lora"], [0.8])

video = load_video("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/dance.gif")
video = [frame.convert("RGB") for frame in video]

prompt = "astronaut in space, dancing"
negative_prompt = "bad quality, worst quality, jpeg artifacts, ugly"

# Create controlnet preprocessor
open_pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators").to("cuda")

# Preprocess controlnet images
conditioning_frames = []
for frame in tqdm(video):
    conditioning_frames.append(open_pose(frame))

strength = 0.8
with torch.inference_mode():
    video = pipe(
        video=video,
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=10,
        guidance_scale=2.0,
        controlnet_conditioning_scale=0.75,
        conditioning_frames=conditioning_frames,
        strength=strength,
        generator=torch.Generator().manual_seed(42),
    ).frames[0]

video = [frame.resize(conditioning_frames[0].size) for frame in video]
export_to_gif(video, "animatediff_vid2vid_controlnet.gif", fps=8)
```

Here are some sample outputs:

<table align="center">
  <tr>
    <th align="center">Source Video</th>
    <th align="center">Output Video</th>
  </tr>
  <tr>
    <td align="center">
      anime girl, dancing
      <br />
      <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/dance.gif" alt="anime girl, dancing" />
    </td>
    <td align="center">
      astronaut in space, dancing
      <br/>
      <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff_vid2vid_controlnet.gif" alt="astronaut in space, dancing" />
    </td>
  </tr>
</table>

**The lights and composition were transferred from the Source Video.**

## Using Motion LoRAs

Motion LoRAs are a collection of LoRAs that work with the `guoyww/animatediff-motion-adapter-v1-5-2` checkpoint. These LoRAs are responsible for adding specific types of motion to the animations.
@@ -866,6 +958,12 @@ pipe = AnimateDiffPipeline.from_pretrained("emilianJR/epiCRealism", motion_adapt
- all
- __call__

## AnimateDiffVideoToVideoControlNetPipeline

[[autodoc]] AnimateDiffVideoToVideoControlNetPipeline
- all
- __call__

## AnimateDiffPipelineOutput

[[autodoc]] pipelines.animatediff.AnimateDiffPipelineOutput
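As a quick extension of the documentation example above, the generated frames can also be written out as an MP4, and memory use can be trimmed on smaller GPUs. This sketch assumes the new pipeline exposes the same `enable_model_cpu_offload()` and `enable_vae_slicing()` helpers as the other AnimateDiff pipelines, and it reuses `export_to_video` from `diffusers.utils`:

```python
from diffusers.utils import export_to_video

# Optional memory savings before running the pipeline (assumed to be available,
# as on the other AnimateDiff pipelines): offload idle submodules to CPU and
# decode the latents slice by slice.
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

# `video` is the list of PIL frames returned by the pipeline call in the
# documentation example; export_to_video writes them as an MP4 instead of a GIF.
export_to_video(video, "animatediff_vid2vid_controlnet.mp4", fps=8)
```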

src/diffusers/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -245,6 +245,7 @@
"AnimateDiffPipeline",
"AnimateDiffSDXLPipeline",
"AnimateDiffSparseControlNetPipeline",
"AnimateDiffVideoToVideoControlNetPipeline",
"AnimateDiffVideoToVideoPipeline",
"AudioLDM2Pipeline",
"AudioLDM2ProjectionModel",
@@ -694,6 +695,7 @@
AnimateDiffPipeline,
AnimateDiffSDXLPipeline,
AnimateDiffSparseControlNetPipeline,
AnimateDiffVideoToVideoControlNetPipeline,
AnimateDiffVideoToVideoPipeline,
AudioLDM2Pipeline,
AudioLDM2ProjectionModel,
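With the class added to both export lists above, the new pipeline is importable straight from the package root, matching the import used in the documentation example:

```python
# The new pipeline is exposed at the top level once registered above.
from diffusers import AnimateDiffVideoToVideoControlNetPipeline

print(AnimateDiffVideoToVideoControlNetPipeline.__name__)
```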

src/diffusers/pipelines/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -123,6 +123,7 @@
    "AnimateDiffSDXLPipeline",
    "AnimateDiffSparseControlNetPipeline",
    "AnimateDiffVideoToVideoPipeline",
    "AnimateDiffVideoToVideoControlNetPipeline",
]
_import_structure["flux"] = [
    "FluxControlNetPipeline",
@@ -449,6 +450,7 @@
    AnimateDiffPipeline,
    AnimateDiffSDXLPipeline,
    AnimateDiffSparseControlNetPipeline,
    AnimateDiffVideoToVideoControlNetPipeline,
    AnimateDiffVideoToVideoPipeline,
)
from .audioldm import AudioLDMPipeline

src/diffusers/pipelines/animatediff/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,7 @@
    _import_structure["pipeline_animatediff_sdxl"] = ["AnimateDiffSDXLPipeline"]
    _import_structure["pipeline_animatediff_sparsectrl"] = ["AnimateDiffSparseControlNetPipeline"]
    _import_structure["pipeline_animatediff_video2video"] = ["AnimateDiffVideoToVideoPipeline"]
    _import_structure["pipeline_animatediff_video2video_controlnet"] = ["AnimateDiffVideoToVideoControlNetPipeline"]

if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
    try:
@@ -40,6 +41,7 @@
        from .pipeline_animatediff_sdxl import AnimateDiffSDXLPipeline
        from .pipeline_animatediff_sparsectrl import AnimateDiffSparseControlNetPipeline
        from .pipeline_animatediff_video2video import AnimateDiffVideoToVideoPipeline
        from .pipeline_animatediff_video2video_controlnet import AnimateDiffVideoToVideoControlNetPipeline
        from .pipeline_output import AnimateDiffPipelineOutput

else:
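These registrations follow the lazy-import pattern used throughout diffusers: submodule names are recorded in `_import_structure`, and the heavy imports only run under `TYPE_CHECKING`/`DIFFUSERS_SLOW_IMPORT` or when an attribute is first accessed. A rough, self-contained sketch of the idea (diffusers itself delegates this to its `_LazyModule` helper, so treat this as illustrative rather than the actual implementation):

```python
import importlib
from types import ModuleType


class LazyModule(ModuleType):
    """Toy stand-in for lazy-module machinery: defers importing a submodule
    until one of its registered attributes is first accessed."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map each exported class name to the submodule that defines it.
        self._class_to_module = {
            cls: module for module, classes in import_structure.items() for cls in classes
        }

    def __getattr__(self, attr):
        if attr in self._class_to_module:
            module = importlib.import_module("." + self._class_to_module[attr], self.__name__)
            return getattr(module, attr)
        raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")


# Mirrors the entry added in this commit.
_import_structure = {
    "pipeline_animatediff_video2video_controlnet": ["AnimateDiffVideoToVideoControlNetPipeline"],
}
```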

0 commit comments
