# [docs] AnimateDiff FreeNoise #9414

## Using FreeNoise

[FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling](https://arxiv.org/abs/2310.15169) by Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu.

FreeNoise is a sampling mechanism that enables generating longer videos with short-video generation models by employing noise rescheduling, temporal attention over sliding windows, and weighted averaging of latent frames. It can also be used with multiple prompts to produce interpolated video generations. More details are available in the paper.
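
Conceptually, the video is denoised over overlapping windows of frames, and the overlapping results are merged by weighted averaging. The sketch below illustrates only that blending step; `blend_sliding_windows` and `process_window` are hypothetical names, and the actual implementation additionally reschedules the initial noise so that overlapping windows see correlated noise.

```python
import torch

def blend_sliding_windows(process_window, latents, context_length=16, context_stride=4):
    # latents: [B, C, F, H, W] video latents. `process_window` stands in for one
    # denoising pass over a short chunk of frames, returning the same shape.
    num_frames = latents.shape[2]
    accumulated = torch.zeros_like(latents)
    weight_sum = torch.zeros(1, 1, num_frames, 1, 1, dtype=latents.dtype, device=latents.device)

    # Triangular ("pyramid") weights: frames near a window's center count more.
    weights = torch.tensor(
        [min(i + 1, context_length - i) for i in range(context_length)],
        dtype=latents.dtype, device=latents.device,
    ).view(1, 1, -1, 1, 1)

    for start in range(0, num_frames - context_length + 1, context_stride):
        end = start + context_length
        accumulated[:, :, start:end] += process_window(latents[:, :, start:end]) * weights
        weight_sum[:, :, start:end] += weights

    # Weighted average wherever windows overlap.
    return accumulated / weight_sum.clamp(min=1e-8)
```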

The AnimateDiff pipelines that currently support FreeNoise are:

- [AnimateDiffPipeline]
- [AnimateDiffControlNetPipeline]
- [AnimateDiffVideoToVideoPipeline]
- [AnimateDiffVideoToVideoControlNetPipeline]

To use FreeNoise, add a single line to your inference code after loading the pipeline:

```diff
+ pipe.enable_free_noise()
```

After this, you can pass either a single prompt or multiple prompts as a dictionary of integer-string pairs. The integer keys correspond to the frame index at which that prompt's influence is strongest, and each frame index should map to a single string prompt. Prompts for intermediate frame indices not present in the dictionary are created by interpolating between the provided frame prompts. By default, simple linear interpolation is used, but you can customize this behaviour by passing a callback to the `prompt_interpolation_callback` parameter when enabling FreeNoise.
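
For example, you could swap in a cosine-eased interpolation. The callback signature assumed below (the start and end frame indices plus the two prompt embeddings, returning stacked embeddings for every frame in the inclusive range) mirrors the default linear interpolation but is an assumption; verify it against the `enable_free_noise` API reference.

```python
import math
import torch

def cosine_prompt_interpolation(
    start_index: int,
    end_index: int,
    start_embedding: torch.Tensor,
    end_embedding: torch.Tensor,
) -> torch.Tensor:
    # Ease in/out between the two prompt embeddings instead of a straight lerp.
    num_frames = end_index - start_index + 1  # assumed inclusive frame range
    interpolated = []
    for i in range(num_frames):
        t = i / max(num_frames - 1, 1)
        weight = 0.5 - 0.5 * math.cos(math.pi * t)  # cosine easing in [0, 1]
        interpolated.append(torch.lerp(start_embedding, end_embedding, weight))
    return torch.cat(interpolated)

# Hypothetical usage:
# pipe.enable_free_noise(prompt_interpolation_callback=cosine_prompt_interpolation)
```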

Full example:

```python
import torch
from diffusers import AutoencoderKL, AnimateDiffPipeline, LCMScheduler, MotionAdapter
from diffusers.utils import export_to_video

# Load pipeline
dtype = torch.float16
motion_adapter = MotionAdapter.from_pretrained("wangfuyun/AnimateLCM", torch_dtype=dtype)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=dtype)

pipe = AnimateDiffPipeline.from_pretrained("emilianJR/epiCRealism", motion_adapter=motion_adapter, vae=vae, torch_dtype=dtype)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config, beta_schedule="linear")

pipe.load_lora_weights(
    "wangfuyun/AnimateLCM", weight_name="AnimateLCM_sd15_t2v_lora.safetensors", adapter_name="lcm_lora"
)
pipe.set_adapters(["lcm_lora"], [0.8])

# Enable FreeNoise for long prompt generation
pipe.enable_free_noise(context_length=16, context_stride=4)
pipe.to("cuda")

# Can be a single prompt, or a dictionary with frame indices
prompt = {
    0: "A caterpillar on a leaf, high quality, photorealistic",
    40: "A caterpillar transforming into a cocoon, on a leaf, near flowers, photorealistic",
    80: "A cocoon on a leaf, flowers in the background, photorealistic",
    120: "A cocoon maturing and a butterfly being born, flowers and leaves visible in the background, photorealistic",
    160: "A beautiful butterfly, vibrant colors, sitting on a leaf, flowers in the background, photorealistic",
    200: "A beautiful butterfly, flying away in a forest, photorealistic",
    240: "A cyberpunk butterfly, neon lights, glowing",
}
negative_prompt = "bad quality, worst quality, jpeg artifacts"

# Run inference
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=256,
    guidance_scale=2.5,
    num_inference_steps=10,
    generator=torch.Generator("cpu").manual_seed(0),
)

# Save video
frames = output.frames[0]
export_to_video(frames, "output.mp4", fps=16)
```

### FreeNoise memory savings

Since FreeNoise processes multiple frames together, there are parts of the model where the memory required exceeds what is available on typical consumer GPUs. The main memory bottlenecks we identified are the spatial and temporal attention blocks, the upsampling and downsampling blocks, the resnet blocks, and the feed-forward layers. Since most of these blocks operate effectively only on the channel/embedding dimension, inference can be chunked across the batch dimensions. The batch dimensions in AnimateDiff are either spatial (`[B x F, H x W, C]`) or temporal (`[B x H x W, F, C]`) in nature (this may seem counter-intuitive, but these batch dimensions are correct, because spatial blocks process across the `B x F` dimension while temporal blocks process across the `B x H x W` dimension). We introduce a `SplitInferenceModule` that makes it easier to chunk across any dimension and perform inference. This saves a lot of memory, at the cost of longer inference times.
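
The idea behind chunked inference can be illustrated with a minimal sketch (a conceptual illustration only, not the actual `SplitInferenceModule` implementation in diffusers):

```python
import torch
import torch.nn as nn

class SplitInference(nn.Module):
    """Run `module` over chunks of the input along `split_dim` and concatenate
    the results, trading inference speed for lower peak memory."""

    def __init__(self, module: nn.Module, split_size: int, split_dim: int = 0):
        super().__init__()
        self.module = module
        self.split_size = split_size
        self.split_dim = split_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = torch.split(x, self.split_size, dim=self.split_dim)
        # Only one chunk's activations are live inside `module` at a time.
        return torch.cat([self.module(chunk) for chunk in chunks], dim=self.split_dim)
```

In diffusers, split inference is enabled on the pipeline as follows: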

```diff
# Load pipeline and adapters
# ...
+ pipe.enable_free_noise_split_inference()
+ pipe.unet.enable_forward_chunking(16)
```

The `pipe.enable_free_noise_split_inference` method accepts two parameters: `spatial_split_size` (defaults to `256`) and `temporal_split_size` (defaults to `16`). These can be configured based on how much VRAM you have available. A lower split size results in lower memory usage but slower inference, whereas a larger split size results in faster inference at the cost of higher memory usage.
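
For example, on a GPU with limited VRAM you might halve both defaults (the values below are illustrative, not tuned recommendations):

```python
# Smaller splits lower peak memory but make inference slower.
pipe.enable_free_noise_split_inference(spatial_split_size=128, temporal_split_size=8)
```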

## Using `from_single_file` with the MotionAdapter