Commit 8a7e070

Author: J石页
Commit message: [bugfix]NPU Adaption for Sanna
Merge of 2 parents: da924c5 + 5b1dcd1

107 files changed: 6,718 additions, 2,002 deletions


docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions

@@ -598,6 +598,8 @@
       title: Attention Processor
     - local: api/activations
       title: Custom activation functions
+    - local: api/cache
+      title: Caching methods
     - local: api/normalization
       title: Custom normalization layers
     - local: api/utilities

docs/source/en/api/cache.md (new file)

Lines changed: 49 additions & 0 deletions

<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# Caching methods

## Pyramid Attention Broadcast

[Pyramid Attention Broadcast](https://huggingface.co/papers/2408.12588) from Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You.

Pyramid Attention Broadcast (PAB) is a method that speeds up inference in diffusion models by systematically skipping attention computations between successive inference steps and reusing cached attention states. The attention states change little between successive inference steps: the difference is most prominent in the spatial attention blocks, smaller in the temporal attention blocks, and smallest in the cross attention blocks. Therefore, many cross attention computation blocks can be skipped, followed by the temporal and spatial attention blocks. By combining other techniques like sequence parallelism and classifier-free guidance parallelism, PAB achieves near real-time video generation.

Enable PAB with [`~PyramidAttentionBroadcastConfig`] on any pipeline. For some benchmarks, refer to [this](https://github.com/huggingface/diffusers/pull/9562) pull request.

```python
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Increasing the value of `spatial_attention_timestep_skip_range[0]` or decreasing the value of
# `spatial_attention_timestep_skip_range[1]` will narrow the timestep interval in which pyramid attention
# broadcast is active, leading to slower inference speeds. However, larger intervals can lead to
# poorer quality of generated videos.
config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)
```

### CacheMixin

[[autodoc]] CacheMixin

### PyramidAttentionBroadcastConfig

[[autodoc]] PyramidAttentionBroadcastConfig

[[autodoc]] apply_pyramid_attention_broadcast
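The snippet in this new file stops after enabling the cache. As a rough, illustrative continuation (not part of this commit; the prompt, step count, frame count, and export call are assumed example values for typical CogVideoX usage), generation would then look something like:

```python
# Illustrative sketch, not part of the commit: run the cache-enabled pipeline.
# Prompt and generation parameters below are assumed example values.
from diffusers.utils import export_to_video

video = pipe(
    prompt="A panda playing a guitar in a bamboo forest",  # assumed example prompt
    num_inference_steps=50,
    num_frames=49,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```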

docs/source/en/using-diffusers/img2img.md

Lines changed: 2 additions & 2 deletions

@@ -461,12 +461,12 @@ Chain it to an upscaler pipeline to increase the image resolution:
 from diffusers import StableDiffusionLatentUpscalePipeline

 upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
-    "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
+    "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16, use_safetensors=True
 )
 upscaler.enable_model_cpu_offload()
 upscaler.enable_xformers_memory_efficient_attention()

-image_2 = upscaler(prompt, image=image_1, output_type="latent").images[0]
+image_2 = upscaler(prompt, image=image_1).images[0]
 ```

 Finally, chain it to a super-resolution pipeline to further enhance the resolution:
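The final context line in that hunk refers to a super-resolution step that is not shown in the diff. A minimal sketch of what that chained step typically looks like (the checkpoint name and call pattern are assumptions, not part of this commit):

```python
# Illustrative sketch, not part of the commit: feed the upscaled image into a
# super-resolution pipeline. The checkpoint below is an assumed example.
from diffusers import StableDiffusionUpscalePipeline

super_res = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
)
super_res.enable_model_cpu_offload()

image_3 = super_res(prompt, image=image_2).images[0]
```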

docs/source/en/using-diffusers/write_own_pipeline.md

Lines changed: 1 addition & 1 deletion

@@ -106,7 +106,7 @@ Let's try it out!

 ## Deconstruct the Stable Diffusion pipeline

-Stable Diffusion is a text-to-image *latent diffusion* model. It is called a latent diffusion model because it works with a lower-dimensional representation of the image instead of the actual pixel space, which makes it more memory efficient. The encoder compresses the image into a smaller representation, and a decoder to convert the compressed representation back into an image. For text-to-image models, you'll need a tokenizer and an encoder to generate text embeddings. From the previous example, you already know you need a UNet model and a scheduler.
+Stable Diffusion is a text-to-image *latent diffusion* model. It is called a latent diffusion model because it works with a lower-dimensional representation of the image instead of the actual pixel space, which makes it more memory efficient. The encoder compresses the image into a smaller representation, and a decoder converts the compressed representation back into an image. For text-to-image models, you'll need a tokenizer and an encoder to generate text embeddings. From the previous example, you already know you need a UNet model and a scheduler.

 As you can see, this is already more complex than the DDPM pipeline which only contains a UNet model. The Stable Diffusion model has three separate pretrained models.
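The paragraph in that hunk names the components without loading them. A minimal sketch of what those three pretrained models plus the scheduler look like in code (the repo id and scheduler choice are assumptions based on common diffusers usage, not part of this commit):

```python
# Illustrative sketch, not part of the commit: the VAE, tokenizer + text encoder,
# and UNet that make up Stable Diffusion, plus a scheduler. Repo id is assumed.
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, UniPCMultistepScheduler

repo_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"  # assumed example checkpoint
vae = AutoencoderKL.from_pretrained(repo_id, subfolder="vae")
tokenizer = CLIPTokenizer.from_pretrained(repo_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(repo_id, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(repo_id, subfolder="unet")
scheduler = UniPCMultistepScheduler.from_pretrained(repo_id, subfolder="scheduler")
```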