
Commit a5c34af

update docs
1 parent 7debcec commit a5c34af

File tree

2 files changed: +30 -0 lines changed

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -590,6 +590,8 @@
     title: Attention Processor
   - local: api/activations
     title: Custom activation functions
+  - local: api/cache
+    title: Caching techniques
   - local: api/normalization
     title: Custom normalization layers
   - local: api/utilities

docs/source/en/api/cache.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# Caching Methods

## Pyramid Attention Broadcast

[Pyramid Attention Broadcast](https://huggingface.co/papers/2408.12588) by Xuanlei Zhao, Xiaolong Jin, Kai Wang, and Yang You.

Pyramid Attention Broadcast (PAB) speeds up inference in diffusion models by systematically skipping attention computation between successive inference steps and reusing cached attention states, which change only slightly from one step to the next. The differences are largest in the spatial attention blocks, smaller in the temporal attention blocks, and smallest in the cross-attention blocks. Cross-attention computation can therefore be skipped most often, followed by temporal and then spatial attention. Combined with techniques such as sequence parallelism and classifier-free guidance parallelism, PAB achieves near real-time video generation.
Enable PAB on any pipeline with a [`~PyramidAttentionBroadcastConfig`]. For benchmarks, refer to [this pull request](https://github.com/huggingface/diffusers/pull/9562).
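Below is a minimal sketch of what enabling PAB on a video pipeline might look like. The config fields (`spatial_attention_block_skip_range`, `spatial_attention_timestep_skip_range`) and the exact call signature of `apply_pyramid_attention_broadcast` are assumptions for illustration; the API reference below is authoritative.

```python
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig, apply_pyramid_attention_broadcast

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Assumed config fields: reuse cached spatial attention states every other block,
# restricted to an intermediate timestep window where the states change the least.
config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
)

# Assumed call signature: attach the PAB hooks to the pipeline's denoiser before inference.
apply_pyramid_attention_broadcast(pipe.transformer, config)

video = pipe("A panda strumming a guitar in a bamboo grove", num_frames=49).frames[0]
```

Because spatial attention differs the most between steps, it gets the smallest skip range here; cross-attention, which changes the least, could be skipped more aggressively.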
## PyramidAttentionBroadcastConfig

[[autodoc]] PyramidAttentionBroadcastConfig

[[autodoc]] apply_pyramid_attention_broadcast

[[autodoc]] apply_pyramid_attention_broadcast_on_module
