
Commit 9c09445

sayakpaul and stevhliu authored
[docs] slight edits to the attention backends docs. (huggingface#12394)
* slight edits to the attention backends docs.
* Update docs/source/en/optimization/attention_backends.md

Co-authored-by: Steven Liu <[email protected]>
1 parent 4588bbe commit 9c09445


docs/source/en/optimization/attention_backends.md

Lines changed: 10 additions & 2 deletions
````diff
@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License. -->
 
 # Attention backends
 
-> [!TIP]
+> [!NOTE]
 > The attention dispatcher is an experimental feature. Please open an issue if you have any feedback or encounter any problems.
 
 Diffusers provides several optimized attention algorithms that are more memory and computationally efficient through its *attention dispatcher*. The dispatcher acts as a router for managing and switching between different attention implementations and provides a unified interface for interacting with them.
````
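
In practice, the dispatcher described in the hunk above is driven through `set_attention_backend` on a loaded model. A minimal sketch, assuming an illustrative `FluxPipeline` checkpoint and the `flash` backend (neither is part of this commit):

```py
# Minimal sketch: route a model's attention through a chosen backend.
# Assumptions: FluxPipeline and the FLUX.1-dev checkpoint are illustrative;
# set_attention_backend is the ModelMixin method referenced in the next hunk.
import torch
from diffusers import FluxPipeline

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Switch every attention module in the transformer to the FlashAttention backend.
pipeline.transformer.set_attention_backend("flash")

image = pipeline("a photo of a cat").images[0]
```
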
````diff
@@ -33,7 +33,7 @@ The [`~ModelMixin.set_attention_backend`] method iterates through all the module
 
 The example below demonstrates how to enable the `_flash_3_hub` implementation for FlashAttention-3 from the [kernel](https://github.com/huggingface/kernels) library, which allows you to instantly use optimized compute kernels from the Hub without requiring any setup.
 
-> [!TIP]
+> [!NOTE]
 > FlashAttention-3 is not supported for non-Hopper architectures, in which case, use FlashAttention with `set_attention_backend("flash")`.
 
 ```py
````
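
Because FlashAttention-3 only runs on Hopper GPUs (the caveat the NOTE above calls out), a defensive way to pick between `_flash_3_hub` and `flash` is to check the device's compute capability. The helper below is hypothetical, not part of the docs being edited:

```py
# Hypothetical helper: prefer FlashAttention-3 from the kernels Hub on Hopper (sm_90),
# otherwise fall back to regular FlashAttention, per the NOTE above.
import torch

def pick_flash_backend() -> str:
    major, _ = torch.cuda.get_device_capability()
    return "_flash_3_hub" if major == 9 else "flash"

# Assumes `pipeline` was loaded as in the earlier sketch.
pipeline.transformer.set_attention_backend(pick_flash_backend())
```
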
````diff
@@ -78,10 +78,16 @@ with attention_backend("_flash_3_hub"):
     image = pipeline(prompt).images[0]
 ```
 
+> [!TIP]
+> Most attention backends support `torch.compile` without graph breaks and can be used to further speed up inference.
+
 ## Available backends
 
 Refer to the table below for a complete list of available attention backends and their variants.
 
+<details>
+<summary>Expand</summary>
+
 | Backend Name | Family | Description |
 |--------------|--------|-------------|
 | `native` | [PyTorch native](https://docs.pytorch.org/docs/stable/generated/torch.nn.attention.SDPBackend.html#torch.nn.attention.SDPBackend) | Default backend using PyTorch's scaled_dot_product_attention |
````

````diff
@@ -104,3 +110,5 @@ Refer to the table below for a complete list of available attention backends and
 | `_sage_qk_int8_pv_fp16_cuda` | [SageAttention](https://github.com/thu-ml/SageAttention) | INT8 QK + FP16 PV (CUDA) |
 | `_sage_qk_int8_pv_fp16_triton` | [SageAttention](https://github.com/thu-ml/SageAttention) | INT8 QK + FP16 PV (Triton) |
 | `xformers` | [xFormers](https://github.com/facebookresearch/xformers) | Memory-efficient attention |
+
+</details>
````
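
The TIP added in the third hunk notes that most backends support `torch.compile` without graph breaks. As a rough illustration (assuming `pipeline` is already loaded as in the earlier sketches), the backend can be set first and the transformer compiled afterwards:

```py
# Sketch: pair an attention backend with torch.compile (assumes `pipeline` is loaded).
import torch

pipeline.transformer.set_attention_backend("_flash_3_hub")

# fullgraph=True makes torch.compile raise on graph breaks, a quick way to confirm
# the chosen backend compiles cleanly.
pipeline.transformer = torch.compile(pipeline.transformer, fullgraph=True)

image = pipeline("a photo of a cat").images[0]
```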
