Commit 2647508

stevhliu and sayakpaul authored
[docs] Attention checks (huggingface#12486)
* checks
* feedback

Co-authored-by: Sayak Paul <[email protected]>
1 parent f072c64 commit 2647508

File tree

1 file changed: +39 -0 lines changed


docs/source/en/optimization/attention_backends.md

Lines changed: 39 additions & 0 deletions
@@ -81,6 +81,45 @@ with attention_backend("_flash_3_hub"):
> [!TIP]
> Most attention backends support `torch.compile` without graph breaks and can be used to further speed up inference.

## Checks

The attention dispatcher includes debugging checks that catch common errors before they cause problems (a rough sketch of what they verify follows the list).

1. Device checks verify that query, key, and value tensors live on the same device.
2. Data type checks confirm tensors have matching dtypes and use either bfloat16 or float16.
3. Shape checks validate tensor dimensions and prevent mixing attention masks with causal flags.

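The sketch below is illustrative only and is not the diffusers implementation; it simply restates the device, dtype, and shape conditions above as plain assertions, and the function name `sketch_attention_checks` is made up for this example.

```py
import torch

# Illustrative sketch only -- not the diffusers implementation. It restates
# the conditions above: same device, matching bf16/fp16 dtypes, and no
# attention mask combined with a causal flag.
def sketch_attention_checks(query, key, value, attn_mask=None, is_causal=False):
    assert query.device == key.device == value.device, "q/k/v must be on the same device"
    assert query.dtype == key.dtype == value.dtype, "q/k/v must share a dtype"
    assert query.dtype in (torch.bfloat16, torch.float16), "expected bfloat16 or float16"
    assert query.shape[-1] == key.shape[-1], "query and key head dimensions must match"
    assert not (attn_mask is not None and is_causal), "don't pass an attention mask with is_causal=True"
```
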
Enable these checks by setting the `DIFFUSERS_ATTN_CHECKS` environment variable. Checks add overhead to every attention operation, so they're disabled by default.

```bash
export DIFFUSERS_ATTN_CHECKS=yes
```

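If you launch from Python rather than a shell, the same flag can be set with `os.environ`. This is a minimal sketch that assumes the variable is read when diffusers is imported, so set it before the import.

```py
import os

# Assumption: the flag is read when diffusers is imported, so set it first.
os.environ["DIFFUSERS_ATTN_CHECKS"] = "yes"

import diffusers  # noqa: E402
```
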
The checks now run before every attention operation.

```py
import torch
from diffusers.models.attention_dispatch import attention_backend, dispatch_attention_fn

query = torch.randn(1, 10, 8, 64, dtype=torch.bfloat16, device="cuda")
key = torch.randn(1, 10, 8, 64, dtype=torch.bfloat16, device="cuda")
value = torch.randn(1, 10, 8, 64, dtype=torch.bfloat16, device="cuda")

try:
    with attention_backend("flash"):
        output = dispatch_attention_fn(query, key, value)
    print("✓ Flash Attention works with checks enabled")
except Exception as e:
    print(f"✗ Flash Attention failed: {e}")
```

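To see a check fire, pass deliberately inconsistent inputs. The snippet below is a sketch that assumes the checks are enabled as above and uses a float32 query to trip the dtype check; the exact exception type and message come from the dispatcher.

```py
import torch
from diffusers.models.attention_dispatch import attention_backend, dispatch_attention_fn

# A float32 query next to bfloat16 key/value should trip the dtype check.
query = torch.randn(1, 10, 8, 64, dtype=torch.float32, device="cuda")
key = torch.randn(1, 10, 8, 64, dtype=torch.bfloat16, device="cuda")
value = torch.randn(1, 10, 8, 64, dtype=torch.bfloat16, device="cuda")

try:
    with attention_backend("flash"):
        dispatch_attention_fn(query, key, value)
except Exception as e:
    print(f"✗ Check caught an invalid input: {e}")
```
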
You can also configure the registry directly.

```py
from diffusers.models.attention_dispatch import _AttentionBackendRegistry

_AttentionBackendRegistry._checks_enabled = True
```

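Because `_checks_enabled` is a plain class attribute, you can also restore the previous value when you only want the extra validation around a specific call. A minimal sketch of that pattern:

```py
from diffusers.models.attention_dispatch import _AttentionBackendRegistry

# Enable the checks only around a suspicious call, then restore the previous
# setting to avoid the per-operation overhead elsewhere.
previous = _AttentionBackendRegistry._checks_enabled
_AttentionBackendRegistry._checks_enabled = True
try:
    ...  # run the attention call you want to validate
finally:
    _AttentionBackendRegistry._checks_enabled = previous
```
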
## Available backends

Refer to the table below for a complete list of available attention backends and their variants.
