> [!TIP]
> Most attention backends support `torch.compile` without graph breaks and can be used to further speed up inference.
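
For example, you can compile the denoiser and then run it under a specific backend. The following is a minimal sketch: the pipeline, model id, and prompt are placeholders, and the `attention_backend` import path may differ across diffusers versions.

```py
import torch
from diffusers import FluxPipeline
# Import path for the context manager may vary by diffusers version.
from diffusers.models.attention_dispatch import attention_backend

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Compile the transformer once; later calls reuse the compiled graph.
pipeline.transformer = torch.compile(pipeline.transformer, fullgraph=True)

with attention_backend("_flash_3_hub"):
    image = pipeline("A photo of a cat").images[0]
```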
## Checks

The attention dispatcher includes debugging checks that catch common errors before they cause problems; a conceptual sketch of the checks follows the list.
1. Device checks verify that query, key, and value tensors live on the same device.
2. Data type checks confirm tensors have matching dtypes and use either bfloat16 or float16.
3. Shape checks validate tensor dimensions and prevent mixing attention masks with causal flags.
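
Conceptually, the checks behave like the sketch below. This is only an illustration of the three categories above, not the dispatcher's actual implementation.

```py
import torch

def sketch_attention_checks(query, key, value, attn_mask=None, is_causal=False):
    # 1. Device check: query, key, and value must live on the same device.
    if not (query.device == key.device == value.device):
        raise ValueError("query, key, and value must be on the same device")

    # 2. Dtype check: matching dtypes, restricted to half precision.
    if not (query.dtype == key.dtype == value.dtype):
        raise ValueError("query, key, and value must share the same dtype")
    if query.dtype not in (torch.bfloat16, torch.float16):
        raise ValueError("expected bfloat16 or float16 tensors")

    # 3. Shape check: consistent ranks, and an attention mask cannot be
    #    combined with a causal flag.
    if not (query.ndim == key.ndim == value.ndim):
        raise ValueError("query, key, and value must have the same rank")
    if attn_mask is not None and is_causal:
        raise ValueError("attn_mask and is_causal are mutually exclusive")
```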
Enable these checks by setting the `DIFFUSERS_ATTN_CHECKS` environment variable. Checks add overhead to every attention operation, so they're disabled by default.
```bash
export DIFFUSERS_ATTN_CHECKS=yes
```
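
The variable can also be set from Python. A sketch, assuming the dispatcher reads the variable when diffusers is imported, so it must be set first:

```py
import os

# Set the flag before importing diffusers so the dispatcher can pick it up.
os.environ["DIFFUSERS_ATTN_CHECKS"] = "yes"

import diffusers
```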
Once enabled, the checks run before every attention operation.