docs/source/en/optimization/attention_backends.md (+2 -2)
@@ -16,7 +16,7 @@ specific language governing permissions and limitations under the License. -->
Diffusers provides several optimized attention algorithms that are more memory- and compute-efficient through its *attention dispatcher*. The dispatcher acts as a router for managing and switching between different attention implementations and provides a unified interface for interacting with them.
-Available attention implementations include the following.
+Refer to the table below for an overview of the available attention families and to the [Available backends](#available-backends) section for a more complete list.
| attention family | main feature |
|---|---|
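For context on what the dispatcher paragraph above means in practice, here is a minimal sketch of switching backends through [`~ModelMixin.set_attention_backend`]. The pipeline, checkpoint, and the `"flash"` backend choice are illustrative assumptions, not part of this change, and `"flash"` assumes the `flash-attn` package is installed.

```py
import torch
from diffusers import FluxPipeline

# Any transformer-based pipeline works here; this checkpoint is only an example.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Route every attention layer in the denoiser through the chosen backend.
# "flash" is one of the families listed in the table above.
pipeline.transformer.set_attention_backend("flash")

image = pipeline("A cat wearing sunglasses, photorealistic").images[0]
```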
@@ -34,7 +34,7 @@ The [`~ModelMixin.set_attention_backend`] method iterates through all the module
The example below demonstrates how to enable the `_flash_3_hub` implementation for FlashAttention-3 from the [kernels](https://github.com/huggingface/kernels) library, which allows you to instantly use optimized compute kernels from the Hub without requiring any setup.
> [!TIP]
-> FlashAttention-3 is not supported for non-Hopper architectures, in which case, use FlashAttention (set_attention_backend("flash")).
+> FlashAttention-3 is not supported for non-Hopper architectures, in which case, use FlashAttention with `set_attention_backend("flash")`.
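As a rough illustration of the tip, the sketch below selects the FlashAttention-3 Hub kernel only on a Hopper GPU (compute capability 9.x) and otherwise falls back to `"flash"`. The capability check and the pipeline/checkpoint are assumptions for the example, not behavior documented by this diff.

```py
import torch
from diffusers import FluxPipeline

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# FlashAttention-3 kernels pulled from the Hub target Hopper (compute capability 9.x) GPUs.
# On other architectures, fall back to regular FlashAttention as the tip recommends.
major, _ = torch.cuda.get_device_capability()
backend = "_flash_3_hub" if major == 9 else "flash"
pipeline.transformer.set_attention_backend(backend)

image = pipeline("A cat wearing sunglasses, photorealistic").images[0]
```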