This provides BLAS acceleration using the ROCm cores of your AMD GPU. Make sure you have the ROCm toolkit installed and replace the `$GFX_NAME` value with your GPU architecture (for example, `gfx1030` for consumer RDNA2 cards).
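A minimal sketch of such an install command, assuming (as with the other backends) that `CMAKE_ARGS` is forwarded to CMake when the package is built through pip:

```bash
# Replace $GFX_NAME with your GPU architecture (e.g. gfx1030 for consumer RDNA2 cards).
CMAKE_ARGS="-DSD_HIPBLAS=ON -DAMDGPU_TARGETS=$GFX_NAME" pip install stable-diffusion-cpp-python
```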
Windows users should refer to [docs/hipBLAS_on_Windows.md](docs/hipBLAS_on_Windows.md) for a comprehensive guide and troubleshooting tips.
To upgrade and rebuild `stable-diffusion-cpp-python`, add the `--upgrade --force-reinstall --no-cache-dir` flags to the `pip install` command to ensure the package is rebuilt from source.
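For example (keep the same `CMAKE_ARGS` set for your backend when re-running it):

```bash
pip install stable-diffusion-cpp-python --upgrade --force-reinstall --no-cache-dir
```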
### Using Flash Attention
Enabling flash attention for the diffusion model reduces memory usage by an amount that varies with the model and resolution, for example:

- **Flux 768x768**: ~600 MB
- **SD2 768x768**: ~1400 MB

On most backends it slows generation down, but on CUDA it generally speeds it up as well. At the moment, it is only supported for some models and some backends (like `cpu`, `cuda/rocm` and `metal`).
Enable it by passing `diffusion_flash_attn=True` to the `StableDiffusion` class and watch the log for:
```log
[INFO] stable-diffusion.cpp:312 - Using flash attention in the diffusion model
```
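A minimal usage sketch — the module name, `model_path` value, and `txt_to_img` call below are illustrative assumptions and may need adjusting for your setup:

```python
from stable_diffusion_cpp import StableDiffusion

# Placeholder model path; point this at your own checkpoint.
sd = StableDiffusion(
    model_path="models/v1-5-pruned-emaonly.safetensors",
    diffusion_flash_attn=True,  # enable flash attention in the diffusion model
)

# Generate an image as usual; the INFO line above should appear in the log output.
images = sd.txt_to_img(prompt="a lighthouse at dusk, oil painting")
```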
For the hipBLAS build, what differs from the regular CPU build are the flags `-G "Ninja"`, `-DCMAKE_C_COMPILER=clang`, `-DCMAKE_CXX_COMPILER=clang++`, `-DSD_HIPBLAS=ON`, `-DGPU_TARGETS=gfx1100`, `-DAMDGPU_TARGETS=gfx1100`, `-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON`, and `-DCMAKE_POSITION_INDEPENDENT_CODE=ON`.
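As a sketch of how those flags might be combined (again assuming `CMAKE_ARGS` is forwarded to CMake by pip):

```bash
# Hypothetical invocation combining the flags listed above; adjust gfx1100 to your GPU target.
CMAKE_ARGS="-G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
  -DSD_HIPBLAS=ON -DGPU_TARGETS=gfx1100 -DAMDGPU_TARGETS=gfx1100 \
  -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON -DCMAKE_POSITION_INDEPENDENT_CODE=ON" \
  pip install stable-diffusion-cpp-python
```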