docs/source/en/optimization/memory.md
40 additions & 0 deletions
@@ -158,6 +158,46 @@ In order to properly offload models after they're called, it is required to run
</Tip>
## Group offloading

Group offloading is a middle ground between sequential and model offloading. It offloads groups of internal layers (either `torch.nn.ModuleList` or `torch.nn.Sequential`) rather than the whole model, so it uses less memory than model-level offloading. It is also faster than sequential offloading because it requires fewer device synchronizations.
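
The trade-off described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not how Diffusers implements offloading: the layer sizes are hypothetical, `chunk_layers` and `offload_cost` are made-up helper names, and sequential offloading is approximated as "one layer resident at a time".

```python
from typing import List, Tuple


def chunk_layers(layer_sizes: List[int], layers_per_group: int) -> List[List[int]]:
    """Split per-layer sizes into consecutive groups of at most `layers_per_group`."""
    return [
        layer_sizes[i:i + layers_per_group]
        for i in range(0, len(layer_sizes), layers_per_group)
    ]


def offload_cost(layer_sizes: List[int], layers_per_group: int) -> Tuple[int, int]:
    """Return (peak resident size, number of onload steps).

    Assumes exactly one group lives on the accelerator at any time.
    """
    groups = chunk_layers(layer_sizes, layers_per_group)
    peak = max(sum(group) for group in groups)
    return peak, len(groups)


# Hypothetical model: 8 equally sized layers of 100 MB each.
layer_sizes = [100] * 8

sequential = offload_cost(layer_sizes, 1)                 # one layer at a time
grouped = offload_cost(layer_sizes, 2)                    # two layers per group
whole_model = offload_cost(layer_sizes, len(layer_sizes))  # everything at once

print(sequential, grouped, whole_model)
```

In this toy setting, larger groups mean fewer onload steps (fewer synchronization points) at the cost of a higher peak resident size, which is the middle ground the paragraph describes.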

To enable group offloading on a Diffusers model implementation, call its [`~ModelMixin.enable_group_offload`] method. For any other model implementation, use [`~hooks.group_offloading.apply_group_offloading`]:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.hooks import apply_group_offloading
# ... (remaining lines of this example are not shown in the diff)
```

On CUDA devices that support asynchronous data transfer streams, group offloading can overlap data transfer with computation, which reduces overall execution time compared to sequential offloading. This works through layer prefetching with CUDA streams: the next layer to be executed is loaded onto the accelerator device while the current layer is still executing, which slightly increases memory requirements. Group offloading also supports leaf-level offloading (equivalent to sequential CPU offloading), which can be made much faster by using streams.
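
To see why prefetching helps, here is a rough timing model in plain Python. It is a sketch under stated assumptions rather than a description of the actual implementation: `run_time` is a hypothetical helper, the per-layer transfer and compute times are invented, and the model assumes a one-layer lookahead where the transfer of the next layer starts when the current layer begins computing.

```python
from typing import Sequence


def run_time(transfer: Sequence[int], compute: Sequence[int], prefetch: bool) -> int:
    """Return the end-to-end time for running all layers in order.

    Without prefetch, every transfer and compute step runs back to back.
    With prefetch, the next layer's transfer overlaps the current layer's compute.
    """
    if not prefetch:
        return sum(transfer) + sum(compute)

    # The first layer cannot be prefetched: compute waits for its transfer.
    compute_start = transfer[0]
    compute_end = compute_start + compute[0]
    for t, c in zip(transfer[1:], compute[1:]):
        # Transfer of this layer started when the previous layer began computing.
        transfer_done = compute_start + t
        # Compute can start once the previous layer is done and the weights arrived.
        compute_start = max(compute_end, transfer_done)
        compute_end = compute_start + c
    return compute_end


# Hypothetical per-layer costs in arbitrary time units.
transfer = [1, 1, 1, 1]
compute = [2, 2, 2, 2]

no_overlap = run_time(transfer, compute, prefetch=False)
with_overlap = run_time(transfer, compute, prefetch=True)
print(no_overlap, with_overlap)
```

In this model, up to two layers are resident on the accelerator at once (the one executing and the one being prefetched), which corresponds to the slightly higher memory requirement mentioned above.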
## FP8 layerwise weight-casting
PyTorch supports `torch.float8_e4m3fn` and `torch.float8_e5m2` as weight storage dtypes, but many tensor operations have no kernels implemented for them, so they can't be used directly for computation in those operations. However, you can use these dtypes to store model weights in fp8 precision and upcast them on-the-fly when the layers are used in the forward pass. This is known as layerwise weight-casting.
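
A back-of-the-envelope estimate shows where the savings come from. This is a sketch with assumptions: the layer parameter counts are hypothetical, `weight_bytes` and `layerwise_cast_peak_bytes` are illustrative helper names, the compute dtype is assumed to be `bfloat16`, and the peak is a rough upper bound that counts one upcast layer on top of the full fp8 copy.

```python
from typing import List

# Storage size per parameter for each dtype, in bytes.
BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,
    "float8_e4m3fn": 1,
    "float8_e5m2": 1,
}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Memory needed to store `num_params` weights in `dtype`."""
    return num_params * BYTES_PER_PARAM[dtype]


def layerwise_cast_peak_bytes(
    layer_params: List[int], storage_dtype: str, compute_dtype: str
) -> int:
    """Rough upper bound on weight memory with layerwise casting.

    All layers stay in `storage_dtype`; only the largest layer is assumed
    to exist in `compute_dtype` at any one time.
    """
    stored = sum(weight_bytes(n, storage_dtype) for n in layer_params)
    largest_upcast = max(weight_bytes(n, compute_dtype) for n in layer_params)
    return stored + largest_upcast


# Hypothetical model: 10 layers with 1M parameters each.
layer_params = [1_000_000] * 10

baseline = sum(weight_bytes(n, "bfloat16") for n in layer_params)
fp8_peak = layerwise_cast_peak_bytes(layer_params, "float8_e4m3fn", "bfloat16")
print(baseline, fp8_peak)
```

Under these assumptions, weight memory drops from about 20 MB to about 12 MB; activations and other buffers are not counted.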
| Zero1to3 Pipeline | Implementation of [Zero-1-to-3: Zero-shot One Image to 3D Object](https://arxiv.org/abs/2303.11328)|[Zero1to3 Pipeline](#zero1to3-pipeline)| - |[Xin Kong](https://github.com/kxhit)|
| Stable Diffusion XL Long Weighted Prompt Pipeline | A pipeline that supports prompts and negative prompts of unlimited length, using A1111-style prompt weighting |[Stable Diffusion XL Long Weighted Prompt Pipeline](#stable-diffusion-xl-long-weighted-prompt-pipeline)|[](https://colab.research.google.com/drive/1LsqilswLR40XLLcp6XFOl5nKb_wOe26W?usp=sharing)|[Andrew Zhu](https://xhinker.medium.com/)|
-| Stable Diffusion Mixture Tiling Pipeline SD 1.5 | A pipeline generates cohesive images by integrating multiple diffusion processes, each focused on a specific image region and considering boundary effects for smooth blending |[Stable Diffusion Mixture Tiling Pipeline SD 1.5](#stable-diffusion-mixture-tiling-sd-15)|[](https://huggingface.co/spaces/albarji/mixture-of-diffusers)|[Álvaro B Jiménez](https://github.com/albarji/)|
-| Stable Diffusion Mixture Tiling Pipeline SDXL | A pipeline generates cohesive images by integrating multiple diffusion processes, each focused on a specific image region and considering boundary effects for smooth blending |[Stable Diffusion Mixture Tiling Pipeline SDXL](#stable-diffusion-mixture-tiling-sdxl)|[](https://huggingface.co/spaces/elismasilva/mixture-of-diffusers-sdxl-tiling)|[Eliseu Silva](https://github.com/DEVAIEXP/)|
+| Stable Diffusion Mixture Tiling Pipeline SD 1.5 | A pipeline generates cohesive images by integrating multiple diffusion processes, each focused on a specific image region and considering boundary effects for smooth blending |[Stable Diffusion Mixture Tiling Pipeline SD 1.5](#stable-diffusion-mixture-tiling-pipeline-sd-15)|[](https://huggingface.co/spaces/albarji/mixture-of-diffusers)|[Álvaro B Jiménez](https://github.com/albarji/)|
+| Stable Diffusion Mixture Canvas Pipeline SD 1.5 | A pipeline generates cohesive images by integrating multiple diffusion processes, each focused on a specific image region and considering boundary effects for smooth blending. Works by defining a list of Text2Image region objects that detail the region of influence of each diffuser. |[Stable Diffusion Mixture Canvas Pipeline SD 1.5](#stable-diffusion-mixture-canvas-pipeline-sd-15)|[](https://huggingface.co/spaces/albarji/mixture-of-diffusers)|[Álvaro B Jiménez](https://github.com/albarji/)|
+| Stable Diffusion Mixture Tiling Pipeline SDXL | A pipeline generates cohesive images by integrating multiple diffusion processes, each focused on a specific image region and considering boundary effects for smooth blending |[Stable Diffusion Mixture Tiling Pipeline SDXL](#stable-diffusion-mixture-tiling-pipeline-sdxl)|[](https://huggingface.co/spaces/elismasilva/mixture-of-diffusers-sdxl-tiling)|[Eliseu Silva](https://github.com/DEVAIEXP/)|
| FABRIC - Stable Diffusion with feedback Pipeline | pipeline supports feedback from liked and disliked images |[Stable Diffusion Fabric Pipeline](#stable-diffusion-fabric-pipeline)|[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/stable_diffusion_fabric.ipynb)|[Shauray Singh](https://shauray8.github.io/about_shauray/)|
| sketch inpaint - Inpainting with non-inpaint Stable Diffusion | sketch inpaint much like in automatic1111 |[Masked Im2Im Stable Diffusion Pipeline](#stable-diffusion-masked-im2im)| - |[Anatoly Belikov](https://github.com/noskill)|
| sketch inpaint xl - Inpainting with non-inpaint Stable Diffusion | sketch inpaint much like in automatic1111 |[Masked Im2Im Stable Diffusion XL Pipeline](#stable-diffusion-xl-masked-im2im)| - |[Anatoly Belikov](https://github.com/noskill)|