`docs/source/en/api/loaders/lora.md` (15 additions, 0 deletions)
```diff
@@ -20,6 +20,9 @@ LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters.
 - [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
 - [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
 - [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
+- [`LTXVideoLoraLoaderMixin`] provides similar functions for [LTX-Video](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video).
+- [`SanaLoraLoaderMixin`] provides similar functions for [Sana](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana).
+- [`HunyuanVideoLoraLoaderMixin`] provides similar functions for [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).
 - [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
 - [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, and unload LoRAs, and more.
@@ -53,6 +56,18 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
```
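To make the role of these mixins concrete, here is a minimal sketch (not part of the diff) of the load/fuse/unload surface they provide on a pipeline that inherits them; the repository id and weight file name are placeholders:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load LoRA weights (placeholder repo id and weight file name)
pipe.load_lora_weights("some-user/some-lora", weight_name="pytorch_lora_weights.safetensors")

# Merge the LoRA into the base weights for inference, then undo the merge
pipe.fuse_lora()
pipe.unfuse_lora()

# Remove the LoRA entirely
pipe.unload_lora_weights()
```

The video pipelines listed above expose the same surface through their respective mixins.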
New file — CogView4 pipeline documentation:

<!--Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-->

# CogView4

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

This pipeline was contributed by [zRzRzRzRzRzRzR](https://github.com/zRzRzRzRzRzRzR). The original codebase and weights can be found under [hf.co/THUDM](https://huggingface.co/THUDM).
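As a usage sketch: the pipeline follows the standard `DiffusionPipeline` pattern, but the `THUDM/CogView4-6B` checkpoint id and the sampling parameters below are assumptions; check [hf.co/THUDM](https://huggingface.co/THUDM) for the released weights.

```python
import torch
from diffusers import CogView4Pipeline

# Checkpoint id is assumed; see hf.co/THUDM for the released weights
pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(
    prompt="A detailed painting of a lighthouse on a stormy coast",
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
image.save("cogview4.png")
```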
`docs/source/en/optimization/memory.md` (40 additions, 0 deletions)
@@ -158,6 +158,46 @@ In order to properly offload models after they're called, it is required to run the entire pipeline and models are called in the pipeline's expected order.

</Tip>

## Group offloading

Group offloading is the middle ground between sequential and model offloading. It works by offloading groups of internal layers (either `torch.nn.ModuleList` or `torch.nn.Sequential`), which uses less memory than model-level offloading. It is also faster than sequential-level offloading because the number of device synchronizations is reduced.

To enable group offloading, call the [`~ModelMixin.enable_group_offload`] method on the model if it is a Diffusers model implementation. For any other model implementation, use [`~hooks.group_offloading.apply_group_offloading`]:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.hooks import apply_group_offloading

onload_device = torch.device("cuda")
offload_device = torch.device("cpu")

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# Diffusers model implementations expose `enable_group_offload` directly.
# The offload settings below are example values; tune `offload_type` and
# `num_blocks_per_group` for your model.
pipe.transformer.enable_group_offload(
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True,
)

# Any other model implementation goes through `apply_group_offloading`
apply_group_offloading(
    pipe.text_encoder,
    onload_device=onload_device,
    offload_type="block_level",
    num_blocks_per_group=2,
)
```

Group offloading (for CUDA devices with support for asynchronous data transfer streams) overlaps data transfer and computation to reduce the overall execution time compared to sequential offloading. This is enabled using layer prefetching with CUDA streams: the next layer to be executed is loaded onto the accelerator device while the current layer is being executed, which slightly increases memory requirements. Group offloading also supports leaf-level offloading (equivalent to sequential CPU offloading), which can be made much faster when using streams.
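Stream-based prefetching is just a flag on the same calls. Continuing the example above, a small sketch (argument values are illustrative):

```python
# Leaf-level offloading with stream-based prefetching (CUDA only): the next
# layer's weights are copied to the GPU while the current layer executes.
apply_group_offloading(
    pipe.vae,
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
    use_stream=True,
)
```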
## FP8 layerwise weight-casting
PyTorch supports `torch.float8_e4m3fn` and `torch.float8_e5m2` as weight storage dtypes, but they can't be used for computation in many different tensor operations due to unimplemented kernel support. However, you can use these dtypes to store model weights in fp8 precision and upcast them on-the-fly when the layers are used in the forward pass. This is known as layerwise weight-casting.
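A minimal sketch of layerwise weight-casting, assuming the [`~ModelMixin.enable_layerwise_casting`] helper and reusing the CogVideoX pipeline from the group-offloading example; the dtype choices are example values:

```python
# Store the transformer weights in fp8 and upcast them to bf16 on the fly
# during each layer's forward pass (storage/compute dtypes are examples)
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)
```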