`docs/source/en/api/image_processor.md`: 6 additions & 0 deletions

[[autodoc]] image_processor.VaeImageProcessor

## InpaintProcessor

The [`InpaintProcessor`] accepts `mask` and `image` inputs and processes them together. Optionally, it can accept `padding_mask_crop` and apply a mask overlay.

[[autodoc]] image_processor.InpaintProcessor
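
A minimal usage sketch is shown below. The method names (`preprocess`, `postprocess`) and argument names are assumptions modeled on [`VaeImageProcessor`]; check the autodoc above for the actual signature.

```py
from PIL import Image
from diffusers.image_processor import InpaintProcessor

processor = InpaintProcessor()

init_image = Image.open("input.png").convert("RGB")
mask_image = Image.open("mask.png").convert("L")

# Assumed API: preprocess the image and mask together, optionally cropping
# around the masked region with padding_mask_crop.
processed = processor.preprocess(init_image, mask_image, padding_mask_crop=32)

# Assumed API: postprocess can paste the generated region back onto the
# original image (mask overlay).
# result = processor.postprocess(generated_image)
```
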

## VaeImageProcessorLDM3D

The [`VaeImageProcessorLDM3D`] accepts RGB and depth inputs and returns RGB and depth outputs.

`docs/source/en/optimization/memory.md`: 46 additions & 3 deletions

> [!WARNING]
> Group offloading may not work with all models if the forward implementation contains weight-dependent device casting of inputs because it may clash with group offloading's device casting mechanism.

Enable group offloading by configuring the `offload_type` parameter to `block_level` or `leaf_level`.

- `block_level` offloads groups of layers based on the `num_blocks_per_group` parameter. For example, if `num_blocks_per_group=2` on a model with 40 layers, 2 layers are onloaded and offloaded at a time (20 total onloads/offloads). This drastically reduces memory requirements.
- `leaf_level` offloads individual layers at the lowest level and is equivalent to [CPU offloading](#cpu-offloading), but it can be made faster with streams without giving up inference speed.

Group offloading is supported for entire pipelines or individual models. Applying group offloading to the entire pipeline is the easiest option, while selectively applying it to individual models gives users more flexibility to use different offloading techniques for different models.

<hfoptions id="group-offloading">
<hfoption id="pipeline">

Call [`~DiffusionPipeline.enable_group_offload`] on a pipeline.

```py
import torch
from diffusers import CogVideoXPipeline
from diffusers.hooks import apply_group_offloading

# Illustrative completion of the truncated snippet: the checkpoint and
# arguments are assumptions that mirror ModelMixin.enable_group_offload.
pipeline = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipeline.enable_group_offload(onload_device=torch.device("cuda"), offload_device=torch.device("cpu"), offload_type="leaf_level")
```

</hfoption>
</hfoptions>

Call [`~ModelMixin.enable_group_offload`] on standard Diffusers model components that inherit from [`ModelMixin`]. For other model components that don't inherit from [`ModelMixin`], such as a generic [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html), use [`~hooks.apply_group_offloading`] instead.
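
A sketch of the model-level call is shown below; the checkpoint, subfolder, and offload arguments are illustrative assumptions rather than values taken from this diff.

```py
import torch
from diffusers import AutoModel

# Illustrative: load a single ModelMixin component and offload it in blocks of 2 layers.
transformer = AutoModel.from_pretrained("Qwen/Qwen-Image", subfolder="transformer", torch_dtype=torch.bfloat16)
transformer.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",
    num_blocks_per_group=2,
)
```
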

The `use_stream` parameter can be activated for CUDA devices that support asynchronous data transfer streams to reduce overall execution time compared to [CPU offloading](#cpu-offloading). It overlaps data transfer and computation by using layer prefetching: the next layer to be executed is loaded onto the GPU while the current layer is still being executed. It can increase CPU memory usage significantly, so ensure you have at least 2x the model size in available CPU memory.
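
As a sketch, streams are enabled through the `use_stream` flag on the same calls; the toy module below stands in for a real model component and the arguments are illustrative.

```py
import torch
from diffusers.hooks import apply_group_offloading

# A plain torch.nn.Module is offloaded with apply_group_offloading instead of enable_group_offload.
module = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(8)])
apply_group_offloading(
    module,
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",
    use_stream=True,  # prefetch the next layer on a separate CUDA stream while the current one runs
)
```
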

`docs/source/en/quantization/overview.md`: 4 additions & 1 deletion

> [!TIP]
> These `quant_kwargs` arguments are different for each backend. Refer to the [Quantization API](../api/quantization) docs to view the arguments for each backend.

- `components_to_quantize` specifies which component(s) of the pipeline to quantize. Typically, you should quantize the most compute-intensive components like the transformer. The text encoder is another component to consider quantizing if a pipeline has more than one, such as [`FluxPipeline`]. The example below quantizes the T5 text encoder in [`FluxPipeline`] while keeping the CLIP model intact.

  `components_to_quantize` accepts either a list for multiple models or a string for a single model.

The example below loads the bitsandbytes backend with the following arguments from [`~quantizers.quantization_config.BitsAndBytesConfig`]: `load_in_4bit`, `bnb_4bit_quant_type`, and `bnb_4bit_compute_dtype`.
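
Such a configuration might look like the following sketch; the Flux checkpoint and the exact values are illustrative.

```py
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    # quantize the transformer and the T5 text encoder, keep CLIP (text_encoder) intact
    components_to_quantize=["transformer", "text_encoder_2"],
)

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
)
```
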
The `quant_mapping` argument provides more options for how to quantize each individual component in a pipeline, like combining different quantization backends.
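
One possible sketch of `quant_mapping`, combining a Diffusers and a Transformers bitsandbytes config; the component names and values are illustrative.

```py
import torch
from diffusers import DiffusionPipeline
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from diffusers.quantizers import PipelineQuantizationConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_mapping={
        # a diffusers config for the transformer and a transformers config for the T5 text encoder
        "transformer": DiffusersBitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16),
        "text_encoder_2": TransformersBitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),
    }
)

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
)
```
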

`docs/source/en/using-diffusers/image_quality.md`: 2 additions & 8 deletions

# FreeU

[FreeU](https://hf.co/papers/2309.11497) improves image details by rebalancing the UNet's backbone and skip connection weights. The skip connections can cause the model to overlook some of the backbone semantics which may lead to unnatural image details in the generated image. This technique does not require any additional training and can be applied on the fly during inference for tasks like image-to-image and text-to-video.
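
Enabling it might look like the following sketch, assuming a Stable Diffusion pipeline; the checkpoint and the scaling factors `s1`, `s2`, `b1`, `b2` are illustrative starting points.

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# s1/s2 scale the skip connections, b1/b2 scale the backbone features
pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.5, b2=1.6)
image = pipeline("an astronaut riding a horse on mars").images[0]
```
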
The `device_map` argument determines individual model or pipeline placement on an accelerator like a GPU. It is especially helpful when there are multiple GPUs.

A pipeline supports two options for `device_map`, `"cuda"` and `"balanced"`. Refer to the table below to compare the placement strategies.

| parameter | description |
|---|---|
| `"cuda"` | places the pipeline on a supported accelerator device like CUDA |
| `"balanced"` | evenly distributes the pipeline across all GPUs |

Use the `max_memory` argument in [`~DiffusionPipeline.from_pretrained`] to allocate a maximum amount of memory to use on each device. By default, Diffusers uses the maximum amount available.

```py
import torch
from diffusers import DiffusionPipeline

max_memory = {0: "16GB", 1: "16GB"}
pipeline = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    max_memory=max_memory,  # cap the memory used on each device
)
```

The `hf_device_map` attribute allows you to access and view the `device_map`.
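
Continuing from the example above, the mapping can be inspected directly; the output shown in the comment is illustrative and depends on the model and available GPUs.

```py
# Inspect how components were placed across devices.
print(pipeline.hf_device_map)
# {"transformer": 0, "text_encoder": 0, "vae": 0}  (illustrative output)
```
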
[`DiffusionPipeline`] is flexible and accommodates loading different models or schedulers. You can experiment with different schedulers to optimize for generation speed or quality, and you can replace models with more performant ones.

The example below uses a more stable VAE version.

```py
import torch
from diffusers import DiffusionPipeline, AutoModel

# Illustrative completion of the truncated snippet: the VAE and pipeline checkpoints are assumptions.
vae = AutoModel.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", vae=vae, torch_dtype=torch.float16)
```