The [`~QwenImagePipeline.from_pretrained`] method won't download files from the Hub when it detects a local path. But this also means it won't download and cache any updates that have been made to the model.
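For instance, a minimal sketch assuming the model was already downloaded to a local folder (the path is hypothetical):

```py
from diffusers import QwenImagePipeline

# "./qwen-image" is a hypothetical local folder containing the model files;
# from_pretrained uses it directly instead of contacting the Hub
pipeline = QwenImagePipeline.from_pretrained("./qwen-image")
```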
## Pipeline data types
Pass the data type for each model as a dictionary to `torch_dtype`. Use the `default` key to set a default data type for any models not explicitly listed.
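For example, a minimal sketch assuming a checkpoint whose pipeline has a `transformer` component:

```py
import torch
from diffusers import DiffusionPipeline

# load the transformer in bfloat16; "default" covers every model
# that isn't listed explicitly
pipeline = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
```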
## Device placement

Diffusers currently provides three options for `device_map`: `"cuda"`, `"balanced"`, and `"auto"`.
| parameter | description |
|---|---|
| `"cuda"` | places model or pipeline on CUDA device |
| `"balanced"` | evenly distributes model or pipeline on all GPUs |
| `"auto"` | distributes model from fastest device first to slowest |
Use the `max_memory` argument in [`~DiffusionPipeline.from_pretrained`] to allocate a maximum amount of memory to use on each device. By default, Diffusers uses the maximum amount available.
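For example, a minimal sketch assuming two GPUs (the checkpoint and memory limits are illustrative):

```py
import torch
from diffusers import DiffusionPipeline

# cap how much memory may be allocated on each GPU (keyed by device index)
max_memory = {0: "16GB", 1: "16GB"}
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    device_map="balanced",
    max_memory=max_memory,
)
```

## Parallel loading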
Large models are often [sharded](../training/distributed_inference#model-sharding) into smaller files so that they are easier to load. Diffusers supports loading shards in parallel to speed up the loading process.
Set `HF_ENABLE_PARALLEL_LOADING` to `"YES"` to enable parallel loading of shards.
The `device_map` argument should be set to `"cuda"` to pre-allocate a large chunk of memory based on the model size. This substantially reduces model load time because warming up the memory allocator now avoids many smaller calls to the allocator later.
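A minimal sketch of the pattern (the checkpoint is illustrative; the variable is set before importing Diffusers so the flag is picked up):

```py
import os

# enable parallel shard loading before diffusers is imported
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "YES"

import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
```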
## Reuse a pipeline

Memory usage is determined by the pipeline with the highest memory requirement regardless of the number of pipelines.
The example below loads a pipeline and then loads a second pipeline with [`~DiffusionPipeline.from_pipe`] to use [perturbed-attention guidance (PAG)](../api/pipelines/pag) to improve generation quality.
> [!WARNING]
> Use [`AutoPipelineForText2Image`] because [`DiffusionPipeline`] doesn't support PAG. Refer to the [AutoPipeline](../tutorials/autopipeline) docs to learn more.
```py
import torch
from diffusers import AutoPipelineForText2Image

# the checkpoint is illustrative
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
# reuse the already-loaded components and enable perturbed-attention guidance
pipeline_pag = AutoPipelineForText2Image.from_pipe(pipeline, enable_pag=True)
```
Some methods may not work correctly on pipelines created with [`~DiffusionPipeline.from_pipe`].

## Safety checker
Diffusers provides a [safety checker](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) for older Stable Diffusion models to prevent generating harmful content. It screens the generated output against a set of hardcoded harmful concepts.
If you want to disable the safety checker, pass `safety_checker=None` in [`~DiffusionPipeline.from_pretrained`] as shown below.
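A minimal sketch (the checkpoint is illustrative):

```py
import torch
from diffusers import DiffusionPipeline

# passing None removes the safety checker from the pipeline
pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    safety_checker=None,
)
```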