
Commit 1a36051

committed: refresh
1 parent 76c28bf commit 1a36051

File tree: 1 file changed (+93, -21 lines)


docs/source/en/using-diffusers/loading.md

Lines changed: 93 additions & 21 deletions
@@ -29,9 +29,8 @@ import torch
 from diffusers import DiffusionPipeline

 pipeline = DiffusionPipeline.from_pretrained(
-    "Qwen/Qwen-Image",
-    torch_dtype=torch.bfloat16
-).to("cuda")
+    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map="cuda"
+)
 ```

 Every model has a specific pipeline subclass that inherits from [`DiffusionPipeline`]. A subclass usually has a narrow focus and is task-specific. See the table below for an example.
@@ -49,9 +48,8 @@ import torch
 from diffusers import QwenImagePipeline

 pipeline = QwenImagePipeline.from_pretrained(
-    "Qwen/Qwen-Image",
-    torch_dtype=torch.bfloat16
-).to("cuda")
+    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map="cuda"
+)
 ```

 ### Local pipelines
@@ -70,9 +68,8 @@ import torch
 from diffusers import QwenImagePipeline

 pipeline = QwenImagePipeline.from_pretrained(
-    "path/to/local/Qwen/Qwen-Image",
-    torch_dtype=torch.bfloat16
-).to("cuda")
+    "path/to/local/Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map="cuda"
+)
 ```

 The [`~QwenImagePipeline.from_pretrained`] method won't download files from the Hub when it detects a local path. But this also means it won't download and cache any updates that have been made to the model.
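One way to produce such a local path ahead of time is a one-time `snapshot_download` from the `huggingface_hub` library; a minimal sketch:

```py
from huggingface_hub import snapshot_download

# Download the full pipeline repository once; the returned directory
# can then be passed to from_pretrained as a local path.
local_path = snapshot_download("Qwen/Qwen-Image")
print(local_path)
```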
@@ -88,8 +85,8 @@ import torch
 from diffusers import HunyuanVideoPipeline

 pipeline = HunyuanVideoPipeline.from_pretrained(
-    "hunyuanvideo-community/HunyuanVideo",
-    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
+    "hunyuanvideo-community/HunyuanVideo",
+    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
 )
 print(pipeline.transformer.dtype, pipeline.vae.dtype)
 ```
@@ -101,12 +98,72 @@ import torch
 from diffusers import HunyuanVideoPipeline

 pipeline = HunyuanVideoPipeline.from_pretrained(
-    "hunyuanvideo-community/HunyuanVideo",
-    torch_dtype=torch.bfloat16
+    "hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.bfloat16
 )
 print(pipeline.transformer.dtype, pipeline.vae.dtype)
 ```

+## Device placement
+
+The `device_map` argument determines individual model or pipeline placement on an accelerator like a GPU. It is especially helpful when there are multiple GPUs.
+
+Diffusers currently provides three options for `device_map`: `"cuda"`, `"balanced"`, and `"auto"`. Refer to the table below to compare the three placement strategies.
+
+| parameter | description |
+|---|---|
+| `"cuda"` | places the model or pipeline on the CUDA device |
+| `"balanced"` | evenly distributes the model or pipeline across all GPUs |
+| `"auto"` | distributes the model or pipeline from the fastest device first to the slowest |
+
+Use the `max_memory` argument in [`~DiffusionPipeline.from_pretrained`] to allocate a maximum amount of memory to use on each device. By default, Diffusers uses the maximum amount available.
+
+<hfoptions id="device_map">
+<hfoption id="pipeline">
+
+```py
+import torch
+from diffusers import DiffusionPipeline
+
+pipeline = DiffusionPipeline.from_pretrained(
+    "black-forest-labs/FLUX.1-dev",
+    torch_dtype=torch.bfloat16,
+    device_map="cuda",
+)
+```
+
+</hfoption>
+<hfoption id="individual model">
+
+```py
+import torch
+from diffusers import DiffusionPipeline, AutoModel
+
+max_memory = {0: "16GB", 1: "16GB"}
+transformer = AutoModel.from_pretrained(
+    "black-forest-labs/FLUX.1-dev",
+    subfolder="transformer",
+    torch_dtype=torch.bfloat16,
+    device_map="cuda",
+    max_memory=max_memory
+)
+```
+
+</hfoption>
+</hfoptions>
+
+The `hf_device_map` attribute allows you to access and view the `device_map`.
+
+```py
+print(pipeline.hf_device_map)
+# {'unet': 1, 'vae': 1, 'safety_checker': 0, 'text_encoder': 0}
+```
+
+Reset a pipeline's `device_map` with the [`~DiffusionPipeline.reset_device_map`] method. This is necessary if you want to use methods such as `.to()`, [`~DiffusionPipeline.enable_sequential_cpu_offload`], and [`~DiffusionPipeline.enable_model_cpu_offload`].
+
+```py
+pipeline.reset_device_map()
+```
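As a usage sketch, assuming the `pipeline` object from the examples above: once the device map is reset, placement methods work again.

```py
pipeline.reset_device_map()

# With the device map cleared, offloading can be applied as usual.
pipeline.enable_model_cpu_offload()
```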
+
 ## Parallel loading

 Large models are often [sharded](../training/distributed_inference#model-sharding) into smaller files so that they are easier to load. Diffusers supports loading shards in parallel to speed up the loading process.
@@ -124,12 +181,9 @@ import torch
 from diffusers import DiffusionPipeline

 os.environ["HF_ENABLE_PARALLEL_LOADING"] = "YES"
-os.environ["HF_PARALLEL_LOADING_WORKERS"] = "12"

 pipeline = DiffusionPipeline.from_pretrained(
-    "Wan-AI/Wan2.2-I2V-A14B-Diffusers",
-    torch_dtype=torch.bfloat16,
-    device_map="cuda"
+    "Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16, device_map="cuda"
 )
 ```

@@ -155,7 +209,8 @@ pipeline = DiffusionPipeline.from_pretrained(
     scheduler=scheduler,
     vae=vae,
     torch_dtype=torch.float16,
-).to("cuda")
+    device_map="cuda"
+)
 ```

 ## Reusing models in multiple pipelines
@@ -174,8 +229,8 @@ import torch
 from diffusers import AutoPipelineForText2Image

 pipeline_sdxl = AutoPipelineForText2Image.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
-).to("cuda")
+    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, device_map="cuda"
+)
 prompt = """
 cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
 highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
@@ -203,4 +258,21 @@ print(f"Max memory reserved: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
 > [!WARNING]
 > Pipelines created by [`~DiffusionPipeline.from_pipe`] share the same models and *state*. Modifying the state of a model in one pipeline affects all the other pipelines that share the same model.

-Some methods may not work correctly on pipelines created with [`~DiffusionPipeline.from_pipe`]. For example, [`~DiffusionPipeline.enable_model_cpu_offload`] relies on a unique model execution order, which may differ in the new pipeline. To ensure proper functionality, reapply these methods on the new pipeline.
+Some methods may not work correctly on pipelines created with [`~DiffusionPipeline.from_pipe`]. For example, [`~DiffusionPipeline.enable_model_cpu_offload`] relies on a unique model execution order, which may differ in the new pipeline. To ensure proper functionality, reapply these methods on the new pipeline.
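A minimal sketch of that pattern, assuming the SDXL checkpoint used earlier: create a second pipeline with [`~DiffusionPipeline.from_pipe`], then reapply offloading on it.

```py
import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image

pipeline_text2img = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipeline_text2img.enable_model_cpu_offload()

# The new pipeline shares the same models, but its execution order may
# differ, so offloading must be reapplied on it.
pipeline_img2img = AutoPipelineForImage2Image.from_pipe(pipeline_text2img)
pipeline_img2img.enable_model_cpu_offload()
```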
+
+## Safety checker
+
+Diffusers provides a [safety checker](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) for older Stable Diffusion models to prevent generating harmful content. It screens the generated output against a set of hardcoded harmful concepts.
+
+If you want to disable the safety checker, pass `safety_checker=None` to [`~DiffusionPipeline.from_pretrained`] as shown below.
+
+```py
+from diffusers import DiffusionPipeline
+
+pipeline = DiffusionPipeline.from_pretrained(
+    "stable-diffusion-v1-5/stable-diffusion-v1-5", safety_checker=None
+)
+"""
+You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide by the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend keeping the safety filter enabled in all public-facing circumstances, disabling it only for use cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
+"""
+```
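With the safety checker left enabled (the default), the pipeline output reports which images were flagged; a minimal sketch:

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    device_map="cuda",
)

result = pipeline("a photo of an astronaut riding a horse on mars")
# nsfw_content_detected is a list of booleans, one per generated image;
# flagged images are returned blacked out.
print(result.nsfw_content_detected)
```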
