
Commit e06b21f ("feedback")

1 parent: b0dd3b7

File tree: 1 file changed (+3, -1 lines)

docs/source/en/using-diffusers/loading.md

````diff
@@ -112,7 +112,7 @@ print(pipe.transformer.dtype, pipe.vae.dtype) # (torch.bfloat16, torch.float16)
 
 If a component is not explicitly specified in the dictionary and no `default` is provided, it will be loaded with `torch.float32`.
 
-#### Parallel loading
+### Parallel loading
 
 Large models are often [sharded](../training/distributed_inference#model-sharding) into smaller files so that they are easier to load. Diffusers supports loading shards in parallel to speed up the loading process.
 
@@ -121,6 +121,8 @@ Set the environment variables below to enable parallel loading.
 - Set `HF_ENABLE_PARALLEL_LOADING` to `"YES"` to enable parallel loading of shards.
 - Set `HF_PARALLEL_LOADING_WORKERS` to configure the number of parallel threads to use when loading shards. More workers load a model faster but use more memory.
 
+The `device_map` argument should be set to `"cuda"` to pre-allocate a large chunk of memory based on the model size. This substantially reduces model load time because warming up the memory allocator now avoids many smaller calls to the allocator later.
+
 ```py
 import os
 import torch
````
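
For reference, here is a minimal sketch of how the settings described in this commit might be used together. It assumes the environment variables behave as the diff describes; the worker count, checkpoint name, and `DiffusionPipeline` usage are illustrative assumptions, since the commit's own `py` block is truncated in this diff:

```py
import os

# Enable parallel loading of shards before loading the model
# (environment variables documented in the diff above).
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "YES"
os.environ["HF_PARALLEL_LOADING_WORKERS"] = "8"  # illustrative worker count

import torch
from diffusers import DiffusionPipeline

# device_map="cuda" pre-allocates a large chunk of memory up front,
# avoiding many smaller allocator calls while shards are loaded.
pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # hypothetical example checkpoint
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
```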
