<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

[[open-in-colab]]

# Models

A diffusion model relies on a few individual models working together to generate an output. These models are responsible for denoising, encoding inputs, and decoding latents into the actual outputs.

This guide will show you how to load models.

## Loading a model
All models are loaded with the [`~ModelMixin.from_pretrained`] method, which downloads and caches the latest model version. If the latest files are already available in the local cache, [`~ModelMixin.from_pretrained`] reuses them instead of downloading them again.

Pass the `subfolder` argument to [`~ModelMixin.from_pretrained`] to specify where to load the model weights from. Omit the `subfolder` argument if the repository doesn't have a subfolder structure or if you're loading a standalone model.

```py
from diffusers import QwenImageTransformer2DModel

model = QwenImageTransformer2DModel.from_pretrained("Qwen/Qwen-Image", subfolder="transformer")
```
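
If the repository stores a single model at its root, omit `subfolder`. For example, assuming the standalone VAE checkpoint [stabilityai/sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse):

```py
from diffusers import AutoencoderKL

# a standalone repository keeps config.json and the weights at the root
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
```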

## AutoModel

[`AutoModel`] detects the model class from a `model_index.json` file or a model's `config.json` file. It fetches the correct model class from these files and delegates the actual loading to the model class. [`AutoModel`] is useful for automatic model type detection without needing to know the exact model class beforehand.

```py
from diffusers import AutoModel

model = AutoModel.from_pretrained(
    "Qwen/Qwen-Image", subfolder="transformer"
)
```
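
To confirm which class [`AutoModel`] resolved, inspect the returned object:

```py
print(model.__class__.__name__)
# QwenImageTransformer2DModel
```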

## Model data types

Use the `torch_dtype` argument in [`~ModelMixin.from_pretrained`] to load a model with a specific data type. This allows you to load a model in a lower precision to reduce memory usage.

```py
import torch
from diffusers import QwenImageTransformer2DModel

model = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image",
    subfolder="transformer",
    torch_dtype=torch.bfloat16
)
```
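
You can verify the loaded precision through the model's `dtype` attribute (modules listed in `_keep_in_fp32_modules` remain in `torch.float32`):

```py
print(model.dtype)
# torch.bfloat16
```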

[nn.Module.to](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.to) can also convert a model to a specific data type on the fly. However, it converts *all* weights to the requested data type, unlike `torch_dtype`, which respects `_keep_in_fp32_modules`. This argument preserves certain layers in `torch.float32` for numerical stability and the best generation quality (see this example of [_keep_in_fp32_modules](https://github.com/huggingface/diffusers/blob/f864a9a352fa4a220d860bfdd1782e3e5af96382/src/diffusers/models/transformers/transformer_wan.py#L374)).

```py
import torch
from diffusers import QwenImageTransformer2DModel

model = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image", subfolder="transformer"
)
model = model.to(dtype=torch.float16)
```

## Device placement

Use the `device_map` argument in [`~ModelMixin.from_pretrained`] to place a model on an accelerator like a GPU. It is especially helpful when there are multiple GPUs.

Diffusers currently provides three `device_map` options for individual models: `"cuda"`, `"balanced"`, and `"auto"`. Refer to the table below to compare the three placement strategies.

| parameter | description |
|---|---|
| `"cuda"` | places the model on a supported accelerator (CUDA) |
| `"balanced"` | evenly distributes the model across all GPUs |
| `"auto"` | distributes the model from the fastest device first to the slowest |
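
For example, `device_map="auto"` spreads a model's weights across the available devices, filling the fastest device first:

```py
import torch
from diffusers import QwenImageTransformer2DModel

# "auto" fills the fastest device first, then spills over to slower devices
model = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```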

Use the `max_memory` argument in [`~ModelMixin.from_pretrained`] to allocate a maximum amount of memory to use on each device. By default, Diffusers uses the maximum amount available.

```py
import torch
from diffusers import QwenImagePipeline

max_memory = {0: "16GB", 1: "16GB"}
pipeline = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    max_memory=max_memory
)
```

The `hf_device_map` attribute allows you to access and view the `device_map` of a loaded model.

```py
# `transformer` is a model loaded with device_map="cuda"
print(transformer.hf_device_map)
# {'': device(type='cuda')}
```

## Saving models

Save a model with the [`~ModelMixin.save_pretrained`] method.

```py
from diffusers import QwenImageTransformer2DModel

model = QwenImageTransformer2DModel.from_pretrained("Qwen/Qwen-Image", subfolder="transformer")
model.save_pretrained("./local/model")
```
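
The saved model reloads with the same [`~ModelMixin.from_pretrained`] API by pointing it at the local directory:

```py
from diffusers import QwenImageTransformer2DModel

# no subfolder is needed because the weights sit at the root of the directory
model = QwenImageTransformer2DModel.from_pretrained("./local/model")
```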

For large models, it is helpful to use `max_shard_size` to save a model as multiple shards. Shards can be loaded faster and save memory (refer to the [parallel loading](./loading#parallel-loading) docs for more details), especially if there is more than one GPU.

```py
model.save_pretrained("./local/model", max_shard_size="5GB")
```
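
A sharded checkpoint reloads transparently; the shards are discovered through the index file that [`~ModelMixin.save_pretrained`] writes alongside them. As a sketch, assuming the opt-in `HF_ENABLE_PARALLEL_LOADING` environment variable covered in the [parallel loading](./loading#parallel-loading) docs, the shards can also be fetched in parallel:

```py
import os

# assumption: parallel shard loading is opt-in through this environment variable
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "yes"

from diffusers import QwenImageTransformer2DModel

# each shard is loaded according to the index file written by save_pretrained
model = QwenImageTransformer2DModel.from_pretrained("./local/model")
```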