Commit e187e23

Merge branch 'main' into enable-telemetry-quant-single-file
2 parents 59f4531 + b316104 commit e187e23

111 files changed, +5580 -210 lines changed

docs/source/en/_toctree.yml

Lines changed: 11 additions & 3 deletions
@@ -175,7 +175,7 @@
 title: gguf
 - local: quantization/torchao
 title: torchao
-- local: quantization/quanto
+- local: quantization/quanto
 title: quanto
 title: Quantization Methods
 - sections:
@@ -270,16 +270,18 @@
 - sections:
 - local: api/models/controlnet
 title: ControlNetModel
+- local: api/models/controlnet_union
+title: ControlNetUnionModel
 - local: api/models/controlnet_flux
 title: FluxControlNetModel
 - local: api/models/controlnet_hunyuandit
 title: HunyuanDiT2DControlNetModel
+- local: api/models/controlnet_sana
+title: SanaControlNetModel
 - local: api/models/controlnet_sd3
 title: SD3ControlNetModel
 - local: api/models/controlnet_sparsectrl
 title: SparseControlNetModel
-- local: api/models/controlnet_union
-title: ControlNetUnionModel
 title: ControlNets
 - sections:
 - local: api/models/allegro_transformer3d
@@ -300,6 +302,8 @@
 title: EasyAnimateTransformer3DModel
 - local: api/models/flux_transformer
 title: FluxTransformer2DModel
+- local: api/models/hidream_image_transformer
+title: HiDreamImageTransformer2DModel
 - local: api/models/hunyuan_transformer2d
 title: HunyuanDiT2DModel
 - local: api/models/hunyuan_video_transformer_3d
@@ -422,6 +426,8 @@
 title: ControlNet with Stable Diffusion 3
 - local: api/pipelines/controlnet_sdxl
 title: ControlNet with Stable Diffusion XL
+- local: api/pipelines/controlnet_sana
+title: ControlNet-Sana
 - local: api/pipelines/controlnetxs
 title: ControlNet-XS
 - local: api/pipelines/controlnetxs_sdxl
@@ -446,6 +452,8 @@
 title: Flux
 - local: api/pipelines/control_flux_inpaint
 title: FluxControlInpaint
+- local: api/pipelines/hidream
+title: HiDream-I1
 - local: api/pipelines/hunyuandit
 title: Hunyuan-DiT
 - local: api/pipelines/hunyuan_video

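The ControlNets hunk above moves `ControlNetUnionModel` up and inserts `SanaControlNetModel` between existing entries, which looks consistent with keeping that group ordered by title, ignoring case. That reading is an inference from this diff, not a documented rule. A minimal sketch of such a check, where the helper name `is_sorted_by_title` is hypothetical and the two lists are copied from the old and new sides of the hunk:

```python
def is_sorted_by_title(entries):
    """Return True if the entries are ordered by title, ignoring case."""
    titles = [entry["title"].casefold() for entry in entries]
    return titles == sorted(titles)


# Old-side order of the ControlNets group (before this commit).
controlnets_before = [
    {"local": "api/models/controlnet", "title": "ControlNetModel"},
    {"local": "api/models/controlnet_flux", "title": "FluxControlNetModel"},
    {"local": "api/models/controlnet_hunyuandit", "title": "HunyuanDiT2DControlNetModel"},
    {"local": "api/models/controlnet_sd3", "title": "SD3ControlNetModel"},
    {"local": "api/models/controlnet_sparsectrl", "title": "SparseControlNetModel"},
    {"local": "api/models/controlnet_union", "title": "ControlNetUnionModel"},
]

# New-side order of the same group (after this commit).
controlnets_after = [
    {"local": "api/models/controlnet", "title": "ControlNetModel"},
    {"local": "api/models/controlnet_union", "title": "ControlNetUnionModel"},
    {"local": "api/models/controlnet_flux", "title": "FluxControlNetModel"},
    {"local": "api/models/controlnet_hunyuandit", "title": "HunyuanDiT2DControlNetModel"},
    {"local": "api/models/controlnet_sana", "title": "SanaControlNetModel"},
    {"local": "api/models/controlnet_sd3", "title": "SD3ControlNetModel"},
    {"local": "api/models/controlnet_sparsectrl", "title": "SparseControlNetModel"},
]

print("before:", is_sorted_by_title(controlnets_before))
print("after: ", is_sorted_by_title(controlnets_after))
```

The comparison uses `casefold()` because a case-sensitive sort would place `SD3ControlNetModel` ahead of `SanaControlNetModel`, which is not the order the hunk shows.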
docs/source/en/api/loaders/lora.md

Lines changed: 4 additions & 0 deletions
@@ -20,6 +20,7 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
 - [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
 - [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
 - [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
+- [`AuraFlowLoraLoaderMixin`] provides similar functions for [AuraFlow](https://huggingface.co/fal/AuraFlow).
 - [`LTXVideoLoraLoaderMixin`] provides similar functions for [LTX-Video](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video).
 - [`SanaLoraLoaderMixin`] provides similar functions for [Sana](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana).
 - [`HunyuanVideoLoraLoaderMixin`] provides similar functions for [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).
@@ -56,6 +57,9 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
 ## Mochi1LoraLoaderMixin

 [[autodoc]] loaders.lora_pipeline.Mochi1LoraLoaderMixin
+## AuraFlowLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.AuraFlowLoraLoaderMixin

 ## LTXVideoLoraLoaderMixin

docs/source/en/api/models/autoencoderkl_allegro.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import AutoencoderKLAllegro

-vae = AutoencoderKLCogVideoX.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
 ```

 ## AutoencoderKLAllegro
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# SanaControlNetModel
+
+The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
+
+The abstract from the paper is:
+
+*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
+
+This model was contributed by [ishan24](https://huggingface.co/ishan24). ❤️
+The original codebase can be found at [NVlabs/Sana](https://github.com/NVlabs/Sana), and you can find official ControlNet checkpoints on [Efficient-Large-Model's](https://huggingface.co/Efficient-Large-Model) Hub profile.
+
+## SanaControlNetModel
+[[autodoc]] SanaControlNetModel
+
+## SanaControlNetOutput
+[[autodoc]] models.controlnets.controlnet_sana.SanaControlNetOutput
+
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# HiDreamImageTransformer2DModel
+
+A Transformer model for image-like data from [HiDream-I1](https://huggingface.co/HiDream-ai).
+
+The model can be loaded with the following code snippet.
+
+```python
+from diffusers import HiDreamImageTransformer2DModel
+
+transformer = HiDreamImageTransformer2DModel.from_pretrained("HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16)
+```
+
+## HiDreamImageTransformer2DModel
+
+[[autodoc]] HiDreamImageTransformer2DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# ControlNet
+
+<div class="flex flex-wrap space-x-1">
+<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+</div>
+
+ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
+
+With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
+
+The abstract from the paper is:
+
+*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
+
+This pipeline was contributed by [ishan24](https://huggingface.co/ishan24). ❤️
+The original codebase can be found at [NVlabs/Sana](https://github.com/NVlabs/Sana), and you can find official ControlNet checkpoints on [Efficient-Large-Model's](https://huggingface.co/Efficient-Large-Model) Hub profile.
+
+## SanaControlNetPipeline
+[[autodoc]] SanaControlNetPipeline
+- all
+- __call__
+
+## SanaPipelineOutput
+[[autodoc]] pipelines.sana.pipeline_output.SanaPipelineOutput
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License. -->
+
+# HiDreamImage
+
+[HiDream-I1](https://huggingface.co/HiDream-ai) by HiDream.ai
+
+<Tip>
+
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+
+</Tip>
+
+## Available models
+
+The following models are available for the [`HiDreamImagePipeline`](text-to-image) pipeline:
+
+| Model name | Description |
+|:---|:---|
+| [`HiDream-ai/HiDream-I1-Full`](https://huggingface.co/HiDream-ai/HiDream-I1-Full) | - |
+| [`HiDream-ai/HiDream-I1-Dev`](https://huggingface.co/HiDream-ai/HiDream-I1-Dev) | - |
+| [`HiDream-ai/HiDream-I1-Fast`](https://huggingface.co/HiDream-ai/HiDream-I1-Fast) | - |
+
+## HiDreamImagePipeline
+
+[[autodoc]] HiDreamImagePipeline
+- all
+- __call__
+
+## HiDreamImagePipelineOutput
+
+[[autodoc]] pipelines.hidream_image.pipeline_output.HiDreamImagePipelineOutput

docs/source/en/community_projects.md

Lines changed: 4 additions & 0 deletions
@@ -83,4 +83,8 @@ Happy exploring, and thank you for being part of the Diffusers community!
 <td><a href="https://github.com/suzukimain/auto_diffusers"> Model Search </a></td>
 <td>Search models on Civitai and Hugging Face</td>
 </tr>
+<tr style="border-top: 2px solid black">
+<td><a href="https://github.com/beinsezii/skrample"> Skrample </a></td>
+<td>Fully modular scheduler functions with 1st class diffusers integration.</td>
+</tr>
 </table>

docs/source/en/quantization/bitsandbytes.md

Lines changed: 16 additions & 16 deletions
@@ -49,7 +49,7 @@ For Ada and higher-series GPUs. we recommend changing `torch_dtype` to `torch.bf
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel

 quant_config = TransformersBitsAndBytesConfig(load_in_8bit=True,)
@@ -63,7 +63,7 @@ text_encoder_2_8bit = T5EncoderModel.from_pretrained(

 quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True,)

-transformer_8bit = FluxTransformer2DModel.from_pretrained(
+transformer_8bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -74,7 +74,7 @@ transformer_8bit = FluxTransformer2DModel.from_pretrained(
 By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`. You can change the data type of these modules with the `torch_dtype` parameter.

 ```diff
-transformer_8bit = FluxTransformer2DModel.from_pretrained(
+transformer_8bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -133,7 +133,7 @@ For Ada and higher-series GPUs. we recommend changing `torch_dtype` to `torch.bf
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel

 quant_config = TransformersBitsAndBytesConfig(load_in_4bit=True,)
@@ -147,7 +147,7 @@ text_encoder_2_4bit = T5EncoderModel.from_pretrained(

 quant_config = DiffusersBitsAndBytesConfig(load_in_4bit=True,)

-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -158,7 +158,7 @@ transformer_4bit = FluxTransformer2DModel.from_pretrained(
 By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`. You can change the data type of these modules with the `torch_dtype` parameter.

 ```diff
-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -217,11 +217,11 @@ print(model.get_memory_footprint())
 Quantized models can be loaded from the [`~ModelMixin.from_pretrained`] method without needing to specify the `quantization_config` parameters:

 ```py
-from diffusers import FluxTransformer2DModel, BitsAndBytesConfig
+from diffusers import AutoModel, BitsAndBytesConfig

 quantization_config = BitsAndBytesConfig(load_in_4bit=True)

-model_4bit = FluxTransformer2DModel.from_pretrained(
+model_4bit = AutoModel.from_pretrained(
     "hf-internal-testing/flux.1-dev-nf4-pkg", subfolder="transformer"
 )
 ```
@@ -243,13 +243,13 @@ An "outlier" is a hidden state value greater than a certain threshold, and these
 To find the best threshold for your model, we recommend experimenting with the `llm_int8_threshold` parameter in [`BitsAndBytesConfig`]:

 ```py
-from diffusers import FluxTransformer2DModel, BitsAndBytesConfig
+from diffusers import AutoModel, BitsAndBytesConfig

 quantization_config = BitsAndBytesConfig(
     load_in_8bit=True, llm_int8_threshold=10,
 )

-model_8bit = FluxTransformer2DModel.from_pretrained(
+model_8bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quantization_config,
@@ -305,7 +305,7 @@ NF4 is a 4-bit data type from the [QLoRA](https://hf.co/papers/2305.14314) paper
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel

 quant_config = TransformersBitsAndBytesConfig(
@@ -325,7 +325,7 @@ quant_config = DiffusersBitsAndBytesConfig(
     bnb_4bit_quant_type="nf4",
 )

-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -343,7 +343,7 @@ Nested quantization is a technique that can save additional memory at no additio
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel

 quant_config = TransformersBitsAndBytesConfig(
@@ -363,7 +363,7 @@ quant_config = DiffusersBitsAndBytesConfig(
     bnb_4bit_use_double_quant=True,
 )

-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -379,7 +379,7 @@ Once quantized, you can dequantize a model to its original precision, but this m
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel

 quant_config = TransformersBitsAndBytesConfig(
@@ -399,7 +399,7 @@ quant_config = DiffusersBitsAndBytesConfig(
     bnb_4bit_use_double_quant=True,
 )

-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,

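The 8-bit and 4-bit examples touched in `bitsandbytes.md` differ mainly in how many bits each quantized weight occupies. As a rough back-of-the-envelope illustration, not a measurement, the sketch below estimates weight storage from a parameter count and a bit width. The helper name `approx_weight_gib` and the parameter count are assumptions made for illustration. The estimate ignores quantization constants, activations, and modules such as `torch.nn.LayerNorm` that the docs say stay in `torch.float16`.

```python
def approx_weight_gib(num_params: int, bits_per_param: int) -> float:
    """Estimate weight storage in GiB for a given per-parameter bit width."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


# Hypothetical parameter count, chosen only to make the comparison concrete.
NUM_PARAMS = 12_000_000_000

for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>8}: {approx_weight_gib(NUM_PARAMS, bits):.1f} GiB")
```

For real numbers, the `model.get_memory_footprint()` call visible in the hunk context above is the better source, since it reflects the model as actually loaded.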