
Commit 55fb06c

Merge branch 'main' into rf-inversion

2 parents 3a3da68 + 345907f

101 files changed: +8956 −2950 lines

docs/source/en/_toctree.yml

Lines changed: 6 additions & 0 deletions
```diff
@@ -270,6 +270,8 @@
     title: LatteTransformer3DModel
   - local: api/models/lumina_nextdit2d
     title: LuminaNextDiT2DModel
+  - local: api/models/mochi_transformer3d
+    title: MochiTransformer3DModel
   - local: api/models/pixart_transformer2d
     title: PixArtTransformer2DModel
   - local: api/models/prior_transformer
@@ -306,6 +308,8 @@
     title: AutoencoderKLAllegro
   - local: api/models/autoencoderkl_cogvideox
     title: AutoencoderKLCogVideoX
+  - local: api/models/autoencoderkl_mochi
+    title: AutoencoderKLMochi
   - local: api/models/asymmetricautoencoderkl
     title: AsymmetricAutoencoderKL
   - local: api/models/consistency_decoder_vae
@@ -400,6 +404,8 @@
     title: Lumina-T2X
   - local: api/pipelines/marigold
     title: Marigold
+  - local: api/pipelines/mochi
+    title: Mochi
   - local: api/pipelines/panorama
     title: MultiDiffusion
   - local: api/pipelines/musicldm
```
docs/source/en/api/models/autoencoderkl_mochi.md

Lines changed: 32 additions & 0 deletions

````diff
@@ -0,0 +1,32 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderKLMochi
+
+The 3D variational autoencoder (VAE) model with KL loss used in [Mochi](https://github.com/genmoai/models) was introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+
+from diffusers import AutoencoderKLMochi
+
+vae = AutoencoderKLMochi.from_pretrained("genmo/mochi-1-preview", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+```
+
+## AutoencoderKLMochi
+
+[[autodoc]] AutoencoderKLMochi
+    - decode
+    - all
+
+## DecoderOutput
+
+[[autodoc]] models.autoencoders.vae.DecoderOutput
````
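The autodoc above exposes `decode`, so a minimal decode sketch may help orient readers. The latent layout `[batch, latent_channels, frames, height, width]` and the 12-channel latent space are assumptions used only to size the example, not facts taken from this commit:

```python
import torch

from diffusers import AutoencoderKLMochi

vae = AutoencoderKLMochi.from_pretrained(
    "genmo/mochi-1-preview", subfolder="vae", torch_dtype=torch.float32
).to("cuda")

# Hypothetical toy latents: [batch, latent_channels, frames, height, width].
# The 12-channel latent space is an assumption about the Mochi VAE, used
# here only to size the example.
latents = torch.randn(1, 12, 2, 16, 16, device="cuda")

with torch.no_grad():
    video = vae.decode(latents).sample  # DecoderOutput.sample holds the frames
print(video.shape)  # pixel-space frames upsampled from the latents
```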

docs/source/en/api/models/controlnet.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -39,12 +39,12 @@ pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=contro

 ## ControlNetOutput

-[[autodoc]] models.controlnet.ControlNetOutput
+[[autodoc]] models.controlnets.controlnet.ControlNetOutput

 ## FlaxControlNetModel

 [[autodoc]] FlaxControlNetModel

 ## FlaxControlNetOutput

-[[autodoc]] models.controlnet_flax.FlaxControlNetOutput
+[[autodoc]] models.controlnets.controlnet_flax.FlaxControlNetOutput
```
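These two path changes (together with the SD3 one in the next file) track the relocation of the ControlNet modules into a `controlnets` subpackage. Top-level imports are unaffected; only deep-module paths move. A hedged illustration, assuming the output classes remain importable from the new module paths referenced by the updated `[[autodoc]]` directives:

```python
# Unchanged: the public, top-level import.
from diffusers import ControlNetModel

# Deep-module imports follow the new `controlnets` package layout
# (assumption: these mirror the updated [[autodoc]] paths above).
from diffusers.models.controlnets.controlnet import ControlNetOutput
from diffusers.models.controlnets.controlnet_sd3 import SD3ControlNetOutput
```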

docs/source/en/api/models/controlnet_sd3.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -38,5 +38,5 @@ pipe = StableDiffusion3ControlNetPipeline.from_pretrained("stabilityai/stable-di

 ## SD3ControlNetOutput

-[[autodoc]] models.controlnet_sd3.SD3ControlNetOutput
+[[autodoc]] models.controlnets.controlnet_sd3.SD3ControlNetOutput
```
docs/source/en/api/models/mochi_transformer3d.md

Lines changed: 30 additions & 0 deletions

````diff
@@ -0,0 +1,30 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# MochiTransformer3DModel
+
+A Diffusion Transformer model for 3D video-like data was introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+
+from diffusers import MochiTransformer3DModel
+
+transformer = MochiTransformer3DModel.from_pretrained("genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
+```
+
+## MochiTransformer3DModel
+
+[[autodoc]] MochiTransformer3DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
````
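For readers checking shapes, a hedged forward-pass sketch. The keyword names (`hidden_states`, `encoder_hidden_states`, `timestep`, `encoder_attention_mask`) are assumed from the common diffusers transformer interface, and every tensor size below is an illustrative toy value rather than Mochi's real latent resolution:

```python
import torch

from diffusers import MochiTransformer3DModel

transformer = MochiTransformer3DModel.from_pretrained(
    "genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16
).to("cuda")

# Toy inputs (assumed layout: [batch, latent_channels, frames, height, width]).
latents = torch.randn(1, 12, 7, 32, 32, dtype=torch.float16, device="cuda")
# Assumed T5-style prompt embeddings with a boolean padding mask.
prompt_embeds = torch.randn(1, 256, 4096, dtype=torch.float16, device="cuda")
prompt_mask = torch.ones(1, 256, dtype=torch.bool, device="cuda")
timestep = torch.tensor([500], device="cuda")

with torch.no_grad():
    out = transformer(
        hidden_states=latents,
        encoder_hidden_states=prompt_embeds,
        timestep=timestep,
        encoder_attention_mask=prompt_mask,
    )
print(out.sample.shape)  # model prediction, same shape as the input latents
```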

docs/source/en/api/pipelines/controlnet_sd3.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -28,6 +28,7 @@ This controlnet code is mainly implemented by [The InstantX Team](https://huggin
 | ControlNet type | Developer | Link |
 | -------- | ---------- | ---- |
 | Canny | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Canny) |
+| Depth | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Depth) |
 | Pose | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Pose) |
 | Tile | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Tile) |
 | Inpainting | [The AlimamaCreative Team](https://huggingface.co/alimama-creative) | [link](https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting) |
```
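The new row wires up the Depth variant. A hedged end-to-end sketch following the pattern of the existing SD3 ControlNet examples; the depth-map URL is a placeholder, and `controlnet_conditioning_scale` is an assumed knob from the standard ControlNet pipeline API:

```python
import torch

from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetPipeline
from diffusers.utils import load_image

controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Depth", torch_dtype=torch.float16)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Placeholder conditioning image: a precomputed depth map.
control_image = load_image("https://example.com/depth_map.png")
image = pipe(
    "a cozy reading nook with warm lighting",
    control_image=control_image,
    controlnet_conditioning_scale=0.5,
).images[0]
image.save("sd3_depth.png")
```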
docs/source/en/api/pipelines/mochi.md

Lines changed: 36 additions & 0 deletions

```diff
@@ -0,0 +1,36 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+-->
+
+# Mochi
+
+[Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) from Genmo.
+
+*Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. This model dramatically closes the gap between closed and open video generation systems. The model is released under a permissive Apache 2.0 license.*
+
+<Tip>
+
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+
+</Tip>
+
+## MochiPipeline
+
+[[autodoc]] MochiPipeline
+  - all
+  - __call__
+
+## MochiPipelineOutput
+
+[[autodoc]] pipelines.mochi.pipeline_output.MochiPipelineOutput
```
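For orientation, a minimal text-to-video sketch with the new pipeline. The generation arguments (`num_frames=85`, matching the 6k+1 frame counts that causal video VAEs typically expect, and `num_inference_steps`) and the memory helpers are assumptions based on the standard diffusers video-pipeline API, not code from this commit:

```python
import torch

from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", torch_dtype=torch.bfloat16)

# Standard diffusers memory helpers; useful since the full model is large.
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

prompt = "A close-up of a chameleon walking along a mossy branch, shallow depth of field."
frames = pipe(prompt, num_frames=85, num_inference_steps=50).frames[0]
export_to_video(frames, "mochi.mp4", fps=30)
```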

docs/source/en/training/distributed_inference.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -183,7 +183,7 @@ Add the transformer model to the pipeline for denoising, but set the other model

 ```py
 pipeline = FluxPipeline.from_pretrained(
-    "black-forest-labs/FLUX.1-dev", ,
+    "black-forest-labs/FLUX.1-dev",
     text_encoder=None,
     text_encoder_2=None,
     tokenizer=None,
````
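The stray comma above is the whole fix. For context, a hedged sketch of the surrounding pattern this guide describes: a transformer-only pipeline whose text encoders, tokenizers, and VAE are left unloaded because prompt encoding and latent decoding happen elsewhere:

```python
import torch

from diffusers import FluxPipeline

# Transformer-only pipeline: the text encoders, tokenizers, and VAE stay
# unloaded (None) because prompt embeddings are computed elsewhere (e.g. on
# another GPU) and the latents are decoded in a separate step.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=None,
    text_encoder_2=None,
    tokenizer=None,
    tokenizer_2=None,
    vae=None,
    torch_dtype=torch.bfloat16,
).to("cuda")
```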

examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

Lines changed: 34 additions & 2 deletions
```diff
@@ -39,7 +39,7 @@
 from accelerate.utils import DistributedDataParallelKwargs, ProjectConfiguration, set_seed
 from huggingface_hub import create_repo, upload_folder
 from packaging import version
-from peft import LoraConfig
+from peft import LoraConfig, set_peft_model_state_dict
 from peft.utils import get_peft_model_state_dict
 from PIL import Image
 from PIL.ImageOps import exif_transpose
@@ -59,12 +59,13 @@
 )
 from diffusers.loaders import StableDiffusionLoraLoaderMixin
 from diffusers.optimization import get_scheduler
-from diffusers.training_utils import compute_snr
+from diffusers.training_utils import _set_state_dict_into_text_encoder, cast_training_params, compute_snr
 from diffusers.utils import (
     check_min_version,
     convert_all_state_dict_to_peft,
     convert_state_dict_to_diffusers,
     convert_state_dict_to_kohya,
+    convert_unet_state_dict_to_peft,
     is_wandb_available,
 )
 from diffusers.utils.hub_utils import load_or_create_model_card, populate_model_card
@@ -1319,6 +1320,37 @@ def load_model_hook(models, input_dir):
             else:
                 raise ValueError(f"unexpected save model: {model.__class__}")

+        lora_state_dict, network_alphas = StableDiffusionPipeline.lora_state_dict(input_dir)
+
+        unet_state_dict = {f'{k.replace("unet.", "")}': v for k, v in lora_state_dict.items() if k.startswith("unet.")}
+        unet_state_dict = convert_unet_state_dict_to_peft(unet_state_dict)
+        incompatible_keys = set_peft_model_state_dict(unet_, unet_state_dict, adapter_name="default")
+        if incompatible_keys is not None:
+            # check only for unexpected keys
+            unexpected_keys = getattr(incompatible_keys, "unexpected_keys", None)
+            if unexpected_keys:
+                logger.warning(
+                    f"Loading adapter weights from state_dict led to unexpected keys not found in the model: "
+                    f" {unexpected_keys}. "
+                )
+
+        if args.train_text_encoder:
+            # Do we need to call `scale_lora_layers()` here?
+            _set_state_dict_into_text_encoder(lora_state_dict, prefix="text_encoder.", text_encoder=text_encoder_one_)
+
+            _set_state_dict_into_text_encoder(
+                lora_state_dict, prefix="text_encoder_2.", text_encoder=text_encoder_one_
+            )
+
+        # Make sure the trainable params are in float32. This is again needed since the base models
+        # are in `weight_dtype`. More details:
+        # https://github.com/huggingface/diffusers/pull/6514#discussion_r1449796804
+        if args.mixed_precision == "fp16":
+            models = [unet_]
+            if args.train_text_encoder:
+                models.extend([text_encoder_one_])
+            # only upcast trainable parameters (LoRA) into fp32
+            cast_training_params(models)
         lora_state_dict, network_alphas = StableDiffusionLoraLoaderMixin.lora_state_dict(input_dir)
         StableDiffusionLoraLoaderMixin.load_lora_into_unet(lora_state_dict, network_alphas=network_alphas, unet=unet_)
```
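The restored hook ends by upcasting only the trainable parameters before resuming. As a standalone illustration of that idea (the real helper is `diffusers.training_utils.cast_training_params`; this simplified version is a sketch of the concept, not its actual implementation):

```python
import torch

def cast_trainable_params_to_fp32(models):
    """Upcast only trainable (LoRA) parameters to float32.

    Mixed-precision (fp16) training needs fp32 master weights for the
    parameters the optimizer updates; frozen base weights can stay in fp16.
    """
    for model in models:
        for param in model.parameters():
            if param.requires_grad:  # only LoRA adapter weights are trainable
                param.data = param.data.to(torch.float32)
```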
