
Commit 10b6ba1

Merge branch 'main' into update_ptxla_training

2 parents 8c47f35 + 3f329a4

61 files changed: 4812 additions, 1103 deletions

docs/source/en/_toctree.yml

Lines changed: 6 additions & 0 deletions
```diff
@@ -270,6 +270,8 @@
       title: LatteTransformer3DModel
     - local: api/models/lumina_nextdit2d
       title: LuminaNextDiT2DModel
+    - local: api/models/mochi_transformer3d
+      title: MochiTransformer3DModel
     - local: api/models/pixart_transformer2d
       title: PixArtTransformer2DModel
     - local: api/models/prior_transformer
@@ -306,6 +308,8 @@
       title: AutoencoderKLAllegro
     - local: api/models/autoencoderkl_cogvideox
       title: AutoencoderKLCogVideoX
+    - local: api/models/autoencoderkl_mochi
+      title: AutoencoderKLMochi
     - local: api/models/asymmetricautoencoderkl
       title: AsymmetricAutoencoderKL
     - local: api/models/consistency_decoder_vae
@@ -400,6 +404,8 @@
       title: Lumina-T2X
     - local: api/pipelines/marigold
       title: Marigold
+    - local: api/pipelines/mochi
+      title: Mochi
     - local: api/pipelines/panorama
       title: MultiDiffusion
     - local: api/pipelines/musicldm
```
docs/source/en/api/models/autoencoderkl_mochi.md

Lines changed: 32 additions & 0 deletions

<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLMochi

The 3D variational autoencoder (VAE) model with KL loss used in [Mochi](https://github.com/genmoai/models) was introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import AutoencoderKLMochi

vae = AutoencoderKLMochi.from_pretrained("genmo/mochi-1-preview", subfolder="vae", torch_dtype=torch.float32).to("cuda")
```
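Since the class exposes `decode` (documented below), a hedged sketch of decoding a latent batch may help orient readers; the latent shape here is an illustrative assumption for this sketch, not a documented contract.

```python
import torch
from diffusers import AutoencoderKLMochi

vae = AutoencoderKLMochi.from_pretrained("genmo/mochi-1-preview", subfolder="vae", torch_dtype=torch.float32).to("cuda")

# Assumed latent layout: (batch, channels, frames, height, width).
# The channel count and sizes below are placeholders, not documented values.
latents = torch.randn(1, 12, 7, 60, 106, device="cuda", dtype=torch.float32)

with torch.no_grad():
    video = vae.decode(latents).sample  # DecoderOutput.sample -> pixel-space video tensor
```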
## AutoencoderKLMochi

[[autodoc]] AutoencoderKLMochi
  - decode
  - all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
docs/source/en/api/models/mochi_transformer3d.md

Lines changed: 30 additions & 0 deletions

<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# MochiTransformer3DModel

A Diffusion Transformer model for 3D video-like data, introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import MochiTransformer3DModel

transformer = MochiTransformer3DModel.from_pretrained("genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
```
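To show where this component fits, here is a hedged sketch of swapping a separately loaded transformer into `MochiPipeline`; it assumes only the classes added in this commit and the standard `DiffusionPipeline.from_pretrained` component-override behavior.

```python
import torch
from diffusers import MochiPipeline, MochiTransformer3DModel

transformer = MochiTransformer3DModel.from_pretrained(
    "genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16
)

# Reuse the rest of the pipeline components, overriding only the transformer.
pipe = MochiPipeline.from_pretrained(
    "genmo/mochi-1-preview", transformer=transformer, torch_dtype=torch.float16
)
```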
## MochiTransformer3DModel

[[autodoc]] MochiTransformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
docs/source/en/api/pipelines/mochi.md

Lines changed: 36 additions & 0 deletions

<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-->

# Mochi

[Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) is a video generation model from Genmo.

*Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. This model dramatically closes the gap between closed and open video generation systems. The model is released under a permissive Apache 2.0 license.*

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>
## MochiPipeline

[[autodoc]] MochiPipeline
  - all
  - __call__

## MochiPipelineOutput

[[autodoc]] pipelines.mochi.pipeline_output.MochiPipelineOutput
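As a quick orientation for the new pipeline, a minimal text-to-video sketch follows; the prompt, frame count, and offloading choice are illustrative assumptions rather than documented defaults.

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # trades some speed for much lower VRAM usage

prompt = "A close-up of a chameleon walking along a branch, shallow depth of field"
frames = pipe(prompt, num_frames=84).frames[0]
export_to_video(frames, "mochi.mp4", fps=30)
```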

docs/source/en/training/distributed_inference.md

Lines changed: 0 additions & 2 deletions
````diff
@@ -237,5 +237,3 @@ with torch.no_grad():
 ```

 By selectively loading and unloading the models you need at a given stage and sharding the largest models across multiple GPUs, it is possible to run inference with large models on consumer GPUs.
-
-This workflow is also compatible with LoRAs via [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`]. However, only LoRAs without text encoder components are currently supported in this workflow.
````
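For context, the load-encode-free pattern that the surviving paragraph describes looks roughly like the sketch below; the model id, memory split, and prompt are illustrative, not taken from this diff.

```python
import gc
import torch
from diffusers import FluxPipeline

def flush():
    gc.collect()
    torch.cuda.empty_cache()

# Stage 1: load only the text encoders, compute prompt embeddings, then free them.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=None,
    vae=None,
    device_map="balanced",
    max_memory={0: "16GB", 1: "16GB"},  # illustrative per-GPU budget
    torch_dtype=torch.bfloat16,
)
with torch.no_grad():
    prompt_embeds, pooled_prompt_embeds, text_ids = pipeline.encode_prompt(
        prompt="a photo of a dog", prompt_2=None
    )

# Stage 2: drop the text encoders before loading the large transformer.
del pipeline
flush()
```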

examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

Lines changed: 38 additions & 37 deletions
```diff
@@ -67,6 +67,7 @@
     convert_state_dict_to_kohya,
     is_wandb_available,
 )
+from diffusers.utils.hub_utils import load_or_create_model_card, populate_model_card
 from diffusers.utils.import_utils import is_xformers_available


@@ -79,30 +80,27 @@
 def save_model_card(
     repo_id: str,
     use_dora: bool,
-    images=None,
-    base_model=str,
+    images: list = None,
+    base_model: str = None,
     train_text_encoder=False,
     train_text_encoder_ti=False,
     token_abstraction_dict=None,
-    instance_prompt=str,
-    validation_prompt=str,
+    instance_prompt=None,
+    validation_prompt=None,
     repo_folder=None,
     vae_path=None,
 ):
-    img_str = "widget:\n"
     lora = "lora" if not use_dora else "dora"
-    for i, image in enumerate(images):
-        image.save(os.path.join(repo_folder, f"image_{i}.png"))
-        img_str += f"""
-        - text: '{validation_prompt if validation_prompt else ' ' }'
-          output:
-            url:
-                "image_{i}.png"
-        """
-    if not images:
-        img_str += f"""
-        - text: '{instance_prompt}'
-        """
+
+    widget_dict = []
+    if images is not None:
+        for i, image in enumerate(images):
+            image.save(os.path.join(repo_folder, f"image_{i}.png"))
+            widget_dict.append(
+                {"text": validation_prompt if validation_prompt else " ", "output": {"url": f"image_{i}.png"}}
+            )
+    else:
+        widget_dict.append({"text": instance_prompt})
     embeddings_filename = f"{repo_folder}_emb"
     instance_prompt_webui = re.sub(r"<s\d+>", "", re.sub(r"<s\d+>", embeddings_filename, instance_prompt, count=1))
     ti_keys = ", ".join(f'"{match}"' for match in re.findall(r"<s\d+>", instance_prompt))
@@ -137,24 +135,7 @@ def save_model_card(
     trigger_str += f"""
 to trigger concept `{key}` → use `{tokens}` in your prompt \n
 """
-
-    yaml = f"""---
-tags:
-- stable-diffusion
-- stable-diffusion-diffusers
-- diffusers-training
-- text-to-image
-- diffusers
-- {lora}
-- template:sd-lora
-{img_str}
-base_model: {base_model}
-instance_prompt: {instance_prompt}
-license: openrail++
----
-"""
-
-    model_card = f"""
+    model_description = f"""
 # SD1.5 LoRA DreamBooth - {repo_id}

 <Gallery />
@@ -202,8 +183,28 @@ def save_model_card(
 Special VAE used for training: {vae_path}.

 """
-    with open(os.path.join(repo_folder, "README.md"), "w") as f:
-        f.write(yaml + model_card)
+    model_card = load_or_create_model_card(
+        repo_id_or_path=repo_id,
+        from_training=True,
+        license="openrail++",
+        base_model=base_model,
+        prompt=instance_prompt,
+        model_description=model_description,
+        inference=True,
+        widget=widget_dict,
+    )
+
+    tags = [
+        "text-to-image",
+        "diffusers",
+        "diffusers-training",
+        lora,
+        "template:sd-lora",
+        "stable-diffusion",
+        "stable-diffusion-diffusers",
+    ]
+    model_card = populate_model_card(model_card, tags=tags)
+
+    model_card.save(os.path.join(repo_folder, "README.md"))


 def import_model_class_from_model_name_or_path(
```
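Since the refactor replaces hand-built YAML front matter with the hub helpers, a standalone sketch of how those two helpers compose may be useful; the repo id, base model, prompt, and tags below are placeholders.

```python
from diffusers.utils.hub_utils import load_or_create_model_card, populate_model_card

# Build a model card with generated front matter instead of hand-written YAML.
model_card = load_or_create_model_card(
    repo_id_or_path="user/my-sd15-lora",  # placeholder repo id
    from_training=True,
    license="openrail++",
    base_model="runwayml/stable-diffusion-v1-5",  # placeholder base model
    prompt="a photo of sks dog",
    model_description="# SD1.5 LoRA DreamBooth\n\nTraining details go here.",
    inference=True,
    widget=[{"text": "a photo of sks dog", "output": {"url": "image_0.png"}}],
)
model_card = populate_model_card(model_card, tags=["text-to-image", "diffusers", "lora"])
model_card.save("README.md")
```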

examples/controlnet/train_controlnet_flux.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -152,6 +152,7 @@ def log_validation(
                 guidance_scale=3.5,
                 generator=generator,
             ).images[0]
+            image = image.resize((args.resolution, args.resolution))
             images.append(image)
             image_logs.append(
                 {"validation_image": validation_image, "images": images, "validation_prompt": validation_prompt}
```

examples/dreambooth/train_dreambooth_flux.py

Lines changed: 26 additions & 2 deletions
```diff
@@ -57,6 +57,7 @@
     is_wandb_available,
 )
 from diffusers.utils.hub_utils import load_or_create_model_card, populate_model_card
+from diffusers.utils.import_utils import is_torch_npu_available
 from diffusers.utils.torch_utils import is_compiled_module


@@ -68,6 +69,12 @@

 logger = get_logger(__name__)

+if is_torch_npu_available():
+    import torch_npu
+
+    torch.npu.config.allow_internal_format = False
+    torch.npu.set_compile_mode(jit_compile=False)
+

 def save_model_card(
     repo_id: str,
@@ -189,6 +196,8 @@ def log_validation(
     del pipeline
     if torch.cuda.is_available():
         torch.cuda.empty_cache()
+    elif is_torch_npu_available():
+        torch_npu.npu.empty_cache()

     return images

@@ -1035,7 +1044,9 @@ def main(args):
         cur_class_images = len(list(class_images_dir.iterdir()))

         if cur_class_images < args.num_class_images:
-            has_supported_fp16_accelerator = torch.cuda.is_available() or torch.backends.mps.is_available()
+            has_supported_fp16_accelerator = (
+                torch.cuda.is_available() or torch.backends.mps.is_available() or is_torch_npu_available()
+            )
             torch_dtype = torch.float16 if has_supported_fp16_accelerator else torch.float32
             if args.prior_generation_precision == "fp32":
                 torch_dtype = torch.float32
@@ -1073,6 +1084,8 @@ def main(args):
             del pipeline
             if torch.cuda.is_available():
                 torch.cuda.empty_cache()
+            elif is_torch_npu_available():
+                torch_npu.npu.empty_cache()

     # Handle the repository creation
     if accelerator.is_main_process:
@@ -1354,6 +1367,8 @@ def compute_text_embeddings(prompt, text_encoders, tokenizers):
         gc.collect()
         if torch.cuda.is_available():
             torch.cuda.empty_cache()
+        elif is_torch_npu_available():
+            torch_npu.npu.empty_cache()

     # If custom instance prompts are NOT provided (i.e. the instance prompt is used for all images),
     # pack the statically computed variables appropriately here. This is so that we don't
@@ -1719,9 +1734,15 @@ def get_sigmas(timesteps, n_dim=4, dtype=torch.float32):
         )
         if not args.train_text_encoder:
             del text_encoder_one, text_encoder_two
-            torch.cuda.empty_cache()
+            if torch.cuda.is_available():
+                torch.cuda.empty_cache()
+            elif is_torch_npu_available():
+                torch_npu.npu.empty_cache()
         gc.collect()

+        images = None
+        del pipeline
+
     # Save the lora layers
     accelerator.wait_for_everyone()
     if accelerator.is_main_process:
@@ -1780,6 +1801,9 @@ def get_sigmas(timesteps, n_dim=4, dtype=torch.float32):
             ignore_patterns=["step_*", "epoch_*"],
         )

+        images = None
+        del pipeline
+
     accelerator.end_training()
```
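The repeated CUDA/NPU branches in this diff suggest a small device-agnostic helper; a hedged sketch of that pattern follows (the function name is ours, not the script's).

```python
import gc

import torch
from diffusers.utils.import_utils import is_torch_npu_available

def free_accelerator_memory():
    # Collect Python garbage first so freed tensors can actually be released.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    elif is_torch_npu_available():
        import torch_npu  # registers the torch.npu backend

        torch_npu.npu.empty_cache()
```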

examples/dreambooth/train_dreambooth_lora_flux.py

Lines changed: 10 additions & 2 deletions
```diff
@@ -177,7 +177,7 @@ def log_validation(
         f"Running validation... \n Generating {args.num_validation_images} images with prompt:"
         f" {args.validation_prompt}."
     )
-    pipeline = pipeline.to(accelerator.device, dtype=torch_dtype)
+    pipeline = pipeline.to(accelerator.device)
     pipeline.set_progress_bar_config(disable=True)

     # run inference
@@ -1706,7 +1706,7 @@ def get_sigmas(timesteps, n_dim=4, dtype=torch.float32):
                 )

                 # handle guidance
-                if transformer.config.guidance_embeds:
+                if accelerator.unwrap_model(transformer).config.guidance_embeds:
                     guidance = torch.tensor([args.guidance_scale], device=accelerator.device)
                     guidance = guidance.expand(model_input.shape[0])
                 else:
@@ -1819,6 +1819,8 @@ def get_sigmas(timesteps, n_dim=4, dtype=torch.float32):
         # create pipeline
         if not args.train_text_encoder:
             text_encoder_one, text_encoder_two = load_text_encoders(text_encoder_cls_one, text_encoder_cls_two)
+            text_encoder_one.to(weight_dtype)
+            text_encoder_two.to(weight_dtype)
         pipeline = FluxPipeline.from_pretrained(
             args.pretrained_model_name_or_path,
             vae=vae,
@@ -1842,6 +1844,9 @@ def get_sigmas(timesteps, n_dim=4, dtype=torch.float32):
         del text_encoder_one, text_encoder_two
         free_memory()

+        images = None
+        del pipeline
+
     # Save the lora layers
     accelerator.wait_for_everyone()
     if accelerator.is_main_process:
@@ -1906,6 +1911,9 @@ def get_sigmas(timesteps, n_dim=4, dtype=torch.float32):
             ignore_patterns=["step_*", "epoch_*"],
         )

+        images = None
+        del pipeline
+
     accelerator.end_training()
```
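The `unwrap_model` change is the usual fix for reading attributes through whatever wrapper `accelerate` may have applied (e.g. DistributedDataParallel), which does not forward attributes like `.config`. A self-contained toy illustrating the pattern (the model class here is a stand-in, not Flux):

```python
import torch
from accelerate import Accelerator

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)
        self.config = {"guidance_embeds": True}  # stand-in for a diffusers model config

    def forward(self, x):
        return self.linear(x)

accelerator = Accelerator()
model = accelerator.prepare(TinyModel())

# After prepare(), the model may be wrapped; wrappers hide attributes like
# `.config`, so always unwrap before reading configuration.
config = accelerator.unwrap_model(model).config
print(config["guidance_embeds"])
```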

examples/reinforcement_learning/README.md

Lines changed: 10 additions & 1 deletion
```diff
@@ -1,4 +1,13 @@
-# Overview
+
+## Diffusion-based Policy Learning for RL
+
+`diffusion_policy` implements [Diffusion Policy](https://diffusion-policy.cs.columbia.edu/), a diffusion model that predicts robot action sequences in reinforcement learning tasks.
+
+This example implements a robot control model for pushing a T-shaped block into a target area. The model takes in current state observations as input, and outputs a trajectory of subsequent steps to follow.
+
+To execute the script, run `diffusion_policy.py`.
+
+## Diffuser Locomotion

 These examples show how to run [Diffuser](https://arxiv.org/abs/2205.09991) in Diffusers.
 There are two ways to use the script `run_diffuser_locomotion.py`.
```
