Commit ed55b90

Merge branch 'main' into enable_xpu
2 parents 9052f83 + 76b7d86

File tree: 78 files changed, +6846 -1048 lines


.github/workflows/nightly_tests.yml

Lines changed: 56 additions & 0 deletions
@@ -180,6 +180,62 @@ jobs:
           pip install slack_sdk tabulate
           python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+
+  run_big_gpu_torch_tests:
+    name: Torch tests on big GPU
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+    runs-on:
+      group: aws-g6e-xlarge-plus
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+          python -m uv pip install pytest-reportlog
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Selected Torch CUDA Test on big GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+          BIG_GPU_MEMORY: 40
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -m "big_gpu_with_torch_cuda" \
+            --make-reports=tests_big_gpu_torch_cuda \
+            --report-log=tests_big_gpu_torch_cuda.log \
+            tests/
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_big_gpu_torch_cuda_stats.txt
+          cat reports/tests_big_gpu_torch_cuda_failures_short.txt
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_cuda_big_gpu_test_reports
+          path: reports
+      - name: Generate Report and Notify Channel
+        if: always()
+        run: |
+          pip install slack_sdk tabulate
+          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+
   run_flax_tpu_tests:
     name: Nightly Flax TPU Tests
     runs-on: docker-tpu
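The new job selects tests via the `big_gpu_with_torch_cuda` pytest marker and advertises the card size through `BIG_GPU_MEMORY`. As a rough sketch of what opts into this job (only the marker name comes from the workflow above; the test name and body are hypothetical):

```python
# Hypothetical test collected by `pytest -m "big_gpu_with_torch_cuda"`.
import pytest
import torch


@pytest.mark.big_gpu_with_torch_cuda
def test_runs_on_big_gpu():
    # BIG_GPU_MEMORY=40 in the workflow suggests a ~40 GB card, e.g. the L40
    # GPUs in the aws-g6e-xlarge-plus runner group.
    assert torch.cuda.is_available()
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    assert total_gb >= 40
```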

.github/workflows/ssh-runner.yml

Lines changed: 2 additions & 1 deletion
@@ -4,12 +4,13 @@ on:
   workflow_dispatch:
     inputs:
       runner_type:
-        description: 'Type of runner to test (aws-g6-4xlarge-plus: a10 or aws-g4dn-2xlarge: t4)'
+        description: 'Type of runner to test (aws-g6-4xlarge-plus: a10, aws-g4dn-2xlarge: t4, aws-g6e-xlarge-plus: L40)'
         type: choice
         required: true
         options:
           - aws-g6-4xlarge-plus
           - aws-g4dn-2xlarge
+          - aws-g6e-xlarge-plus
       docker_image:
         description: 'Name of the Docker image'
         required: true

docs/source/en/_toctree.yml

Lines changed: 6 additions & 0 deletions
@@ -270,6 +270,8 @@
         title: LatteTransformer3DModel
       - local: api/models/lumina_nextdit2d
         title: LuminaNextDiT2DModel
+      - local: api/models/mochi_transformer3d
+        title: MochiTransformer3DModel
       - local: api/models/pixart_transformer2d
         title: PixArtTransformer2DModel
       - local: api/models/prior_transformer
@@ -306,6 +308,8 @@
         title: AutoencoderKLAllegro
       - local: api/models/autoencoderkl_cogvideox
         title: AutoencoderKLCogVideoX
+      - local: api/models/autoencoderkl_mochi
+        title: AutoencoderKLMochi
       - local: api/models/asymmetricautoencoderkl
         title: AsymmetricAutoencoderKL
       - local: api/models/consistency_decoder_vae
@@ -400,6 +404,8 @@
         title: Lumina-T2X
       - local: api/pipelines/marigold
         title: Marigold
+      - local: api/pipelines/mochi
+        title: Mochi
       - local: api/pipelines/panorama
         title: MultiDiffusion
       - local: api/pipelines/musicldm
docs/source/en/api/models/autoencoderkl_mochi.md

Lines changed: 32 additions & 0 deletions

@@ -0,0 +1,32 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderKLMochi
+
+The 3D variational autoencoder (VAE) model with KL loss used in [Mochi](https://github.com/genmoai/models) was introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+
+from diffusers import AutoencoderKLMochi
+
+vae = AutoencoderKLMochi.from_pretrained("genmo/mochi-1-preview", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+```
+
+## AutoencoderKLMochi
+
+[[autodoc]] AutoencoderKLMochi
+    - decode
+    - all
+
+## DecoderOutput
+
+[[autodoc]] models.autoencoders.vae.DecoderOutput
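Once loaded, the VAE follows the standard diffusers autoencoder API. A minimal decode sketch, assuming a 12-channel Mochi latent layout (the tensor shape below is an illustrative assumption, not taken from this commit):

```python
import torch
from diffusers import AutoencoderKLMochi

vae = AutoencoderKLMochi.from_pretrained(
    "genmo/mochi-1-preview", subfolder="vae", torch_dtype=torch.float32
).to("cuda")

# Hypothetical latent batch: (batch, channels, frames, height, width).
latents = torch.randn(1, 12, 7, 60, 106, device="cuda", dtype=torch.float32)

with torch.no_grad():
    # decode() returns a DecoderOutput; return_dict=False unpacks the sample.
    video = vae.decode(latents, return_dict=False)[0]
print(video.shape)
```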
docs/source/en/api/models/mochi_transformer3d.md

Lines changed: 30 additions & 0 deletions

@@ -0,0 +1,30 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# MochiTransformer3DModel
+
+A Diffusion Transformer model for 3D video-like data was introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+
+from diffusers import MochiTransformer3DModel
+
+transformer = MochiTransformer3DModel.from_pretrained("genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
+```
+
+## MochiTransformer3DModel
+
+[[autodoc]] MochiTransformer3DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
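As a quick sanity check after loading, the parameter count can be inspected; this is an illustrative sketch, not part of the added documentation:

```python
import torch
from diffusers import MochiTransformer3DModel

transformer = MochiTransformer3DModel.from_pretrained(
    "genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16
)

# Count whatever the checkpoint actually contains; no size is asserted here.
num_params = sum(p.numel() for p in transformer.parameters())
print(f"{num_params / 1e9:.2f}B parameters")
```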
docs/source/en/api/pipelines/mochi.md

Lines changed: 36 additions & 0 deletions

@@ -0,0 +1,36 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+-->
+
+# Mochi
+
+[Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) is a video generation model from Genmo.
+
+*Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. This model dramatically closes the gap between closed and open video generation systems. The model is released under a permissive Apache 2.0 license.*
+
+<Tip>
+
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+
+</Tip>
+
+## MochiPipeline
+
+[[autodoc]] MochiPipeline
+  - all
+  - __call__
+
+## MochiPipelineOutput
+
+[[autodoc]] pipelines.mochi.pipeline_output.MochiPipelineOutput
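For orientation, end-to-end usage follows the pattern of other diffusers video pipelines. A sketch, where the prompt, `num_frames`, and fps values are illustrative assumptions rather than values from this commit:

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # keep peak VRAM manageable on smaller cards

prompt = "A close-up of a chameleon walking along a branch, shallow depth of field"
frames = pipe(prompt, num_frames=84).frames[0]
export_to_video(frames, "mochi.mp4", fps=30)
```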

docs/source/en/training/distributed_inference.md

Lines changed: 1 addition & 1 deletion
@@ -183,7 +183,7 @@ Add the transformer model to the pipeline for denoising, but set the other model
 
 ```py
 pipeline = FluxPipeline.from_pretrained(
-    "black-forest-labs/FLUX.1-dev", ,
+    "black-forest-labs/FLUX.1-dev",
     text_encoder=None,
     text_encoder_2=None,
     tokenizer=None,

examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Lines changed: 3 additions & 11 deletions
@@ -1778,15 +1778,10 @@ def load_model_hook(models, input_dir):
     if not args.enable_t5_ti:
         # pure textual inversion - only clip
         if pure_textual_inversion:
-            params_to_optimize = [
-                text_parameters_one_with_lr,
-            ]
+            params_to_optimize = [text_parameters_one_with_lr]
             te_idx = 0
         else:  # regular te training or regular pivotal for clip
-            params_to_optimize = [
-                transformer_parameters_with_lr,
-                text_parameters_one_with_lr,
-            ]
+            params_to_optimize = [transformer_parameters_with_lr, text_parameters_one_with_lr]
             te_idx = 1
     elif args.enable_t5_ti:
         # pivotal tuning of clip & t5
@@ -1809,9 +1804,7 @@ def load_model_hook(models, input_dir):
         ]
         te_idx = 1
     else:
-        params_to_optimize = [
-            transformer_parameters_with_lr,
-        ]
+        params_to_optimize = [transformer_parameters_with_lr]
 
     # Optimizer creation
     if not (args.optimizer.lower() == "prodigy" or args.optimizer.lower() == "adamw"):
@@ -1871,7 +1864,6 @@ def load_model_hook(models, input_dir):
             params_to_optimize[-1]["lr"] = args.learning_rate
         optimizer = optimizer_class(
             params_to_optimize,
-            lr=args.learning_rate,
             betas=(args.adam_beta1, args.adam_beta2),
             beta3=args.prodigy_beta3,
             weight_decay=args.adam_weight_decay,

examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

Lines changed: 39 additions & 42 deletions
@@ -67,6 +67,7 @@
     convert_state_dict_to_kohya,
     is_wandb_available,
 )
+from diffusers.utils.hub_utils import load_or_create_model_card, populate_model_card
 from diffusers.utils.import_utils import is_xformers_available
 
 
@@ -79,30 +80,27 @@
 def save_model_card(
     repo_id: str,
     use_dora: bool,
-    images=None,
-    base_model=str,
+    images: list = None,
+    base_model: str = None,
     train_text_encoder=False,
     train_text_encoder_ti=False,
     token_abstraction_dict=None,
-    instance_prompt=str,
-    validation_prompt=str,
+    instance_prompt=None,
+    validation_prompt=None,
     repo_folder=None,
     vae_path=None,
 ):
-    img_str = "widget:\n"
     lora = "lora" if not use_dora else "dora"
-    for i, image in enumerate(images):
-        image.save(os.path.join(repo_folder, f"image_{i}.png"))
-        img_str += f"""
-        - text: '{validation_prompt if validation_prompt else ' ' }'
-          output:
-            url:
-                "image_{i}.png"
-        """
-    if not images:
-        img_str += f"""
-        - text: '{instance_prompt}'
-        """
+
+    widget_dict = []
+    if images is not None:
+        for i, image in enumerate(images):
+            image.save(os.path.join(repo_folder, f"image_{i}.png"))
+            widget_dict.append(
+                {"text": validation_prompt if validation_prompt else " ", "output": {"url": f"image_{i}.png"}}
+            )
+    else:
+        widget_dict.append({"text": instance_prompt})
     embeddings_filename = f"{repo_folder}_emb"
     instance_prompt_webui = re.sub(r"<s\d+>", "", re.sub(r"<s\d+>", embeddings_filename, instance_prompt, count=1))
     ti_keys = ", ".join(f'"{match}"' for match in re.findall(r"<s\d+>", instance_prompt))
@@ -137,24 +135,7 @@ def save_model_card(
     trigger_str += f"""
 to trigger concept `{key}` → use `{tokens}` in your prompt \n
 """
-
-    yaml = f"""---
-tags:
-- stable-diffusion
-- stable-diffusion-diffusers
-- diffusers-training
-- text-to-image
-- diffusers
-- {lora}
-- template:sd-lora
-{img_str}
-base_model: {base_model}
-instance_prompt: {instance_prompt}
-license: openrail++
----
-"""
-
-    model_card = f"""
+    model_description = f"""
 # SD1.5 LoRA DreamBooth - {repo_id}
 
 <Gallery />
@@ -202,8 +183,28 @@ def save_model_card(
 Special VAE used for training: {vae_path}.
 
 """
-    with open(os.path.join(repo_folder, "README.md"), "w") as f:
-        f.write(yaml + model_card)
+    model_card = load_or_create_model_card(
+        repo_id_or_path=repo_id,
+        from_training=True,
+        license="openrail++",
+        base_model=base_model,
+        prompt=instance_prompt,
+        model_description=model_description,
+        inference=True,
+        widget=widget_dict,
+    )
+
+    tags = [
+        "text-to-image",
+        "diffusers",
+        "diffusers-training",
+        lora,
+        "template:sd-lora",
+        "stable-diffusion",
+        "stable-diffusion-diffusers",
+    ]
+    model_card = populate_model_card(model_card, tags=tags)
+
+    model_card.save(os.path.join(repo_folder, "README.md"))
 
 
 def import_model_class_from_model_name_or_path(
@@ -1358,10 +1359,7 @@ def load_model_hook(models, input_dir):
             else args.adam_weight_decay,
             "lr": args.text_encoder_lr if args.text_encoder_lr else args.learning_rate,
         }
-        params_to_optimize = [
-            unet_lora_parameters_with_lr,
-            text_lora_parameters_one_with_lr,
-        ]
+        params_to_optimize = [unet_lora_parameters_with_lr, text_lora_parameters_one_with_lr]
     else:
         params_to_optimize = [unet_lora_parameters_with_lr]
 
@@ -1423,7 +1421,6 @@ def load_model_hook(models, input_dir):
 
         optimizer = optimizer_class(
             params_to_optimize,
-            lr=args.learning_rate,
             betas=(args.adam_beta1, args.adam_beta2),
             beta3=args.prodigy_beta3,
             weight_decay=args.adam_weight_decay,
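The refactor above replaces hand-assembled YAML front matter with the shared hub utilities. A condensed sketch of the same flow, using the call signature visible in the diff and placeholder values everywhere else:

```python
import os

from diffusers.utils.hub_utils import load_or_create_model_card, populate_model_card

repo_folder = "."  # placeholder output directory

# Structured card creation; all values here are placeholders.
model_card = load_or_create_model_card(
    repo_id_or_path="user/sd15-lora-dreambooth",
    from_training=True,
    license="openrail++",
    base_model="stable-diffusion-v1-5",
    prompt="a photo of sks dog",
    model_description="# SD1.5 LoRA DreamBooth - example",
    inference=True,
    widget=[{"text": "a photo of sks dog", "output": {"url": "image_0.png"}}],
)
model_card = populate_model_card(model_card, tags=["text-to-image", "diffusers", "lora"])
model_card.save(os.path.join(repo_folder, "README.md"))
```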

examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

Lines changed: 0 additions & 1 deletion
@@ -1794,7 +1794,6 @@ def load_model_hook(models, input_dir):
 
         optimizer = optimizer_class(
             params_to_optimize,
-            lr=args.learning_rate,
             betas=(args.adam_beta1, args.adam_beta2),
             beta3=args.prodigy_beta3,
             weight_decay=args.adam_weight_decay,
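All three training scripts drop `lr=args.learning_rate` from the optimizer call for the same reason: each entry in `params_to_optimize` already carries its own `"lr"` key, so the global keyword was redundant at best and misleading when the two disagreed. A sketch of the resulting pattern with Prodigy (the `prodigyopt` package these scripts use for `--optimizer prodigy`; the model and hyperparameter values are illustrative):

```python
import torch
from prodigyopt import Prodigy

model = torch.nn.Linear(8, 8)  # stand-in for the actual LoRA parameters

# Each parameter group carries its own learning rate, mirroring the
# *_parameters_with_lr dicts built in the training scripts.
params_to_optimize = [{"params": model.parameters(), "lr": 1.0}]

# No global lr kwarg: the per-group "lr" entries are authoritative.
optimizer = Prodigy(
    params_to_optimize,
    betas=(0.9, 0.99),
    beta3=None,
    weight_decay=1e-2,
    decouple=True,
    use_bias_correction=True,
)
```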
