
Commit 05ccc90

Merge branch 'main' into support-comyui-flux-loras
2 parents 1c98875 + 5428046 commit 05ccc90

40 files changed: +5048 −420 lines

docs/source/en/api/pipelines/wan.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -14,6 +14,10 @@
 
 # Wan
 
+<div class="flex flex-wrap space-x-1">
+  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+</div>
+
 [Wan 2.1](https://github.com/Wan-Video/Wan2.1) by the Alibaba Wan Team.
 
 <!-- TODO(aryan): update abstract once paper is out -->
```

docs/source/en/quantization/torchao.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -126,7 +126,7 @@ image = pipe(prompt, num_inference_steps=30, guidance_scale=7.0).images[0]
 image.save("output.png")
 ```
 
-Some quantization methods, such as `uint4wo`, cannot be loaded directly and may result in an `UnpicklingError` when trying to load the models, but work as expected when saving them. In order to work around this, one can load the state dict manually into the model. Note, however, that this requires using `weights_only=False` in `torch.load`, so it should be run only if the weights were obtained from a trustable source.
+If you are using `torch<=2.6.0`, some quantization methods, such as `uint4wo`, cannot be loaded directly and may result in an `UnpicklingError` when trying to load the models, but work as expected when saving them. In order to work around this, one can load the state dict manually into the model. Note, however, that this requires using `weights_only=False` in `torch.load`, so it should be run only if the weights were obtained from a trustable source.
 
 ```python
 import torch
````
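
For readers of this hunk: the manual state-dict loading workaround that the updated sentence describes amounts to roughly the sketch below. The model class, paths, and file names are illustrative assumptions, not part of this commit.

```python
import torch
from accelerate import init_empty_weights
from diffusers import FluxTransformer2DModel

# Load the serialized, torchao-quantized state dict directly.
# weights_only=False is required here, so only do this with weights from a trusted source.
state_dict = torch.load(
    "/path/to/flux_uint4wo/diffusion_pytorch_model.bin",
    weights_only=False,
    map_location="cpu",
)

# Rebuild the model skeleton without allocating weight memory,
# then assign the loaded quantized tensors onto it.
config = FluxTransformer2DModel.load_config("/path/to/flux_uint4wo/config.json")
with init_empty_weights():
    transformer = FluxTransformer2DModel.from_config(config)
transformer.load_state_dict(state_dict, strict=True, assign=True)
```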

examples/advanced_diffusion_training/README_flux.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -79,13 +79,13 @@ This command will prompt you for a token. Copy-paste yours from your [settings/t
 ### Target Modules
 When LoRA was first adapted from language models to diffusion models, it was applied to the cross-attention layers in the Unet that relate the image representations with the prompts that describe them.
 More recently, SOTA text-to-image diffusion models replaced the Unet with a diffusion Transformer(DiT). With this change, we may also want to explore
-applying LoRA training onto different types of layers and blocks. To allow more flexibility and control over the targeted modules we added `--lora_layers`- in which you can specify in a comma seperated string
+applying LoRA training onto different types of layers and blocks. To allow more flexibility and control over the targeted modules we added `--lora_layers`- in which you can specify in a comma separated string
 the exact modules for LoRA training. Here are some examples of target modules you can provide:
 - for attention only layers: `--lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0"`
 - to train the same modules as in the fal trainer: `--lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0,attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,ff.net.0.proj,ff.net.2,ff_context.net.0.proj,ff_context.net.2"`
 - to train the same modules as in ostris ai-toolkit / replicate trainer: `--lora_blocks="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0,attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,ff.net.0.proj,ff.net.2,ff_context.net.0.proj,ff_context.net.2,norm1_context.linear, norm1.linear,norm.linear,proj_mlp,proj_out"`
 > [!NOTE]
-> `--lora_layers` can also be used to specify which **blocks** to apply LoRA training to. To do so, simply add a block prefix to each layer in the comma seperated string:
+> `--lora_layers` can also be used to specify which **blocks** to apply LoRA training to. To do so, simply add a block prefix to each layer in the comma separated string:
 > **single DiT blocks**: to target the ith single transformer block, add the prefix `single_transformer_blocks.i`, e.g. - `single_transformer_blocks.i.attn.to_k`
 > **MMDiT blocks**: to target the ith MMDiT block, add the prefix `transformer_blocks.i`, e.g. - `transformer_blocks.i.attn.to_k`
 > [!NOTE]
```
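
For context on the flag this hunk documents: in the advanced Flux training script, the comma separated string passed to `--lora_layers` is split into the `target_modules` list handed to PEFT's `LoraConfig`. A minimal sketch of that mapping is below; the rank value and fallback defaults are illustrative, not quoted from the script.

```python
from peft import LoraConfig

lora_layers = "attn.to_k,attn.to_q,attn.to_v,attn.to_out.0"  # value that would come from --lora_layers

# Split the comma separated string into module names; fall back to
# attention-only layers when the flag is not provided.
if lora_layers:
    target_modules = [layer.strip() for layer in lora_layers.split(",")]
else:
    target_modules = ["to_k", "to_q", "to_v", "to_out.0"]

transformer_lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=target_modules,
)
```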

examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -378,7 +378,7 @@ def parse_args(input_args=None):
         default=None,
         help="the concept to use to initialize the new inserted tokens when training with "
         "--train_text_encoder_ti = True. By default, new tokens (<si><si+1>) are initialized with random value. "
-        "Alternatively, you could specify a different word/words whos value will be used as the starting point for the new inserted tokens. "
+        "Alternatively, you could specify a different word/words whose value will be used as the starting point for the new inserted tokens. "
         "--num_new_tokens_per_abstraction is ignored when initializer_concept is provided",
     )
     parser.add_argument(
@@ -662,7 +662,7 @@ def parse_args(input_args=None):
         type=str,
         default=None,
         help=(
-            "The transformer modules to apply LoRA training on. Please specify the layers in a comma seperated. "
+            "The transformer modules to apply LoRA training on. Please specify the layers in a comma separated. "
             'E.g. - "to_k,to_q,to_v,to_out.0" will result in lora training of attention layers only. For more examples refer to https://github.com/huggingface/diffusers/blob/main/examples/advanced_diffusion_training/README_flux.md'
         ),
     )
```

examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -662,7 +662,7 @@ def parse_args(input_args=None):
         action="store_true",
         default=False,
         help=(
-            "Wether to train a DoRA as proposed in- DoRA: Weight-Decomposed Low-Rank Adaptation https://arxiv.org/abs/2402.09353. "
+            "Whether to train a DoRA as proposed in- DoRA: Weight-Decomposed Low-Rank Adaptation https://arxiv.org/abs/2402.09353. "
             "Note: to use DoRA you need to install peft from main, `pip install git+https://github.com/huggingface/peft.git`"
         ),
     )
```
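
The help string fixed above belongs to the `--use_dora` flag, which maps to PEFT's `use_dora` option on `LoraConfig`. A minimal, illustrative sketch (rank and target modules are placeholder choices, not taken from the script):

```python
from peft import LoraConfig

# DoRA reuses the LoRA machinery but decomposes each adapted weight into
# magnitude and direction components (https://arxiv.org/abs/2402.09353).
dora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    use_dora=True,  # needs a recent peft release, or peft installed from main
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)
```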

examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -773,7 +773,7 @@ def parse_args(input_args=None):
         action="store_true",
         default=False,
         help=(
-            "Wether to train a DoRA as proposed in- DoRA: Weight-Decomposed Low-Rank Adaptation https://arxiv.org/abs/2402.09353. "
+            "Whether to train a DoRA as proposed in- DoRA: Weight-Decomposed Low-Rank Adaptation https://arxiv.org/abs/2402.09353. "
             "Note: to use DoRA you need to install peft from main, `pip install git+https://github.com/huggingface/peft.git`"
         ),
     )
@@ -1875,7 +1875,7 @@ def compute_text_embeddings(prompt, text_encoders, tokenizers, clip_skip):
     # pack the statically computed variables appropriately here. This is so that we don't
     # have to pass them to the dataloader.
 
-    # if --train_text_encoder_ti we need add_special_tokens to be True fo textual inversion
+    # if --train_text_encoder_ti we need add_special_tokens to be True for textual inversion
     add_special_tokens = True if args.train_text_encoder_ti else False
 
     if not train_dataset.custom_instance_prompts:
```

examples/community/mixture_tiling_sdxl.py

Lines changed: 22 additions & 22 deletions
```diff
@@ -1,4 +1,4 @@
-# Copyright 2025 The HuggingFace Team. All rights reserved.
+# Copyright 2025 The DEVAIEXP Team and The HuggingFace Team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1070,32 +1070,32 @@ def __call__(
                     text_encoder_projection_dim = int(pooled_prompt_embeds.shape[-1])
                 else:
                     text_encoder_projection_dim = self.text_encoder_2.config.projection_dim
-                    add_time_ids = self._get_add_time_ids(
-                        original_size,
-                        crops_coords_top_left[row][col],
-                        target_size,
+                add_time_ids = self._get_add_time_ids(
+                    original_size,
+                    crops_coords_top_left[row][col],
+                    target_size,
+                    dtype=prompt_embeds.dtype,
+                    text_encoder_projection_dim=text_encoder_projection_dim,
+                )
+                if negative_original_size is not None and negative_target_size is not None:
+                    negative_add_time_ids = self._get_add_time_ids(
+                        negative_original_size,
+                        negative_crops_coords_top_left[row][col],
+                        negative_target_size,
                         dtype=prompt_embeds.dtype,
                         text_encoder_projection_dim=text_encoder_projection_dim,
                     )
-                    if negative_original_size is not None and negative_target_size is not None:
-                        negative_add_time_ids = self._get_add_time_ids(
-                            negative_original_size,
-                            negative_crops_coords_top_left[row][col],
-                            negative_target_size,
-                            dtype=prompt_embeds.dtype,
-                            text_encoder_projection_dim=text_encoder_projection_dim,
-                        )
-                    else:
-                        negative_add_time_ids = add_time_ids
+                else:
+                    negative_add_time_ids = add_time_ids
 
-                    if self.do_classifier_free_guidance:
-                        prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
-                        add_text_embeds = torch.cat([negative_pooled_prompt_embeds, add_text_embeds], dim=0)
-                        add_time_ids = torch.cat([negative_add_time_ids, add_time_ids], dim=0)
+                if self.do_classifier_free_guidance:
+                    prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
+                    add_text_embeds = torch.cat([negative_pooled_prompt_embeds, add_text_embeds], dim=0)
+                    add_time_ids = torch.cat([negative_add_time_ids, add_time_ids], dim=0)
 
-                    prompt_embeds = prompt_embeds.to(device)
-                    add_text_embeds = add_text_embeds.to(device)
-                    add_time_ids = add_time_ids.to(device).repeat(batch_size * num_images_per_prompt, 1)
+                prompt_embeds = prompt_embeds.to(device)
+                add_text_embeds = add_text_embeds.to(device)
+                add_time_ids = add_time_ids.to(device).repeat(batch_size * num_images_per_prompt, 1)
                 addition_embed_type_row.append((prompt_embeds, add_text_embeds, add_time_ids))
             embeddings_and_added_time.append(addition_embed_type_row)
 
```

Lines changed: 32 additions & 0 deletions
````diff
@@ -0,0 +1,32 @@
+# AnyTextPipeline Pipeline
+
+Project page: https://aigcdesigngroup.github.io/homepage_anytext
+
+"AnyText comprises a diffusion pipeline with two primary elements: an auxiliary latent module and a text embedding module. The former uses inputs like text glyph, position, and masked image to generate latent features for text generation or editing. The latter employs an OCR model for encoding stroke data as embeddings, which blend with image caption embeddings from the tokenizer to generate texts that seamlessly integrate with the background. We employed text-control diffusion loss and text perceptual loss for training to further enhance writing accuracy."
+
+Each text line that needs to be generated should be enclosed in double quotes. For any usage questions, please refer to the [paper](https://arxiv.org/abs/2311.03054).
+
+
+```py
+import torch
+from diffusers import DiffusionPipeline
+from anytext_controlnet import AnyTextControlNetModel
+from diffusers.utils import load_image
+
+# I chose a font file shared by an HF staff:
+# !wget https://huggingface.co/spaces/ysharma/TranslateQuotesInImageForwards/resolve/main/arial-unicode-ms.ttf
+
+anytext_controlnet = AnyTextControlNetModel.from_pretrained("tolgacangoz/anytext-controlnet", torch_dtype=torch.float16,
+                                                            variant="fp16",)
+pipe = DiffusionPipeline.from_pretrained("tolgacangoz/anytext", font_path="arial-unicode-ms.ttf",
+                                         controlnet=anytext_controlnet, torch_dtype=torch.float16,
+                                         trust_remote_code=False, # One needs to give permission to run this pipeline's code
+                                         ).to("cuda")
+
+# generate image
+prompt = 'photo of caramel macchiato coffee on the table, top-down perspective, with "Any" "Text" written on it using cream'
+draw_pos = load_image("https://raw.githubusercontent.com/tyxsspa/AnyText/refs/heads/main/example_images/gen9.png")
+image = pipe(prompt, num_inference_steps=20, mode="generate", draw_pos=draw_pos,
+             ).images[0]
+image
+```
````
