Commit a1eacb3

Merge branch 'main' into remote-vae-wan-decode
2 parents 79620bf + 733b44a commit a1eacb3

22 files changed: +264 −298 lines

docs/source/en/api/pipelines/wan.md

Lines changed: 4 additions & 0 deletions
@@ -14,6 +14,10 @@

 # Wan

+<div class="flex flex-wrap space-x-1">
+  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+</div>
+
 [Wan 2.1](https://github.com/Wan-Video/Wan2.1) by the Alibaba Wan Team.

 <!-- TODO(aryan): update abstract once paper is out -->
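
The badge added above marks the Wan pipeline docs as supporting LoRA. A minimal, hedged sketch of what that enables (the checkpoint and LoRA ids below are illustrative, not taken from this commit):

import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Illustrative ids; substitute the base model and LoRA you actually use.
pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16)
pipe.load_lora_weights("your-username/your-wan-lora")
pipe.to("cuda")

frames = pipe(prompt="a cat surfing a wave, cinematic lighting").frames[0]
export_to_video(frames, "wan_lora_sample.mp4", fps=16)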

examples/advanced_diffusion_training/README_flux.md

Lines changed: 2 additions & 2 deletions
@@ -79,13 +79,13 @@ This command will prompt you for a token. Copy-paste yours from your [settings/t
 ### Target Modules
 When LoRA was first adapted from language models to diffusion models, it was applied to the cross-attention layers in the Unet that relate the image representations with the prompts that describe them.
 More recently, SOTA text-to-image diffusion models replaced the Unet with a diffusion Transformer(DiT). With this change, we may also want to explore
-applying LoRA training onto different types of layers and blocks. To allow more flexibility and control over the targeted modules we added `--lora_layers`- in which you can specify in a comma seperated string
+applying LoRA training onto different types of layers and blocks. To allow more flexibility and control over the targeted modules we added `--lora_layers`- in which you can specify in a comma separated string
 the exact modules for LoRA training. Here are some examples of target modules you can provide:
 - for attention only layers: `--lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0"`
 - to train the same modules as in the fal trainer: `--lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0,attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,ff.net.0.proj,ff.net.2,ff_context.net.0.proj,ff_context.net.2"`
 - to train the same modules as in ostris ai-toolkit / replicate trainer: `--lora_blocks="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0,attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,ff.net.0.proj,ff.net.2,ff_context.net.0.proj,ff_context.net.2,norm1_context.linear, norm1.linear,norm.linear,proj_mlp,proj_out"`
 > [!NOTE]
-> `--lora_layers` can also be used to specify which **blocks** to apply LoRA training to. To do so, simply add a block prefix to each layer in the comma seperated string:
+> `--lora_layers` can also be used to specify which **blocks** to apply LoRA training to. To do so, simply add a block prefix to each layer in the comma separated string:
 > **single DiT blocks**: to target the ith single transformer block, add the prefix `single_transformer_blocks.i`, e.g. - `single_transformer_blocks.i.attn.to_k`
 > **MMDiT blocks**: to target the ith MMDiT block, add the prefix `transformer_blocks.i`, e.g. - `transformer_blocks.i.attn.to_k`
 > [!NOTE]
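
A rough sketch of how the training scripts consume this flag (paraphrased, values illustrative): the comma-separated string simply becomes peft's target_modules list.

from peft import LoraConfig

# --lora_layers value as it would arrive from the CLI (illustrative).
lora_layers = "attn.to_k,attn.to_q,attn.to_v,attn.to_out.0"
target_modules = [layer.strip() for layer in lora_layers.split(",")]

transformer_lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=target_modules,
)
# transformer.add_adapter(transformer_lora_config) then attaches LoRA only to these modules.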

examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Lines changed: 2 additions & 2 deletions
@@ -378,7 +378,7 @@ def parse_args(input_args=None):
         default=None,
         help="the concept to use to initialize the new inserted tokens when training with "
         "--train_text_encoder_ti = True. By default, new tokens (<si><si+1>) are initialized with random value. "
-        "Alternatively, you could specify a different word/words whos value will be used as the starting point for the new inserted tokens. "
+        "Alternatively, you could specify a different word/words whose value will be used as the starting point for the new inserted tokens. "
         "--num_new_tokens_per_abstraction is ignored when initializer_concept is provided",
     )
     parser.add_argument(
@@ -662,7 +662,7 @@ def parse_args(input_args=None):
         type=str,
         default=None,
         help=(
-            "The transformer modules to apply LoRA training on. Please specify the layers in a comma seperated. "
+            "The transformer modules to apply LoRA training on. Please specify the layers in a comma separated. "
            'E.g. - "to_k,to_q,to_v,to_out.0" will result in lora training of attention layers only. For more examples refer to https://github.com/huggingface/diffusers/blob/main/examples/advanced_diffusion_training/README_flux.md'
         ),
     )

examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

Lines changed: 1 addition & 1 deletion
@@ -662,7 +662,7 @@ def parse_args(input_args=None):
         action="store_true",
         default=False,
         help=(
-            "Wether to train a DoRA as proposed in- DoRA: Weight-Decomposed Low-Rank Adaptation https://arxiv.org/abs/2402.09353. "
+            "Whether to train a DoRA as proposed in- DoRA: Weight-Decomposed Low-Rank Adaptation https://arxiv.org/abs/2402.09353. "
             "Note: to use DoRA you need to install peft from main, `pip install git+https://github.com/huggingface/peft.git`"
         ),
     )
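
For context on the --use_dora flag fixed above, a hedged sketch of roughly how it maps onto peft's LoraConfig (values are illustrative; as the help text notes, use_dora needs a sufficiently recent peft):

from peft import LoraConfig

unet_lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    use_dora=True,  # weight-decomposed low-rank adaptation (DoRA)
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)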

examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

Lines changed: 2 additions & 2 deletions
@@ -773,7 +773,7 @@ def parse_args(input_args=None):
         action="store_true",
         default=False,
         help=(
-            "Wether to train a DoRA as proposed in- DoRA: Weight-Decomposed Low-Rank Adaptation https://arxiv.org/abs/2402.09353. "
+            "Whether to train a DoRA as proposed in- DoRA: Weight-Decomposed Low-Rank Adaptation https://arxiv.org/abs/2402.09353. "
             "Note: to use DoRA you need to install peft from main, `pip install git+https://github.com/huggingface/peft.git`"
         ),
     )
@@ -1875,7 +1875,7 @@ def compute_text_embeddings(prompt, text_encoders, tokenizers, clip_skip):
     # pack the statically computed variables appropriately here. This is so that we don't
     # have to pass them to the dataloader.

-    # if --train_text_encoder_ti we need add_special_tokens to be True fo textual inversion
+    # if --train_text_encoder_ti we need add_special_tokens to be True for textual inversion
     add_special_tokens = True if args.train_text_encoder_ti else False

     if not train_dataset.custom_instance_prompts:

src/diffusers/loaders/ip_adapter.py

Lines changed: 1 addition & 3 deletions
@@ -804,9 +804,7 @@ def load_ip_adapter(
             }

             self.register_modules(
-                feature_extractor=SiglipImageProcessor.from_pretrained(image_encoder_subfolder, **kwargs).to(
-                    self.device, dtype=self.dtype
-                ),
+                feature_extractor=SiglipImageProcessor.from_pretrained(image_encoder_subfolder, **kwargs),
                 image_encoder=SiglipVisionModel.from_pretrained(
                     image_encoder_subfolder, torch_dtype=self.dtype, **kwargs
                 ).to(self.device),
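
Context for this change: SiglipImageProcessor is a preprocessing class rather than a torch.nn.Module, so device and dtype placement do not apply to it; only the vision model is moved. A minimal sketch of the distinction (checkpoint id is illustrative):

import torch
from transformers import SiglipImageProcessor, SiglipVisionModel

repo = "google/siglip-so400m-patch14-384"  # illustrative checkpoint

feature_extractor = SiglipImageProcessor.from_pretrained(repo)  # plain preprocessing object, no .to()
image_encoder = SiglipVisionModel.from_pretrained(repo, torch_dtype=torch.float16).to("cuda")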

src/diffusers/loaders/lora_conversion_utils.py

Lines changed: 50 additions & 0 deletions
@@ -1348,3 +1348,53 @@ def process_block(prefix, index, convert_norm):
         converted_state_dict[f"transformer.{key}"] = converted_state_dict.pop(key)

     return converted_state_dict
+
+
+def _convert_non_diffusers_wan_lora_to_diffusers(state_dict):
+    converted_state_dict = {}
+    original_state_dict = {k[len("diffusion_model.") :]: v for k, v in state_dict.items()}
+
+    num_blocks = len({k.split("blocks.")[1].split(".")[0] for k in original_state_dict})
+
+    for i in range(num_blocks):
+        # Self-attention
+        for o, c in zip(["q", "k", "v", "o"], ["to_q", "to_k", "to_v", "to_out.0"]):
+            converted_state_dict[f"blocks.{i}.attn1.{c}.lora_A.weight"] = original_state_dict.pop(
+                f"blocks.{i}.self_attn.{o}.lora_A.weight"
+            )
+            converted_state_dict[f"blocks.{i}.attn1.{c}.lora_B.weight"] = original_state_dict.pop(
+                f"blocks.{i}.self_attn.{o}.lora_B.weight"
+            )
+
+        # Cross-attention
+        for o, c in zip(["q", "k", "v", "o"], ["to_q", "to_k", "to_v", "to_out.0"]):
+            converted_state_dict[f"blocks.{i}.attn2.{c}.lora_A.weight"] = original_state_dict.pop(
+                f"blocks.{i}.cross_attn.{o}.lora_A.weight"
+            )
+            converted_state_dict[f"blocks.{i}.attn2.{c}.lora_B.weight"] = original_state_dict.pop(
+                f"blocks.{i}.cross_attn.{o}.lora_B.weight"
+            )
+        for o, c in zip(["k_img", "v_img"], ["add_k_proj", "add_v_proj"]):
+            converted_state_dict[f"blocks.{i}.attn2.{c}.lora_A.weight"] = original_state_dict.pop(
+                f"blocks.{i}.cross_attn.{o}.lora_A.weight"
+            )
+            converted_state_dict[f"blocks.{i}.attn2.{c}.lora_B.weight"] = original_state_dict.pop(
+                f"blocks.{i}.cross_attn.{o}.lora_B.weight"
+            )
+
+        # FFN
+        for o, c in zip(["ffn.0", "ffn.2"], ["net.0.proj", "net.2"]):
+            converted_state_dict[f"blocks.{i}.ffn.{c}.lora_A.weight"] = original_state_dict.pop(
+                f"blocks.{i}.{o}.lora_A.weight"
+            )
+            converted_state_dict[f"blocks.{i}.ffn.{c}.lora_B.weight"] = original_state_dict.pop(
+                f"blocks.{i}.{o}.lora_B.weight"
+            )
+
+    if len(original_state_dict) > 0:
+        raise ValueError(f"`state_dict` should be empty at this point but has {original_state_dict.keys()=}")
+
+    for key in list(converted_state_dict.keys()):
+        converted_state_dict[f"transformer.{key}"] = converted_state_dict.pop(key)
+
+    return converted_state_dict
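
A small, hedged sketch of exercising the new converter directly (in normal use it is expected to be invoked for you when a non-diffusers Wan LoRA is loaded). The toy state dict below covers every key the converter pops for a single block, so the call runs end to end:

import torch
from diffusers.loaders.lora_conversion_utils import _convert_non_diffusers_wan_lora_to_diffusers

# Build a one-block Wan LoRA state dict in the original "diffusion_model." layout.
rank, dim = 4, 16
layers = [
    "self_attn.q", "self_attn.k", "self_attn.v", "self_attn.o",
    "cross_attn.q", "cross_attn.k", "cross_attn.v", "cross_attn.o",
    "cross_attn.k_img", "cross_attn.v_img", "ffn.0", "ffn.2",
]
toy = {}
for name in layers:
    toy[f"diffusion_model.blocks.0.{name}.lora_A.weight"] = torch.zeros(rank, dim)
    toy[f"diffusion_model.blocks.0.{name}.lora_B.weight"] = torch.zeros(dim, rank)

converted = _convert_non_diffusers_wan_lora_to_diffusers(toy)
print(sorted(converted)[:2])
# ['transformer.blocks.0.attn1.to_k.lora_A.weight', 'transformer.blocks.0.attn1.to_k.lora_B.weight']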
