5 | 5 | > 💡 This example follows some of the techniques and recommended practices covered in the community-derived guide we made for SDXL training: [LoRA training scripts of the world, unite!](https://huggingface.co/blog/sdxl_lora_advanced_script).
6 | 6 | > As many of these are architecture-agnostic and generally relevant to fine-tuning of diffusion models, we suggest taking a look 🤗
7 | 7 |
8 | | -[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text2image models like flux, stable diffusion given just a few(3~5) images of a subject. |
| 8 | +[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like Flux and Stable Diffusion given just a few (3-5) images of a subject.
9 | 9 |
10 | 10 | LoRA (Low-Rank Adaptation of Large Language Models) was first introduced by Microsoft in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) by *Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen*
11 | 11 | In a nutshell, LoRA allows adapting pretrained models by adding pairs of rank-decomposition matrices to existing weights and **only** training those newly added weights (sketched below). This has a couple of advantages:
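To make the idea concrete, here is a minimal, illustrative sketch of a LoRA-augmented linear layer (hypothetical class and attribute names, not the implementation used by PEFT or the training script): the pretrained weight stays frozen and only the small rank-`r` pair is trained.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = W x + (alpha / r) * B(A(x)), with W frozen."""

    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pretrained weights stay frozen
        # Rank-decomposition pair: in_features -> r -> out_features
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # start as a no-op so training begins from the base model
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

Because `lora_b` starts at zero, the wrapped layer initially reproduces the pretrained output exactly, and only the low-rank pair receives gradient updates.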
@@ -65,6 +65,21 @@ write_basic_config() |
65 | 65 | When running `accelerate config`, setting torch compile mode to True can yield dramatic speedups.
66 | 66 | Note also that we use the PEFT library as the backend for LoRA training, so make sure to have `peft>=0.6.0` installed in your environment.
67 | 67 |
| 68 | +### Target Modules |
| 69 | +When LoRA was first adapted from language models to diffusion models, it was applied to the cross-attention layers in the UNet that relate the image representations to the prompts that describe them.
| 70 | +More recently, SOTA text-to-image diffusion models have replaced the UNet with a Diffusion Transformer (DiT). With this change, we may also want to explore
| 71 | +applying LoRA training to different types of layers and blocks. To allow more flexibility and control over the targeted modules, we added `--lora_layers`, in which you can specify, as a comma-separated string,
| 72 | +the exact modules to target for LoRA training. Here are some examples of target modules you can provide:
| 73 | +- for attention-only layers: `--lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0"`
| 74 | +- to train the same modules as in the fal trainer: `--lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0,attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,ff.net.0.proj,ff.net.2,ff_context.net.0.proj,ff_context.net.2"` |
| 75 | +- to train the same modules as in the ostris ai-toolkit / replicate trainer: `--lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0,attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,ff.net.0.proj,ff.net.2,ff_context.net.0.proj,ff_context.net.2,norm1_context.linear,norm1.linear,norm.linear,proj_mlp,proj_out"`
| 76 | +> [!NOTE] |
| 77 | +> `--lora_layers` can also be used to specify which **blocks** to apply LoRA training to. To do so, simply add a block prefix to each layer in the comma-separated string:
| 78 | +> - **single DiT blocks**: to target the ith single transformer block, add the prefix `single_transformer_blocks.i`, e.g. `single_transformer_blocks.i.attn.to_k`
| 79 | +> - **MMDiT blocks**: to target the ith MMDiT block, add the prefix `transformer_blocks.i`, e.g. `transformer_blocks.i.attn.to_k`
| 80 | +> [!NOTE] |
| 81 | +> Keep in mind that while training more layers can improve quality and expressiveness, it also increases the size of the output LoRA weights.
| 82 | +
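To make the mapping concrete, here is a minimal sketch (illustrative names and defaults, not necessarily the training script's exact parsing logic) of how a comma-separated `--lora_layers` value can be turned into `target_modules` for a PEFT `LoraConfig`:

```python
# Sketch: mapping a comma-separated --lora_layers value to PEFT target modules.
# Variable names and LoRA hyperparameters here are illustrative assumptions.
from peft import LoraConfig

lora_layers = "attn.to_k,attn.to_q,attn.to_v,attn.to_out.0"  # e.g. attention-only layers
target_modules = [layer.strip() for layer in lora_layers.split(",")]

# Adding a block prefix narrows training to that block, for example the 7th MMDiT block:
# "transformer_blocks.7.attn.to_k,transformer_blocks.7.attn.to_q,..."

transformer_lora_config = LoraConfig(
    r=16,                          # LoRA rank
    lora_alpha=16,                 # scaling factor
    init_lora_weights="gaussian",
    target_modules=target_modules,
)
# The config is then attached to the model, e.g. transformer.add_adapter(transformer_lora_config).
```

Since PEFT matches these entries against module-name suffixes, an unprefixed entry like `attn.to_k` targets that projection in every block, while a block-prefixed entry restricts it to a single block.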
68 | 83 | ### Pivotal Tuning (and more) |
69 | 84 | **Training with text encoder(s)** |
70 | 85 |