examples/community/README.md
| Example | Description | Code Example | Colab | Author |
|:--------|:------------|:-------------|:------|:-------|
| Stable Diffusion BoxDiff Pipeline | Training-free controlled generation with bounding boxes using [BoxDiff](https://github.com/showlab/BoxDiff)|[Stable Diffusion BoxDiff Pipeline](#stable-diffusion-boxdiff)| - |[Jingyang Zhang](https://github.com/zjysteven/)|
| FRESCO V2V Pipeline | Implementation of [[CVPR 2024] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation](https://arxiv.org/abs/2403.12962)|[FRESCO V2V Pipeline](#fresco)| - |[Yifan Zhou](https://github.com/SingleZombie)|
| AnimateDiff IPEX Pipeline | Accelerate AnimateDiff inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch)|[AnimateDiff on IPEX](#animatediff-on-ipex)| - |[Dan Li](https://github.com/ustcuna/)|
| PIXART-α Controlnet pipeline | Implementation of the ControlNet model for PixArt-α and its diffusers pipeline | [PIXART-α Controlnet pipeline](#pixart-α-controlnet-pipeline) | - | [Raul Ciotescu](https://github.com/raulc0399/) |
| HunyuanDiT Differential Diffusion Pipeline | Applies [Differential Diffusion](https://github.com/exx8/differential-diffusion) to [HunyuanDiT](https://github.com/huggingface/diffusers/pull/8240). |[HunyuanDiT with Differential Diffusion](#hunyuandit-with-differential-diffusion)|[Colab Notebook](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing)|[Monjoy Choudhury](https://github.com/MnCSSJ4x)|
|[🪆Matryoshka Diffusion Models](https://huggingface.co/papers/2310.15111)| A diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small scale inputs are nested within those of the large scales. See [original codebase](https://github.com/apple/ml-mdm). |[🪆Matryoshka Diffusion Models](#matryoshka-diffusion-models)|[Hugging Face Space](https://huggingface.co/spaces/pcuenq/mdm) [Colab Notebook](https://colab.research.google.com/gist/tolgacangoz/1f54875fc7aeaabcf284ebde64820966/matryoshka_hf.ipynb)|[M. Tolga Cangöz](https://github.com/tolgacangoz)|
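
Each of these community pipelines lives as a single file under `examples/community` and is typically loaded by passing its name to the `custom_pipeline` argument of `DiffusionPipeline.from_pretrained`. Below is a minimal sketch; the base checkpoint and the `pipeline_stable_diffusion_boxdiff` name are illustrative choices, and each entry's own section documents the checkpoint and arguments it actually expects.

```python
import torch
from diffusers import DiffusionPipeline

# Illustrative base checkpoint and community pipeline name; see the pipeline's
# section below for the exact usage and extra call arguments it needs
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    custom_pipeline="pipeline_stable_diffusion_boxdiff",
    torch_dtype=torch.float16,
).to("cuda")
```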
When LoRA was first adapted from language models to diffusion models, it was applied to the cross-attention layers in the UNet that relate the image representations with the prompts that describe them.
More recently, SOTA text-to-image diffusion models have replaced the UNet with a diffusion Transformer (DiT). With this change, we may also want to explore
applying LoRA training to different types of layers and blocks. To allow more flexibility and control over the targeted modules, we added `--lora_layers`, in which you can specify, as a comma-separated string,
the exact modules for LoRA training. Here are some examples of target modules you can provide:

- for attention-only layers: `--lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0"`
- to train the same modules as in the fal trainer: `--lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0,attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,ff.net.0.proj,ff.net.2,ff_context.net.0.proj,ff_context.net.2"`
- to train the same modules as in the ostris ai-toolkit / replicate trainer: `--lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0,attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,ff.net.0.proj,ff.net.2,ff_context.net.0.proj,ff_context.net.2,norm1_context.linear,norm1.linear,norm.linear,proj_mlp,proj_out"`
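
Under the hood, a string like the ones above is essentially split on commas and used as the `target_modules` of a peft `LoraConfig`. The sketch below shows that mapping; the rank/alpha values are illustrative, and the exact wiring inside the training script may differ.

```python
from peft import LoraConfig

# Comma-separated string as it would be passed via --lora_layers
lora_layers = "attn.to_k,attn.to_q,attn.to_v,attn.to_out.0"

# Split into individual module names, stripping any stray whitespace
target_modules = [layer.strip() for layer in lora_layers.split(",")]

# Illustrative rank/alpha; in the script these come from CLI arguments
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=target_modules,
)
print(lora_config.target_modules)
```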
> [!NOTE]
> `--lora_layers` can also be used to specify which **blocks** to apply LoRA training to. To do so, simply add a block prefix to each layer in the comma-separated string:
> **single DiT blocks**: to target the ith single transformer block, add the prefix `single_transformer_blocks.i`, e.g. - `single_transformer_blocks.i.attn.to_k`
> **MMDiT blocks**: to target the ith MMDiT block, add the prefix `transformer_blocks.i`, e.g. - `transformer_blocks.i.attn.to_k`
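
For instance, to train only the attention projections of the first MMDiT block, the prefixed names could be built like this (a small sketch; the block index and layer list are illustrative):

```python
# Build block-prefixed module names for --lora_layers (illustrative choices)
block_idx = 0
layers = ["attn.to_k", "attn.to_q", "attn.to_v", "attn.to_out.0"]
prefixed = [f"transformer_blocks.{block_idx}.{layer}" for layer in layers]

# The resulting value to pass as --lora_layers
print(",".join(prefixed))
# -> transformer_blocks.0.attn.to_k,transformer_blocks.0.attn.to_q,...
```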
> [!NOTE]
> Keep in mind that while training more layers can improve quality and expressiveness, it also increases the size of the output LoRA weights.
### Text Encoder Training
Alongside the transformer, fine-tuning of the CLIP text encoder is also supported.
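
As an illustration of what this looks like at the model level, here is a rough, hypothetical sketch of attaching a LoRA adapter to a CLIP text encoder with peft; the checkpoint, rank, and target module names are assumptions, not the training script's exact configuration.

```python
from peft import LoraConfig
from transformers import CLIPTextModel

# Illustrative checkpoint; in practice the text encoder ships with the base
# model passed via --pretrained_model_name_or_path
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# LoRA on the attention projections of the CLIP attention layers
text_lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    init_lora_weights="gaussian",
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
)
text_encoder.add_adapter(text_lora_config)
```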
As image generation models get bigger & more powerful, more fine-tuners are finding that training only part of the
transformer blocks (sometimes as few as two) can be enough to get great results.
In some cases, it can be even better to keep some of the blocks/layers frozen.

For **SD3.5-Large** specifically, you may find this information useful (taken from the [Stable Diffusion 3.5 Large Fine-tuning Tutorial](https://stabilityai.notion.site/Stable-Diffusion-3-5-Large-Fine-tuning-Tutorial-11a61cdcd1968027a15bdbd7c40be8c6#12461cdcd19680788a23c650dab26b93)):
> [!NOTE]
> A commonly believed heuristic that we verified once again during the construction of the SD3.5 family of models is that later/higher layers (i.e. `30 - 37`)* impact tertiary details more heavily. Conversely, earlier layers (i.e. `12 - 24` )* influence the overall composition/primary form more.
> So, freezing other layers/targeting specific layers is a viable approach.
> `*`These suggested layers are speculative and not 100% guaranteed. The tips here are more or less a general idea for next steps.
> **Photorealism**
> In preliminary testing, we observed that freezing the last few layers of the architecture significantly improved model training when using a photorealistic dataset, preventing the detail degradation that a small dataset can introduce.
> **Anatomy preservation**
> To dampen any possible degradation of anatomy, training only the attention layers and **not** the adaptive linear layers could help. For reference, below is one of the transformer blocks.
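
One way to inspect such a block is to load the SD3.5 transformer and print the submodules of a single block, so the attention projections and the adaptive (adaLN) linear layers are visible by name. This is a rough sketch; the checkpoint is gated and large, and the block index is an illustrative choice.

```python
from diffusers import SD3Transformer2DModel

# Illustrative checkpoint; access is gated and the download is several GB
transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", subfolder="transformer"
)

# Print the named submodules of one MMDiT block: attention projections
# (attn.to_q / to_k / to_v / ...) vs. adaptive linear layers (e.g. norm1.linear)
block = transformer.transformer_blocks[0]
for name, module in block.named_modules():
    if name:  # skip the block itself (empty name)
        print(name, "->", type(module).__name__)
```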
We've added `--lora_layers` and `--lora_blocks` to make LoRA training modules configurable.
- with `--lora_blocks` you can specify the block numbers for training. E.g. passing -