Commit 3dd4168

[docs] Minor updates (#7063)
* updates
* feedback
1 parent 1c47d1f commit 3dd4168

5 files changed (+52, -29 lines changed)

docs/source/en/api/attnprocessor.md

Lines changed: 0 additions & 6 deletions
@@ -41,12 +41,6 @@ An attention processor is a class for applying different types of attention mech
 ## FusedAttnProcessor2_0
 [[autodoc]] models.attention_processor.FusedAttnProcessor2_0
 
-## LoRAAttnProcessor
-[[autodoc]] models.attention_processor.LoRAAttnProcessor
-
-## LoRAAttnProcessor2_0
-[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0
-
 ## LoRAAttnAddedKVProcessor
 [[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor
 

docs/source/en/optimization/fp16.md

Lines changed: 6 additions & 0 deletions
@@ -66,3 +66,9 @@ image = pipe(prompt).images[0]
 Don't use [`torch.autocast`](https://pytorch.org/docs/stable/amp.html#torch.autocast) in any of the pipelines as it can lead to black images and is always slower than pure float16 precision.
 
 </Tip>
+
+## Distilled model
+
+You could also use a distilled Stable Diffusion model and autoencoder to speed up inference. During distillation, many of the UNet's residual and attention blocks are shed to reduce the model size. The distilled model is faster and uses less memory while generating images of comparable quality to the full Stable Diffusion model.
+
+Learn more about it in the [Distilled Stable Diffusion inference](../using-diffusers/distilled_sd) guide!

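The "uses less memory" claim can be made concrete with back-of-the-envelope arithmetic. The memory needed just to hold a model's weights is roughly its parameter count times the bytes per parameter. Float16 therefore halves it relative to float32, and shedding UNet blocks, as a distilled model does, lowers the parameter count itself. The helper below is a sketch under that assumption only. It ignores activations, the other pipeline components, and framework overhead, and the parameter counts in the usage example are hypothetical round numbers, not measurements of any real checkpoint.

```python
# Bytes needed to store one parameter in each floating-point dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: int, dtype: str = "float16") -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Activations, optimizer state, and runtime overhead are NOT included.
    Raises KeyError for a dtype missing from BYTES_PER_PARAM.
    """
    if num_params < 0:
        raise ValueError("num_params must be non-negative")
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Hypothetical parameter counts, chosen only to show the scaling.
    for label, n in [("full", 1_000_000_000), ("distilled", 600_000_000)]:
        print(label, weight_memory_gb(n, "float32"), weight_memory_gb(n, "float16"))
```

The two effects multiply: a model with fewer parameters that also runs in float16 needs a fraction of the weight memory of the full float32 model.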
docs/source/en/optimization/torch2.0.md

Lines changed: 3 additions & 0 deletions
@@ -75,6 +75,9 @@ Compilation requires some time to complete, so it is best suited for situations
 
 For more information and different options about `torch.compile`, refer to the [`torch_compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) tutorial.
 
+> [!TIP]
+> Learn more about other ways PyTorch 2.0 can help optimize your model in the [Accelerate inference of text-to-image diffusion models](../tutorials/fast_diffusion) tutorial.
+
 ## Benchmark
 
 We conducted a comprehensive benchmark with PyTorch 2.0's efficient attention implementation and `torch.compile` across different GPUs and batch sizes for five of our most used pipelines. The code is benchmarked on 🤗 Diffusers v0.17.0.dev0 to optimize `torch.compile` usage (see [here](https://github.com/huggingface/diffusers/pull/3313) for more details).

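Compilation happens on the first call or calls, so a fair timing of a `torch.compile`d pipeline should exclude those warmup calls. The helper below is a minimal, framework-agnostic sketch of that pattern. The `benchmark` name and its defaults are our own and are not part of Diffusers or PyTorch. It reports the median of the timed runs so that a single slow outlier does not dominate the result.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 2, iters: int = 5) -> float:
    """Median wall-clock seconds over `iters` timed calls to `fn`.

    The first `warmup` calls are run but not timed, so one-time costs
    (for example compilation or cache population) are excluded.
    """
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):
        fn()  # untimed warmup call
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Stand-in CPU workload. With Diffusers, `fn` could be something like
    # `lambda: pipe(prompt).images[0]` for a hypothetical compiled `pipe`.
    median_s = benchmark(lambda: sum(range(100_000)), warmup=2, iters=5)
    print(f"median: {median_s * 1e3:.2f} ms")
```

If `fn` launches asynchronous GPU work, it should wait for that work to finish before returning (for example with `torch.cuda.synchronize()`). Otherwise the timer only measures how long it takes to enqueue kernels.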
docs/source/en/training/lora.md

Lines changed: 37 additions & 23 deletions
@@ -113,36 +113,50 @@ The dataset preprocessing code and training loop are found in the [`main()`](htt
 
 As with the script parameters, a walkthrough of the training script is provided in the [Text-to-image](text2image#training-script) training guide. Instead, this guide takes a look at the LoRA-relevant parts of the script.
 
-The script begins by adding the [new LoRA weights](https://github.com/huggingface/diffusers/blob/dd9a5caf61f04d11c0fa9f3947b69ab0010c9a0f/examples/text_to_image/train_text_to_image_lora.py#L447) to the attention layers. This involves correctly configuring the weight size for each block in the UNet. You'll see the `rank` parameter is used to create the [`~models.attention_processor.LoRAAttnProcessor`]:
+<hfoptions id="lora">
+<hfoption id="UNet">
+
+Diffusers uses [`~peft.LoraConfig`] from the [PEFT](https://hf.co/docs/peft) library to set up the parameters of the LoRA adapter such as the rank, alpha, and which modules to insert the LoRA weights into. The adapter is added to the UNet, and only the LoRA layers are filtered for optimization in `lora_layers`.
+
+```py
+unet_lora_config = LoraConfig(
+    r=args.rank,
+    lora_alpha=args.rank,
+    init_lora_weights="gaussian",
+    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
+)
+
+unet.add_adapter(unet_lora_config)
+lora_layers = filter(lambda p: p.requires_grad, unet.parameters())
+```
+
+</hfoption>
+<hfoption id="text encoder">
+
+Diffusers also supports finetuning the text encoder with LoRA from the [PEFT](https://hf.co/docs/peft) library when necessary, such as when finetuning Stable Diffusion XL (SDXL). The [`~peft.LoraConfig`] is used to configure the parameters of the LoRA adapter, which is then added to the text encoders, and only the LoRA layers are filtered for training.
 
 ```py
-lora_attn_procs = {}
-for name in unet.attn_processors.keys():
-    cross_attention_dim = None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
-    if name.startswith("mid_block"):
-        hidden_size = unet.config.block_out_channels[-1]
-    elif name.startswith("up_blocks"):
-        block_id = int(name[len("up_blocks.")])
-        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
-    elif name.startswith("down_blocks"):
-        block_id = int(name[len("down_blocks.")])
-        hidden_size = unet.config.block_out_channels[block_id]
-
-    lora_attn_procs[name] = LoRAAttnProcessor(
-        hidden_size=hidden_size,
-        cross_attention_dim=cross_attention_dim,
-        rank=args.rank,
-    )
-
-unet.set_attn_processor(lora_attn_procs)
-lora_layers = AttnProcsLayers(unet.attn_processors)
+text_lora_config = LoraConfig(
+    r=args.rank,
+    lora_alpha=args.rank,
+    init_lora_weights="gaussian",
+    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
+)
+
+text_encoder_one.add_adapter(text_lora_config)
+text_encoder_two.add_adapter(text_lora_config)
+text_lora_parameters_one = list(filter(lambda p: p.requires_grad, text_encoder_one.parameters()))
+text_lora_parameters_two = list(filter(lambda p: p.requires_grad, text_encoder_two.parameters()))
 ```
 
-The [optimizer](https://github.com/huggingface/diffusers/blob/dd9a5caf61f04d11c0fa9f3947b69ab0010c9a0f/examples/text_to_image/train_text_to_image_lora.py#L519) is initialized with the `lora_layers` because these are the only weights that'll be optimized:
+</hfoption>
+</hfoptions>
+
+The [optimizer](https://github.com/huggingface/diffusers/blob/e4b8f173b97731686e290b2eb98e7f5df2b1b322/examples/text_to_image/train_text_to_image_lora.py#L529) is initialized with the `lora_layers` because these are the only weights that'll be optimized:
 
 ```py
 optimizer = optimizer_cls(
-    lora_layers.parameters(),
+    lora_layers,
     lr=args.learning_rate,
     betas=(args.adam_beta1, args.adam_beta2),
     weight_decay=args.adam_weight_decay,

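Both `LoraConfig` objects in the new guide text set the same four fields: `r`, `lora_alpha`, `init_lora_weights`, and `target_modules`. The pure-Python sketch below illustrates what each field controls. It rests on these stated assumptions:

* The layer's effective weight is modeled as W + (lora_alpha / r) * (B @ A), where only A and B are trained.
* `"gaussian"` initialization is modeled, to our understanding of PEFT, as A drawn from a normal distribution with standard deviation 1/r and B set to zero. This makes the adapter a no-op before training.
* A list of `target_modules` is matched against a module's full name, either exactly or as a `.`-prefixed suffix.

All function names and the module names in the usage example are hypothetical. This is not PEFT's implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def init_lora(d_out, d_in, r, seed=0):
    """Assumed 'gaussian'-style init: A ~ N(0, std=1/r) is r x d_in, B = 0 is d_out x r."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 1.0 / r) for _ in range(d_in)] for _ in range(r)]
    b = [[0.0] * r for _ in range(d_out)]
    return a, b


def effective_weight(w, a, b, lora_alpha, r):
    """Frozen weight W plus the low-rank update scaled by lora_alpha / r."""
    scale = lora_alpha / r
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer."""
    return r * (d_in + d_out)


def is_targeted(module_name, target_modules):
    """True if the name equals a target or ends with '.' + target."""
    return any(module_name == t or module_name.endswith("." + t) for t in target_modules)


if __name__ == "__main__":
    w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # toy frozen 2x3 weight
    a, b = init_lora(d_out=2, d_in=3, r=1)
    print(effective_weight(w, a, b, lora_alpha=1, r=1) == w)  # B is zero, so W is unchanged
    print(lora_param_count(320, 320, r=4), 320 * 320)  # toy layer: adapter vs. full matrix
    unet_targets = ["to_k", "to_q", "to_v", "to_out.0"]
    for name in [  # hypothetical module names
        "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q",
        "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0",
        "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.1",
    ]:
        print(name, is_targeted(name, unet_targets))
```

The scripts set `lora_alpha=args.rank`, so the scale factor lora_alpha / r is always 1. Changing `--rank` therefore changes the adapter's capacity without also rescaling its update. The target is written as `to_out.0` rather than `to_out` because, as far as we know, `to_out` in Diffusers' attention module is a list that holds the output linear layer followed by a dropout layer. The `.0` selects only the linear layer.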
docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md

Lines changed: 6 additions & 0 deletions
@@ -217,3 +217,9 @@ Check your image dimensions to see if they're correct:
 images.shape
 # (8, 1, 512, 512, 3)
 ```
+
+## Resources
+
+To learn more about how JAX works with Stable Diffusion, you may be interested in reading:
+
+* [Accelerating Stable Diffusion XL Inference with JAX on Cloud TPU v5e](https://hf.co/blog/sdxl_jax)

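The `(8, 1, 512, 512, 3)` shape in the JAX guide's context lines reflects a data-parallel layout. The leading axis is the device axis, and each device produced one image. Flax's `flax.training.common_utils.shard` reshapes a batch's leading axis to `(local_device_count, -1)`. To turn the output back into a flat list of images, the device axis has to be merged into the batch axis.

The sketch below mimics both directions on plain Python lists, plus the shape arithmetic for the merge. It is an illustration only, not the guide's code, and the function names are hypothetical.

```python
def shard(batch, num_devices):
    """Split a flat batch into equal per-device chunks: (N, ...) -> (num_devices, N // num_devices, ...)."""
    if num_devices < 1 or len(batch) % num_devices:
        raise ValueError("batch size must be a positive multiple of num_devices")
    per_device = len(batch) // num_devices
    return [batch[i * per_device:(i + 1) * per_device] for i in range(num_devices)]


def unshard(sharded):
    """Merge the device axis back into the batch axis: (D, B, ...) -> (D * B, ...)."""
    return [item for chunk in sharded for item in chunk]


def merged_shape(shape):
    """Shape after merging the first two axes."""
    return (shape[0] * shape[1],) + tuple(shape[2:])


if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(8)]
    per_device = shard(prompts, num_devices=8)  # 8 chunks of 1 prompt each
    print(len(per_device), [len(c) for c in per_device])
    print(unshard(per_device) == prompts)
    print(merged_shape((8, 1, 512, 512, 3)))
```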