Stable Diffusion XL (SDXL) is a powerful text-to-image model that generates high-resolution images.
The SDXL training script is discussed in more detail in the [SDXL training](sdxl) guide.

## DeepFloyd IF

DeepFloyd IF is a cascading pixel diffusion model with three stages. The first stage generates a base image, and the second and third stages progressively upscale the base image into a high-resolution 1024x1024 image. Use the [train_dreambooth_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py) or [train_dreambooth.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) scripts to train a DeepFloyd IF model with LoRA or the full model.

DeepFloyd IF uses predicted variance, but the Diffusers training scripts use predicted error, so trained DeepFloyd IF models are switched to a fixed variance schedule. The training scripts update the scheduler config of the fully trained model for you, but when you load the saved LoRA weights, you must also update the pipeline's scheduler config yourself.
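
For example, here is a minimal sketch of loading stage 1 LoRA weights and switching the scheduler to a fixed variance schedule; the LoRA weights path is a placeholder for wherever your training run saved them:

```py
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", use_safetensors=True)

# load the DreamBooth LoRA weights saved by the training script (placeholder path)
pipe.load_lora_weights("<lora weights path>")

# recreate the scheduler from its config with a fixed variance schedule
pipe.scheduler = pipe.scheduler.__class__.from_config(pipe.scheduler.config, variance_type="fixed_small")
```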
The stage 2 model requires additional validation images to upscale. You can download and use a downsized version of the training images for this.

```py
from huggingface_hub import snapshot_download

local_dir = "./dog_downsized"
snapshot_download(
    "diffusers/dog-example-downsized",
    local_dir=local_dir,
    repo_type="dataset",
    ignore_patterns=".gitattributes",
)
```

The code samples below provide a brief overview of how to train a DeepFloyd IF model with a combination of DreamBooth and LoRA. Some important parameters to note are:

* `--resolution=64`, a much smaller resolution is required because DeepFloyd IF is a pixel diffusion model, and to work on uncompressed pixels the input images must be smaller
* `--pre_compute_text_embeddings`, compute the text embeddings ahead of time to save memory because the [`~transformers.T5Model`] can take up a lot of memory
* `--tokenizer_max_length=77`, you can use a longer default text length with T5 as the text encoder, but the default model encoding procedure uses a shorter text length
* `--text_encoder_use_attention_mask`, to pass the attention mask to the text encoder

<hfoptions id="IF-DreamBooth">
<hfoption id="Stage 1 LoRA DreamBooth">

Training stage 1 of DeepFloyd IF with LoRA and DreamBooth requires ~28GB of memory.

```bash
export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_dog_lora"

accelerate launch train_dreambooth_lora.py \
  --report_to wandb \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a sks dog" \
  --resolution=64 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --scale_lr \
  --max_train_steps=1200 \
  --validation_prompt="a sks dog" \
  --validation_epochs=25 \
  --checkpointing_steps=100 \
  --pre_compute_text_embeddings \
  --tokenizer_max_length=77 \
  --text_encoder_use_attention_mask
```

</hfoption>
<hfoption id="Stage 2 LoRA DreamBooth">

For stage 2 of DeepFloyd IF with LoRA and DreamBooth, pay attention to these parameters:
515
+
516
+
*`--validation_images`, the images to upscale during validation
517
+
*`--class_labels_conditioning=timesteps`, to additionally conditional the UNet as needed in stage 2
518
+
*`--learning_rate=1e-6`, a lower learning rate is used compared to stage 1
519
+
*`--resolution=256`, the expected resolution for the upscaler
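
The guide's full command for this stage isn't reproduced here, but a minimal sketch that combines these parameters with the stage 1 flags might look like the following. The stage 2 checkpoint (`DeepFloyd/IF-II-L-v1.0`), the step counts, and the validation image paths are illustrative assumptions rather than values from this guide:

```bash
export MODEL_NAME="DeepFloyd/IF-II-L-v1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_dog_upscale"
# downsized copies of the training images to upscale during validation (assumed paths)
export VALIDATION_IMAGES="dog_downsized/image_1.png dog_downsized/image_2.png dog_downsized/image_3.png dog_downsized/image_4.png"

accelerate launch train_dreambooth_lora.py \
  --report_to wandb \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a sks dog" \
  --resolution=256 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-6 \
  --max_train_steps=2000 \
  --validation_prompt="a sks dog" \
  --validation_epochs=100 \
  --validation_images $VALIDATION_IMAGES \
  --class_labels_conditioning=timesteps \
  --pre_compute_text_embeddings \
  --tokenizer_max_length=77 \
  --text_encoder_use_attention_mask
```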

</hfoption>
<hfoption id="Stage 1 DreamBooth">

For stage 1 of DeepFloyd IF with DreamBooth, pay attention to these parameters:

* `--skip_save_text_encoder`, to skip saving the full T5 text encoder with the finetuned model
* `--use_8bit_adam`, to use the 8-bit Adam optimizer to save memory, given the size of the optimizer state when training the full model
* `--learning_rate=1e-7`, a very low learning rate should be used for full model training, otherwise the model quality degrades (you can use a higher learning rate with a larger batch size)

Training with 8-bit Adam and a batch size of 4, the full model can be trained with ~48GB of memory.
558
+
559
+
```bash
560
+
export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"
561
+
export INSTANCE_DIR="dog"
562
+
export OUTPUT_DIR="dreambooth_if"
563
+
564
+
accelerate launch train_dreambooth.py \
565
+
--pretrained_model_name_or_path=$MODEL_NAME \
566
+
--instance_data_dir=$INSTANCE_DIR \
567
+
--output_dir=$OUTPUT_DIR \
568
+
--instance_prompt="a photo of sks dog" \
569
+
--resolution=64 \
570
+
--train_batch_size=4 \
571
+
--gradient_accumulation_steps=1 \
572
+
--learning_rate=1e-7 \
573
+
--max_train_steps=150 \
574
+
--validation_prompt "a photo of sks dog" \
575
+
--validation_steps 25 \
576
+
--text_encoder_use_attention_mask \
577
+
--tokenizer_max_length 77 \
578
+
--pre_compute_text_embeddings \
579
+
--use_8bit_adam \
580
+
--set_grads_to_none \
581
+
--skip_save_text_encoder \
582
+
--push_to_hub
583
+
```

</hfoption>
<hfoption id="Stage 2 DreamBooth">

For stage 2 of DeepFloyd IF with DreamBooth, pay attention to these parameters:

* `--learning_rate=5e-6`, use a lower learning rate with a smaller effective batch size
* `--resolution=256`, the expected resolution for the upscaler
* `--train_batch_size=2` and `--gradient_accumulation_steps=6`, training effectively on images with faces requires a larger effective batch size; a sketch of a full command follows below
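
As with stage 2 LoRA, the guide's full command isn't shown here; a minimal sketch under the parameters above might look like this, where the stage 2 checkpoint (`DeepFloyd/IF-II-L-v1.0`), the step count, and the output directory are illustrative assumptions:

```bash
export MODEL_NAME="DeepFloyd/IF-II-L-v1.0"  # stage 2 upscaler checkpoint (assumed)
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_dog_upscale"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a sks dog" \
  --resolution=256 \
  --train_batch_size=2 \
  --gradient_accumulation_steps=6 \
  --learning_rate=5e-6 \
  --max_train_steps=2000 \
  --class_labels_conditioning=timesteps \
  --pre_compute_text_embeddings \
  --tokenizer_max_length=77 \
  --text_encoder_use_attention_mask \
  --use_8bit_adam \
  --set_grads_to_none \
  --skip_save_text_encoder
```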

</hfoption>
</hfoptions>

Training the DeepFloyd IF model can be challenging, but here are some tips that we've found helpful:

- LoRA is sufficient for training the stage 1 model because the model's low resolution makes representing finer details difficult regardless.
- For common or simple objects, you don't necessarily need to finetune the upscaler. Make sure the prompt passed to the upscaler is adjusted to remove the new token from the instance prompt. For example, if your stage 1 prompt is "a sks dog", then your stage 2 prompt should be "a dog".
- For finer details like faces, fully training the stage 2 upscaler is better than training the stage 2 model with LoRA. It also helps to use lower learning rates with larger batch sizes.
- Lower learning rates should be used to train the stage 2 model.
- The [`DDPMScheduler`] works better than the DPMSolver used in the training scripts; you can swap it in at inference time, as shown below.
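
A minimal sketch of swapping in [`DDPMScheduler`] on a loaded pipeline, using the stage 1 base checkpoint from above:

```py
from diffusers import DDPMScheduler, DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", use_safetensors=True)

# swap the default scheduler for DDPMScheduler, reusing the pipeline's scheduler config
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)
```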
634
+
443
635
## Next steps
Congratulations on training your DreamBooth model! To learn more about how to use your new model, the following guide may be helpful: