
Commit 428dbfe

[SDXL and IP2P]: instruction pix2pix XL training and pipeline (#4079)
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* [Community] Implementation of the IADB community pipeline (#3996)
  * community pipeline: implementation of iadb
  * iadb.py: reformat using black
  * iadb.py: linting update
* add kandinsky to readme table (#4081)
  Co-authored-by: yiyixuxu <yixu310@gmail,com>
* [From Single File] Force accelerate to be installed (#4078)
  force accelerate to be installed
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Support instruction pix2pix sdxl
* Clean up IP2P SDXL code
* Clean up IP2P SDXL code
* [IP2P and SDXL] clean up code
* [IP2P and SDXL] clean up code
* [IP2P and SDXL] clean up code
* [IP2P SDXL] Address code reviews
* [IP2P SDXL] Address code reviews, add docs, tests
* [IP2P SDXL] Address code reviews, add docs, tests
* [IP2P SDXL] Address code reviews, add docs, tests
* [IP2P SDXL] Address code reviews, add docs, tests
* [IP2P SDXL] Address code reviews, add docs, tests
* [IP2P SDXL] Address code reviews, add docs, tests
* [IP2P SDXL] Address code reviews, add docs, tests
* [IP2P SDXL] Address code reviews, add docs, tests
* [IP2P SDXL] Address code reviews, add docs, tests
* [IP2P SDXL] Address code reviews, add docs, tests
* [IP2P SDXL] Address code reviews, add docs, tests
* [IP2P SDXL] Address code reviews, add docs, tests
* [IP2P SDXL] Address code reviews
* [IP2P SDXL] Address code reviews
* [IP2P SDXL] Add README_SDXL
* [IP2P SDXL] Address code reviews
* [IP2P SDXL] Address code reviews
* [IP2P SDXL] Fix the copy problems
* [IP2P SDXL] Add license
* [IP2P SDXL] Add license
* [IP2P SDXL] Add license
* [IP2P SDXL] Address code reivew for selecting VAE andd others
* [IP2P SDXL] Update README_sdxl
* [IP2P SDXL] Update __init__
* [IP2P SDXL] Update dummy_torch_and_transformers_and_invisible_watermark_objects
* address patrick's comments and some additions to readmes.

---------

Co-authored-by: Harutatsu Akiyama <[email protected]>
Co-authored-by: Thomas Chambon <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
1 parent 4e2a021 commit 428dbfe

File tree

10 files changed: +2473 -2 lines changed


docs/source/en/training/instructpix2pix.mdx

Lines changed: 5 additions & 1 deletion
```diff
@@ -208,4 +208,8 @@ speed and quality during performance:
 Particularly, `image_guidance_scale` and `guidance_scale` can have a profound impact
 on the generated ("edited") image (see [here](https://twitter.com/RisingSayak/status/1628392199196151808?s=20) for an example).
 
-If you're looking for some interesting ways to use the InstructPix2Pix training methodology, we welcome you to check out this blog post: [Instruction-tuning Stable Diffusion with InstructPix2Pix](https://huggingface.co/blog/instruction-tuning-sd).
+If you're looking for some interesting ways to use the InstructPix2Pix training methodology, we welcome you to check out this blog post: [Instruction-tuning Stable Diffusion with InstructPix2Pix](https://huggingface.co/blog/instruction-tuning-sd).
+
+## Stable Diffusion XL
+
+We support fine-tuning of the UNet shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) with DreamBooth and LoRA via the `train_dreambooth_lora_sdxl.py` script. Please refer to the docs [here](https://github.com/huggingface/diffusers/blob/main/examples/instruct_pix2pix/README_sdxl.md).
```

examples/instruct_pix2pix/README.md

Lines changed: 5 additions & 1 deletion
```diff
@@ -186,4 +186,8 @@ speed and quality during performance:
 Particularly, `image_guidance_scale` and `guidance_scale` can have a profound impact
 on the generated ("edited") image (see [here](https://twitter.com/RisingSayak/status/1628392199196151808?s=20) for an example).
 
-If you're looking for some interesting ways to use the InstructPix2Pix training methodology, we welcome you to check out this blog post: [Instruction-tuning Stable Diffusion with InstructPix2Pix](https://huggingface.co/blog/instruction-tuning-sd).
+If you're looking for some interesting ways to use the InstructPix2Pix training methodology, we welcome you to check out this blog post: [Instruction-tuning Stable Diffusion with InstructPix2Pix](https://huggingface.co/blog/instruction-tuning-sd).
+
+## Stable Diffusion XL
+
+We support fine-tuning of the UNet shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) with DreamBooth and LoRA via the `train_dreambooth_lora_sdxl.py` script. Please refer to the docs [here](./README_sdxl.md).
```
examples/instruct_pix2pix/README_sdxl.md (new file)

Lines changed: 148 additions & 0 deletions
# InstructPix2Pix SDXL training example

***This is based on the original InstructPix2Pix training example.***

[Stable Diffusion XL](https://huggingface.co/papers/2307.01952) (or SDXL) is the latest image generation model that is tailored towards more photorealistic outputs with more detailed imagery and composition compared to previous SD models. It leverages a three times larger UNet backbone. The increase in model parameters is mainly due to more attention blocks and a larger cross-attention context, as SDXL uses a second text encoder.

The `train_instruct_pix2pix_xl.py` script shows how to implement the training procedure and adapt it for Stable Diffusion XL.

***Disclaimer: Even though `train_instruct_pix2pix_xl.py` implements the InstructPix2Pix
training procedure while being faithful to the [original implementation](https://github.com/timothybrooks/instruct-pix2pix), we have only tested it on a [small-scale dataset](https://huggingface.co/datasets/fusing/instructpix2pix-1000-samples). This can impact the end results. For better results, we recommend longer training runs with a larger dataset. [Here](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered) you can find a large dataset for InstructPix2Pix training.***

## Running locally with PyTorch

### Installing the dependencies

Refer to the original InstructPix2Pix training example for installing the dependencies.

You will also need to get access to SDXL by filling out the [form](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9).
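For reference, the setup typically looks like the sketch below. It assumes you work from the `examples/instruct_pix2pix` folder of a `diffusers` checkout and that the folder ships a `requirements.txt` with the example's extra dependencies; adapt the paths if your layout differs.

```bash
# Install diffusers from source plus the example's requirements (paths assumed, see above).
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
cd examples/instruct_pix2pix
pip install -r requirements.txt

# Create an accelerate config (answer the prompts or accept the defaults).
accelerate config

# Log in so the gated SDXL weights can be downloaded once your access request is approved.
huggingface-cli login
```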
### Toy example

As mentioned before, we'll use a [small toy dataset](https://huggingface.co/datasets/fusing/instructpix2pix-1000-samples) for training. The dataset
is a smaller version of the [original dataset](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered) used in the InstructPix2Pix paper.
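If you want to see what a training example looks like, a quick inspection sketch with `datasets` is shown below. The column names (`input_image`, `edit_prompt`, `edited_image`) are the defaults the training script expects, but verify them on the dataset card before relying on them.

```python
from datasets import load_dataset

# Load the toy dataset and look at one example. Column names are assumed to match the
# script defaults (--original_image_column, --edit_prompt_column, --edited_image_column).
dataset = load_dataset("fusing/instructpix2pix-1000-samples", split="train")
print(dataset)  # features and number of rows

sample = dataset[0]
print(sample["edit_prompt"])               # the natural-language edit instruction
sample["input_image"].save("input.png")    # original image (PIL)
sample["edited_image"].save("edited.png")  # target image after applying the edit
```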
Configure environment variables such as the dataset identifier and the Stable Diffusion
checkpoint:

```bash
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-0.9"
export DATASET_ID="fusing/instructpix2pix-1000-samples"
```

Now, we can launch training:

```bash
python train_instruct_pix2pix_xl.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_name=$DATASET_ID \
    --enable_xformers_memory_efficient_attention \
    --resolution=256 --random_flip \
    --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
    --max_train_steps=15000 \
    --checkpointing_steps=5000 --checkpoints_total_limit=1 \
    --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
    --conditioning_dropout_prob=0.05 \
    --seed=42
```

Additionally, we support performing validation inference to monitor training progress
with Weights and Biases. You can enable this feature with `report_to="wandb"`:

```bash
python train_instruct_pix2pix_xl.py \
    --pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-0.9 \
    --dataset_name=$DATASET_ID \
    --use_ema \
    --enable_xformers_memory_efficient_attention \
    --resolution=512 --random_flip \
    --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
    --max_train_steps=15000 \
    --checkpointing_steps=5000 --checkpoints_total_limit=1 \
    --learning_rate=5e-05 --lr_warmup_steps=0 \
    --conditioning_dropout_prob=0.05 \
    --seed=42 \
    --val_image_url_or_path="https://datasets-server.huggingface.co/assets/fusing/instructpix2pix-1000-samples/--/fusing--instructpix2pix-1000-samples/train/23/input_image/image.jpg" \
    --validation_prompt="make it in japan" \
    --report_to=wandb
```

We recommend this type of validation as it can be useful for model debugging. Note that you need `wandb` installed to use this. You can install `wandb` by running `pip install wandb`.
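For completeness, setting up `wandb` amounts to two commands; `wandb login` prompts for the API key of your Weights and Biases account:

```bash
pip install wandb
wandb login  # paste your API key when prompted
```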
[Here](https://wandb.ai/sayakpaul/instruct-pix2pix/runs/ctr3kovq), you can find an example training run that includes some validation samples and the training hyperparameters.

***Note: In the original paper, the authors observed that even when the model is trained with an image resolution of 256x256, it generalizes well to bigger resolutions such as 512x512. This is likely because of the larger dataset they used during training.***

## Training with multiple GPUs

`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
for running distributed training with `accelerate`. Here is an example command:

```bash
accelerate launch --mixed_precision="fp16" --multi_gpu train_instruct_pix2pix_xl.py \
    --pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-0.9 \
    --dataset_name=$DATASET_ID \
    --use_ema \
    --enable_xformers_memory_efficient_attention \
    --resolution=512 --random_flip \
    --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
    --max_train_steps=15000 \
    --checkpointing_steps=5000 --checkpoints_total_limit=1 \
    --learning_rate=5e-05 --lr_warmup_steps=0 \
    --conditioning_dropout_prob=0.05 \
    --seed=42 \
    --val_image_url_or_path="https://datasets-server.huggingface.co/assets/fusing/instructpix2pix-1000-samples/--/fusing--instructpix2pix-1000-samples/train/23/input_image/image.jpg" \
    --validation_prompt="make it in japan" \
    --report_to=wandb
```
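If `accelerate` has not been configured on your machine yet, a non-interactive default works for most single-node, multi-GPU setups; both commands below use standard `accelerate` CLI options, and the training invocation is only sketched:

```bash
# Write a default config (single machine, all visible GPUs).
accelerate config default

# Or pin the process count explicitly at launch time, e.g. for two GPUs:
# accelerate launch --num_processes=2 --multi_gpu --mixed_precision="fp16" train_instruct_pix2pix_xl.py ...
```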
## Inference

Once training is complete, we can perform inference:

```python
import PIL.Image
import PIL.ImageOps
import requests
import torch

from diffusers import StableDiffusionXLInstructPix2PixPipeline

model_id = "your_model_id"  # <- replace this with your trained checkpoint
pipe = StableDiffusionXLInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
generator = torch.Generator("cuda").manual_seed(0)

url = "https://datasets-server.huggingface.co/assets/fusing/instructpix2pix-1000-samples/--/fusing--instructpix2pix-1000-samples/train/23/input_image/image.jpg"


def download_image(url):
    # Fetch the image, fix its orientation using the EXIF metadata, and convert to RGB.
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image


image = download_image(url)
prompt = "make it Japan"
num_inference_steps = 20
image_guidance_scale = 1.5
guidance_scale = 10

edited_image = pipe(
    prompt,
    image=image,
    num_inference_steps=num_inference_steps,
    image_guidance_scale=image_guidance_scale,
    guidance_scale=guidance_scale,
    generator=generator,
).images[0]
edited_image.save("edited_image.png")
```
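If the pipeline does not fit in GPU memory even in float16, one commonly used alternative to `.to("cuda")` is model CPU offloading (it requires `accelerate` to be installed); this is a sketch rather than part of the original example:

```python
import torch
from diffusers import StableDiffusionXLInstructPix2PixPipeline

model_id = "your_model_id"  # same checkpoint as above
# Do not move the pipeline to CUDA; each sub-model is placed on the GPU only while it runs.
pipe = StableDiffusionXLInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
```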
We encourage you to play with the following three parameters to control
speed and quality during inference:

* `num_inference_steps`
* `image_guidance_scale`
* `guidance_scale`

Particularly, `image_guidance_scale` and `guidance_scale` can have a profound impact
on the generated ("edited") image (see [here](https://twitter.com/RisingSayak/status/1628392199196151808?s=20) for an example).
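To build an intuition for the two guidance scales, a small sweep such as the sketch below (reusing `pipe`, `image`, and `prompt` from the inference example above) is often enough:

```python
# Sweep a few (image_guidance_scale, guidance_scale) pairs and save one edit per combination.
for igs in (1.0, 1.5, 2.0):
    for gs in (5.0, 7.5, 10.0):
        result = pipe(
            prompt,
            image=image,
            num_inference_steps=20,
            image_guidance_scale=igs,
            guidance_scale=gs,
            generator=torch.Generator("cuda").manual_seed(0),  # fixed seed for comparability
        ).images[0]
        result.save(f"edit_igs{igs}_gs{gs}.png")
```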
If you're looking for some interesting ways to use the InstructPix2Pix training methodology, we welcome you to check out this blog post: [Instruction-tuning Stable Diffusion with InstructPix2Pix](https://huggingface.co/blog/instruction-tuning-sd).
