Commit 02016ca

readme

1 parent f1d9550 commit 02016ca

1 file changed: examples/flux-control/README.md (91 additions, 6 deletions)
@@ -1,6 +1,9 @@
-# Training Control LoRA with Flux
+# Training Flux Control
 
-This (experimental) example shows how to train Control LoRAs with [Flux](https://huggingface.co/black-forest-labs/FLUX.1-dev) by conditioning it with additional structural controls (like depth maps, poses, etc.). We provide a script for full fine-tuning, too; refer to [this section](#full-fine-tuning).
+This (experimental) example shows how to train Control LoRAs with [Flux](https://huggingface.co/black-forest-labs/FLUX.1-dev) by conditioning it with additional structural controls (like depth maps, poses, etc.). We provide a script for full fine-tuning, too; refer to [this section](#full-fine-tuning). To learn more about the Flux Control family, refer to the following resources:
+
+* [Docs](https://github.com/black-forest-labs/flux/blob/main/docs/structural-conditioning.md) by Black Forest Labs
+* Diffusers docs ([1](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#canny-control), [2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#depth-control))
 
 To incorporate additional condition latents, we expand the input features of Flux.1-Dev from 64 to 128. The first 64 channels correspond to the original input latents to be denoised, while the latter 64 channels correspond to the control latents. This expansion happens on the `x_embedder` layer, where the combined latents are projected to the expected feature dimension of the rest of the network. Inference is performed using the `FluxControlPipeline`.
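To make the expansion concrete, below is a minimal sketch of the idea using the public `FluxTransformer2DModel` API. It is an illustration rather than the training script's exact code; zero-initializing the new input half (so the control latents start as a no-op) is an assumption:

```py
import torch
from diffusers import FluxTransformer2DModel

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)

with torch.no_grad():
    old_embedder = transformer.x_embedder  # Linear: 64 latent channels -> inner dim
    new_embedder = torch.nn.Linear(
        old_embedder.in_features * 2,  # 64 -> 128 input features
        old_embedder.out_features,
        bias=old_embedder.bias is not None,
        dtype=old_embedder.weight.dtype,
    )
    # Zero-init so the control latents contribute nothing until trained (assumption).
    new_embedder.weight.zero_()
    # Copy the pretrained weights for the original 64 channels.
    new_embedder.weight[:, : old_embedder.in_features].copy_(old_embedder.weight)
    if old_embedder.bias is not None:
        new_embedder.bias.copy_(old_embedder.bias)
    transformer.x_embedder = new_embedder

# Keep the config in sync so serialization reflects the expanded input.
transformer.register_to_config(in_channels=128)
```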
@@ -13,7 +16,7 @@ To incorporate additional condition latents, we expand the input features of Flux.1-Dev
 huggingface-cli login
 ```
 
-Example command:
+The example command below shows how to launch fine-tuning for pose conditions. The dataset being used here ([`raulc0399/open_pose_controlnet`](https://huggingface.co/datasets/raulc0399/open_pose_controlnet)) already has the pose conditions of the original images, so we don't have to compute them.
 
 ```bash
 accelerate launch train_control_lora_flux.py \
@@ -47,7 +50,7 @@ The training script exposes additional CLI args that might be useful to experiment with:
 * `train_norm_layers`: When set, additionally trains the normalization scales. Takes care of saving and loading.
 * `lora_layers`: Specify the layers you want to apply LoRA to. If you specify "all-linear", all the linear layers will be LoRA-attached.
 
-## Training with DeepSpeed
+### Training with DeepSpeed
 
 It's possible to train with [DeepSpeed](https://github.com/microsoft/DeepSpeed), specifically leveraging the Zero2 system optimization. To use it, save the following config to a YAML file (feel free to modify as needed):
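The referenced YAML config sits outside the changed lines, so this diff does not show it. For reference, a typical `accelerate` DeepSpeed Zero2 config has the shape sketched below; every value here is an assumption to adapt to your setup, not the file from the repository:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
use_cpu: false
```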
@@ -83,6 +86,48 @@ And then while launching training, pass the config file:
 accelerate launch --config_file=CONFIG_FILE.yaml ...
 ```
 
+### Inference
+
+The pose images in our dataset were computed using the [`controlnet_aux`](https://github.com/huggingface/controlnet_aux) library. Let's install it first:
+
+```bash
+pip install controlnet_aux
+```
+
+And then we are ready:
+
+```py
+from controlnet_aux import OpenposeDetector
+from diffusers import FluxControlPipeline
+from diffusers.utils import load_image
+from PIL import Image
+import numpy as np
+import torch
+
+pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
+pipe.load_lora_weights("...")  # Change this to your trained Control LoRA checkpoint.
+
+open_pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
+
+# Prepare the pose condition.
+url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/people.jpg"
+image = load_image(url)
+image = open_pose(image, detect_resolution=512, image_resolution=1024)
+image = np.array(image)[:, :, ::-1]  # Reverse the color channels (RGB -> BGR).
+image = Image.fromarray(np.uint8(image))
+
+prompt = "A couple, 4k photo, highly detailed"
+
+gen_images = pipe(
+    prompt=prompt,
+    control_image=image,
+    num_inference_steps=50,
+    joint_attention_kwargs={"scale": 0.9},  # Scales the LoRA contribution.
+    guidance_scale=25.0,
+).images[0]
+gen_images.save("output.png")
+```
+
 ## Full fine-tuning
 
 We provide a non-LoRA version of the training script `train_control_flux.py`. Here is an example command:
@@ -101,7 +146,6 @@ accelerate launch --config_file=accelerate_ds2.yaml train_control_flux.py \
 --proportion_empty_prompts=0.2 \
 --learning_rate=5e-5 \
 --adam_weight_decay=1e-4 \
---set_grads_to_none \
 --report_to="wandb" \
 --lr_scheduler="cosine" \
 --lr_warmup_steps=1000 \
@@ -114,4 +158,45 @@ accelerate launch --config_file=accelerate_ds2.yaml train_control_flux.py \
 --push_to_hub
 ```
 
-Change the `validation_image` and `validation_prompt` as needed.
+Change the `validation_image` and `validation_prompt` as needed.
+
+For inference, this time, we will run:
+
+```py
+from controlnet_aux import OpenposeDetector
+from diffusers import FluxControlPipeline, FluxTransformer2DModel
+from diffusers.utils import load_image
+from PIL import Image
+import numpy as np
+import torch
+
+transformer = FluxTransformer2DModel.from_pretrained("...", torch_dtype=torch.bfloat16)  # Change this to your fine-tuned checkpoint.
+pipe = FluxControlPipeline.from_pretrained(
+    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
+).to("cuda")
+
+open_pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
+
+# Prepare the pose condition.
+url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/people.jpg"
+image = load_image(url)
+image = open_pose(image, detect_resolution=512, image_resolution=1024)
+image = np.array(image)[:, :, ::-1]  # Reverse the color channels (RGB -> BGR).
+image = Image.fromarray(np.uint8(image))
+
+prompt = "A couple, 4k photo, highly detailed"
+
+gen_images = pipe(
+    prompt=prompt,
+    control_image=image,
+    num_inference_steps=50,
+    guidance_scale=25.0,
+).images[0]
+gen_images.save("output.png")
+```
+
+## Things to note
+
+* The scripts provided in this directory are experimental and educational. This means we may have to tweak things around to get good results on a given condition. We believe this is best done with the community 🤗
+* The scripts are not memory-optimized, but we offload the VAE and the text encoders to CPU when they are not used.
+* We can extract LoRAs from the fully fine-tuned model. While we currently don't provide any utilities for that, users are welcome to refer to [this script](https://github.com/Stability-AI/stability-ComfyUI-nodes/blob/master/control_lora_create.py) that provides similar functionality (see the sketch after this list).
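To give a rough idea of what such an extraction involves, here is a minimal sketch that factorizes the fine-tuning delta of a single linear layer with a truncated SVD. The helper name and default rank are hypothetical, not taken from the linked script:

```py
import torch

def extract_lora(w_base: torch.Tensor, w_tuned: torch.Tensor, rank: int = 64):
    """Factorize the fine-tuning delta of one linear layer into LoRA up/down matrices."""
    delta = (w_tuned - w_base).float()
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    lora_up = u[:, :rank] * s[:rank]  # (out_features, rank), singular values folded in
    lora_down = vh[:rank, :]          # (rank, in_features)
    return lora_up, lora_down
```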
