The train_controlnet_flux.py script shows how to implement the ControlNet training procedure and adapt it for FLUX.
Note
Memory consumption
FLUX can be quite expensive to run on consumer hardware and, as a result, ControlNet training for it comes with higher memory requirements than usual.
Gated access
As the model is gated, before using it with diffusers you first need to go to the FLUX.1 [dev] Hugging Face page, fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate. Use the command below to log in:
huggingface-cli login
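Alternatively, you can log in from Python (a minimal sketch using the `huggingface_hub` API; the token value is a placeholder):

```python
from huggingface_hub import login

# Prompts interactively for a token from https://huggingface.co/settings/tokens;
# you can also pass one directly, e.g. login(token="hf_...")  # placeholder token
login()
```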
Before running the scripts, make sure to install the library's training dependencies:
Important
To make sure you can successfully run the latest versions of the example scripts, we highly recommend installing from source and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
git clone https://github.com/mindspore-lab/mindone
cd mindone
pip install -e ".[training]"

We support the following dataset formats:
The original dataset is hosted in the ControlNet repo. We re-uploaded it here to make it compatible with `datasets`, which handles the data loading within the training script. To use it, add `--dataset_name=fusing/fill50k` to the script and remove the `--jsonl_for_train` flag mentioned below.
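If you want to sanity-check the data before training, you can load it with `datasets` directly (a minimal sketch; depending on your `datasets` version, this script-based dataset may additionally require `trust_remote_code=True`):

```python
from datasets import load_dataset

# The column names correspond to the --image_column, --conditioning_image_column
# and --caption_column flags used in the training command below.
dataset = load_dataset("fusing/fill50k", split="train")
print(dataset.column_names)   # expect 'image', 'conditioning_image' and 'text'
print(dataset[0]["text"])     # the caption used as the training prompt
```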
We also support loading data from a JSONL file (e.g. `xxx.jsonl`); pass `--jsonl_for_train` to enable it. Here is a brief example of such a file:
{"image": "xxx", "text": "xxx", "conditioning_image": "xxx"}
{"image": "xxx", "text": "xxx", "conditioning_image": "xxx"}Our experiments were conducted on a single 64GB Ascend Atlas 800T A2 NPU.
Our experiments were conducted on a single 64GB Ascend Atlas 800T A2 NPU. We can set `--num_double_layers` and `--num_single_layers`, which determine the size of the trained ControlNet (for reference, the FLUX.1-dev transformer itself has 19 double-stream and 38 single-stream blocks).
python train_controlnet_flux.py \
--pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
--dataset_name=fusing/fill50k \
--conditioning_image_column=conditioning_image \
--image_column=image \
--caption_column=text \
--dataloader_num_workers=8 \
--output_dir="path_to_save_model" \
--mixed_precision="bf16" \
--resolution=512 \
--learning_rate=1e-5 \
--max_train_steps=15000 \
--checkpointing_steps=200 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--num_double_layers=4 \
--num_single_layers=0 \
--seed=42

To better track our training experiments, you can use the following flags in the command above:
`validation_image`, `validation_prompt`, and `validation_steps` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected.
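For example, appended to the command above (the conditioning image path, prompt, and step interval are placeholders; point them at samples that match your dataset):

```bash
  --validation_image="path_to_conditioning_image.png" \
  --validation_prompt="red circle with blue background" \
  --validation_steps=100
```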
Once training is done, we can perform inference like so:
import mindspore
from mindone.diffusers.utils import load_image
from mindone.diffusers.pipelines.flux.pipeline_flux_controlnet import FluxControlNetPipeline
from mindone.diffusers.models.controlnet_flux import FluxControlNetModel
base_model = 'black-forest-labs/FLUX.1-dev'
# your trained ControlNet, or a Hub checkpoint such as 'promeai/FLUX.1-controlnet-lineart-promeai'
controlnet_model = "path_to_save_model"
controlnet = FluxControlNetModel.from_pretrained(controlnet_model, mindspore_dtype=mindspore.bfloat16)
pipe = FluxControlNetPipeline.from_pretrained(
base_model,
controlnet=controlnet,
mindspore_dtype=mindspore.bfloat16
)
control_image = load_image(
    "https://huggingface.co/promeai/FLUX.1-controlnet-lineart-promeai/resolve/main/images/example-control.jpg"
).resize((1024, 1024))
prompt = "cute anime girl with massive fluffy fennec ears and a big fluffy tail blonde messy long hair blue eyes wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere"
image = pipe(
    prompt,
    control_image=control_image,
    controlnet_conditioning_scale=0.6,  # strength of the ControlNet conditioning
    num_inference_steps=28,
    guidance_scale=3.5,
)[0][0]  # the pipeline returns a tuple; [0] is the list of generated images
image.save("./output.png")

The training script supports the Zero Redundancy Optimizer (ZeRO) from stage 1 to stage 3. You can enable ZeRO-3 training by passing `--zero_stage=3` and `--distributed`.
Here is an example of training at 512 resolution on 4 NPUs with ZeRO-3.
export OUTPUT_DIR="path_to_output"
msrun --worker_num=4 --local_worker_num=4 --log_dir=$OUTPUT_DIR train_controlnet_flux.py \
--pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
--dataset_name=fusing/fill50k \
--conditioning_image_column=conditioning_image \
--image_column=image \
--caption_column=text \
--dataloader_num_workers=8 \
--output_dir=$OUTPUT_DIR \
--mixed_precision="bf16" \
--resolution=512 \
--learning_rate=1e-5 \
--max_train_steps=15000 \
--checkpointing_steps=200 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--num_double_layers=4 \
--num_single_layers=0 \
--seed=42 \
--zero_stage=3 \
--distributed

Refer to the MindONE tutorial on using the Zero Redundancy Optimizer (ZeRO) if needed.