# flux-1-finetuning

This repo provides a quick demo that shows how to use OCI GPU shapes to fine-tune LoRA models of FLUX.1-dev.

FLUX is a family of diffusion models created by Black Forest Labs and is subject to [licensing terms](https://github.com/black-forest-labs/flux/blob/main/model_licenses/LICENSE-FLUX1-dev).
In this guide we show how to fine-tune FLUX.1-dev, which is available on [HuggingFace](https://huggingface.co/black-forest-labs/FLUX.1-dev).

Several projects can be used to work with FLUX models:
- [ComfyUI](https://github.com/comfyanonymous/ComfyUI) is a powerful and user-friendly tool for creating high-quality images with AI, including the FLUX models. It offers a modular workflow design that lets users build custom image-generation pipelines by connecting different components.
- [AI-toolkit](https://github.com/ostris/ai-toolkit) is a tool that simplifies the FLUX fine-tuning experience, especially by reducing VRAM requirements.
- [SimpleTuner](https://github.com/bghira/SimpleTuner) is a set of scripts that simplify distributed fine-tuning on multiple GPUs.

Prerequisites:
- A Linux-based GPU VM with a recent NVIDIA driver and CUDA toolkit
- git and Miniconda installed
- A HuggingFace account that you can log in to with `huggingface-cli login` (see the example after this list)
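
A minimal login example; this assumes the `huggingface_hub` CLI is available in your environment and that your account has accepted the FLUX.1-dev license on HuggingFace:

```
huggingface-cli login
```

The command prompts for an access token, which you can create in your HuggingFace account settings.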

## Installing AI toolkit ##

Use aitoolkit.yaml to prepare a conda environment with the required packages:

```
conda env create -f aitoolkit.yaml
conda activate aitoolkit
```

Then you can clone the ai-toolkit repository:

```
git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
git submodule update --init --recursive
```

## Dataset generation ##

WIP: as a simple example, you can take about 10 pictures of yourself and use them as the training dataset.
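
A sketch of a typical dataset layout for AI-toolkit, assuming the common convention of one caption .txt file per image (file names and captions are placeholders):

```
dataset/
├── img_01.jpg
├── img_01.txt
├── img_02.jpg
├── img_02.txt
└── ...
```

Each .txt file contains a short caption describing the corresponding image, and the captions can reference the trigger word configured for training.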


## Training

AI-toolkit has a large set of options that can be used to train a LoRA model for FLUX.1; you can find examples in the directory config/examples/.
Depending on your GPU, you can use these options either to reduce video memory consumption or to improve training performance. The most relevant ones are listed below, followed by a configuration excerpt.

- folder_path: "/path/to/images/folder" specifies where the dataset is located.
- gradient_checkpointing: true reduces the memory footprint but increases computation time by about 35%. On large-memory GPUs it is convenient to set it to false.
- model:quantize: true uses intermediate 8-bit quantization to reduce the memory footprint; the final model is still 16-bit, so turn it on only on small GPUs.
- model:low_vram: true further reduces the memory footprint on very small GPUs.
- prompts: a list of prompts used to create intermediate images to check quality; when analyzing performance you can remove them.
- batch_size: 1 keeps the per-GPU batch size at 1; increasing it on a single GPU deteriorates performance, so it is recommended to stick with 1.
- trigger_word: "a GPU Specialist" sets the keyword that you can later use in the prompt.
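
A minimal sketch of how these options appear in an AI-toolkit LoRA configuration file, modelled on the examples in config/examples/ (paths, the sample prompt, and any values not discussed above are placeholders or defaults from the example files):

```
config:
  process:
    - type: 'sd_trainer'
      trigger_word: "a GPU Specialist"
      datasets:
        - folder_path: "/path/to/images/folder"
      train:
        batch_size: 1
        gradient_checkpointing: true
      model:
        name_or_path: "black-forest-labs/FLUX.1-dev"
        quantize: true
        low_vram: false
      sample:
        prompts:
          - "a photo of a GPU Specialist configuring a server"
```

Once the configuration file is ready, training is typically launched from the ai-toolkit directory with `python run.py config/your_config.yaml`.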

## Installing ComfyUI

ComfyUI can be used to test the generated LoRA model. It can be installed in the same conda environment as AI-toolkit.

```
git clone --branch v0.3.10 https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python main.py
```

You can connect to the ComfyUI GUI by pointing your browser to port 8188; depending on the network configuration, a port forward might be required.
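
If the VM is reachable only over SSH, a local port forward makes the GUI available at http://localhost:8188; the user name and host below are placeholders:

```
ssh -L 8188:localhost:8188 opc@<gpu-vm-ip>
```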

Then you need to download the models required by the workflow.

Download the [CLIP safetensor](https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/clip_l.safetensors) to ComfyUI/models/clip/.
This model plays a crucial role in text-to-image generation by processing and encoding the textual input.

Download the text encoder [T5-XXL safetensor](https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/t5xxl_fp8_e4m3fn.safetensors) to ComfyUI/models/clip/.

Download the [VAE safetensor](https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/main/ae.safetensors) to ComfyUI/models/vae/.

Download the [Flux.1-dev UNET model](https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main) to ComfyUI/models/unet/.
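
As an alternative to downloading the files through the browser, you can fetch them with huggingface-cli. This is a sketch: the FLUX.1-dev weight file is assumed to be named flux1-dev.safetensors, and the gated FLUX.1-dev repo requires that you are logged in and have accepted its license:

```
huggingface-cli download comfyanonymous/flux_text_encoders clip_l.safetensors --local-dir ComfyUI/models/clip
huggingface-cli download comfyanonymous/flux_text_encoders t5xxl_fp8_e4m3fn.safetensors --local-dir ComfyUI/models/clip
huggingface-cli download black-forest-labs/FLUX.1-schnell ae.safetensors --local-dir ComfyUI/models/vae
huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors --local-dir ComfyUI/models/unet
```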

## Testing Lora models with ComfyUI

Every time you create a LoRA model with AI-toolkit, you can copy it to ComfyUI/models/lora.

Import the workflow by opening the file workflow-lora.json.

You will then be able to select the model in the Load LoRA box. Make sure the proper models are also selected in the Load Diffusion Model, DualCLIPLoader, and Load VAE boxes.

You can write your own prompt in the CLIP Text Encode box; remember to refer to the trigger word used for training the LoRA.


## Installing SimpleTuner

```
git clone --branch=release https://github.com/bghira/SimpleTuner.git
```

Copy config/config.json.example to config/config.json and adjust it for your setup.

Then you execute the training with:

```
./train.sh
```

Parallel training is possible using Accelerate (the DeepSpeed implementation for Flux is buggy at the time of writing).
When more GPUs are used, the effective batch size increases automatically, so the number of steps required to process one full epoch is reduced proportionally.
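For example, with 400 training images and a per-GPU batch size of 1, one epoch takes 400 steps on a single GPU but only 100 steps on 4 GPUs.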


If present, the Accelerate configuration is taken from the config file at

~/.cache/huggingface/accelerate/default_config.yaml

```
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
enable_cpu_affinity: true
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
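
If the file does not exist yet, it can also be generated interactively with the standard Accelerate helper:

```
accelerate config
```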

Alternatively, if this file is not present, you can create a file config/config.env and use it to set this environment variable:

```
TRAINING_NUM_PROCESSES=4
```