@@ -0,0 +1,35 @@
Copyright (c) 2021, 2023 Oracle and/or its affiliates.

The Universal Permissive License (UPL), Version 1.0

Subject to the condition set forth below, permission is hereby granted to any
person obtaining a copy of this software, associated documentation and/or data
(collectively the "Software"), free of charge and under any and all copyright
rights in the Software, and any and all patent rights owned or freely
licensable by each licensor hereunder covering either (i) the unmodified
Software as contributed to or provided by such licensor, or (ii) the Larger
Works (as defined below), to deal in both

(a) the Software, and
(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
one is included with the Software (each a "Larger Work" to which the Software
is contributed by such licensors),

without restriction, including without limitation the rights to copy, create
derivative works of, display, perform, and distribute the Software and make,
use, sell, offer for sale, import, export, have made, and have sold the
Software and the Larger Work(s), and to sublicense the foregoing rights on
either these or other terms.

This license is subject to the following condition:
The above copyright notice and either this complete permission notice or at
a minimum a reference to the UPL must be included in all copies or
substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,138 @@
# flux-1-finetuning

This repo is designed to quickly prepare a demo that showcases how to use OCI GPU shapes to fine-tune LoRA models of FLUX.1-dev.

FLUX is a family of diffusion models created by Black Forest Labs and is subject to [licensing terms](https://github.com/black-forest-labs/flux/blob/main/model_licenses/LICENSE-FLUX1-dev).
In this blog we show how to fine-tune FLUX.1-dev, which is available on [Hugging Face](https://huggingface.co/black-forest-labs/FLUX.1-dev).

There are several projects that can be used to work with Flux models:
- [ComfyUI](https://github.com/comfyanonymous/ComfyUI) is a powerful and user-friendly tool for creating high-quality images using AI, including the FLUX models. It offers a modular workflow design that allows users to create custom image generation processes by connecting different components.
- [AI Toolkit](https://github.com/ostris/ai-toolkit) is a tool that simplifies the Flux fine-tuning experience, especially by reducing VRAM requirements.
- [SimpleTuner](https://github.com/bghira/SimpleTuner) is a set of scripts that simplify distributed fine-tuning on multiple GPUs.

Prerequisites:
- A Linux-based GPU VM with a recent NVIDIA driver and CUDA toolkit
- git and Miniconda installed
- A Hugging Face account that you can log in to with `huggingface-cli login` (see the example below)
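For example, once the environment below is installed (it includes huggingface-hub, which provides the CLI), you can authenticate like this; note that FLUX.1-dev is a gated model, so you must also accept its license on the model page first:

```
# Authenticate with a Hugging Face access token (create one under Settings > Access Tokens)
huggingface-cli login
# Verify that the login worked
huggingface-cli whoami
```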

## Installing AI toolkit ##

Use aitoolkit.yaml to prepare a conda environment with the required packages:

```
conda env create -f aitoolkit.yaml
conda activate aitoolkit
```

Then clone the ai-toolkit repository:
```
git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
git submodule update --init --recursive
```

## Dataset generation ##

You can take 20-30 pictures of yourself. Use high-resolution images, ideally at least 1024x1024 pixels. If your images are larger, crop them to a 1:1 aspect ratio (square), centering your face or the main subject in each image. Ensure all images are sharp, in focus, and free of blur or artifacts. Avoid including any low-quality or poorly lit images. Each image should feature only you as the main subject, clearly visible and centered. Avoid group photos or images with distracting backgrounds. Maximize diversity by taking photos in different environments, with varied backgrounds, lighting conditions, facial expressions, and outfits. This helps the model generalize and prevents overfitting to a single look or scenario.
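As a sketch, the dataset is a flat folder of images; assuming ai-toolkit's default dataset handling, each image may have a matching `.txt` caption file, and `[trigger]` in a caption is replaced by the configured trigger word:

```
dataset/
├── photo_01.jpg
├── photo_01.txt   # e.g. "photo of [trigger] outdoors, smiling"
├── photo_02.jpg
├── photo_02.txt
└── ...
```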


## Training

AI Toolkit has a large set of options that can be used to train a LoRA model for FLUX.1. You can find examples in the config/examples/ directory.
Depending on your GPU, you can use them either to reduce video memory consumption or to improve training performance. The most relevant options are listed below; a config sketch follows the list.


- `folder_path: "/path/to/images/folder"` specifies where the dataset is.
- `gradient_checkpointing: true` reduces the memory footprint, but increases computation time by about 35%. On large-memory GPUs it is convenient to set it to false.
- `quantize: true` (under `model`) uses intermediate 8-bit quantization to reduce the memory footprint; the final model will still be 16-bit, so turn it on only on small GPUs.
- `low_vram: true` (under `model`) further reduces the memory footprint on very small GPUs.
- `prompts:` is a list of prompts used to create intermediate images to check quality; when analyzing performance you can remove them.
- `batch_size: 1` — increasing the batch size on a single GPU degrades performance, so it is recommended to stick with 1.
- `trigger_word: "a GPU Specialist"` sets the keyword that you can later use in prompts.
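Putting these together, here is a minimal config sketch modeled on config/examples/train_lora_flux_24gb.yaml; the job name, paths, and step counts are placeholder values, and options not discussed above follow the example defaults:

```
job: extension
config:
  name: "my_flux_lora"          # checkpoints land under output/my_flux_lora/
  process:
    - type: "sd_trainer"
      training_folder: "output"
      device: cuda:0
      trigger_word: "a GPU Specialist"
      network:
        type: "lora"
        linear: 16
        linear_alpha: 16
      datasets:
        - folder_path: "/path/to/images/folder"
          caption_ext: "txt"
          resolution: [512, 768, 1024]
      train:
        batch_size: 1
        steps: 2000
        gradient_checkpointing: true   # set to false on large-memory GPUs
        optimizer: "adamw8bit"
        lr: 1e-4
        dtype: bf16
      model:
        name_or_path: "black-forest-labs/FLUX.1-dev"
        is_flux: true
        quantize: true                 # 8-bit intermediate quantization for small GPUs
        low_vram: false
      sample:
        sample_every: 250
        prompts:                       # remove when analyzing performance
          - "photo of a GPU Specialist riding a bike"
```

Training is then started from the ai-toolkit directory with `python run.py config/my_flux_lora.yaml`.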

## Installing ComfyUI

ComfyUI can be used to test the generated LoRA model. It can be installed in the same conda env as AI Toolkit.

```
git clone --branch v0.3.10 https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
python main.py
```

You can connect to the ComfyUI GUI by pointing your browser to port 8188; depending on the network configuration, a port forward might be required.
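For example, if the VM is remote, an SSH tunnel is a simple way to reach the GUI (the user and host below are placeholders):

```
# Forward local port 8188 to ComfyUI on the VM, then browse to http://localhost:8188
ssh -L 8188:localhost:8188 ubuntu@<vm-ip-address>
```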

Then you need to download the models that are required by the workflow.

Download the [CLIP text encoder safetensors](https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/clip_l.safetensors) to ComfyUI/models/clip/.
This model plays a crucial role in text-to-image generation tasks by processing and encoding the textual input.

Download the [T5-XXL text encoder safetensors](https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/t5xxl_fp8_e4m3fn.safetensors) to ComfyUI/models/clip/.

Download the [VAE safetensors](https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/main/ae.safetensors) to ComfyUI/models/vae/.

Download the [FLUX.1-dev UNET model](https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main) to ComfyUI/models/unet/.
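As a sketch, the same files can be fetched with the Hugging Face CLI from inside the ComfyUI directory; the FLUX.1-dev download requires the gated-model login from the prerequisites, and flux1-dev.safetensors is the filename published in that repo:

```
huggingface-cli download comfyanonymous/flux_text_encoders clip_l.safetensors --local-dir models/clip
huggingface-cli download comfyanonymous/flux_text_encoders t5xxl_fp8_e4m3fn.safetensors --local-dir models/clip
huggingface-cli download black-forest-labs/FLUX.1-schnell ae.safetensors --local-dir models/vae
huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors --local-dir models/unet
```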

## Testing Lora models with ComfyUI

Every time you create a LoRA model with AI Toolkit, you can copy it to ComfyUI/models/loras.
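For example, assuming the default ai-toolkit output layout (the job name below is a placeholder):

```
cp ai-toolkit/output/my_flux_lora/my_flux_lora.safetensors ComfyUI/models/loras/
```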

Import the workflow by opening the file workflow-lora.json.

You will then be able to select the model in the Load LoRA box. Also make sure the proper models are selected in the Load Diffusion Model, DualCLIPLoader, and Load VAE boxes.

You can write your own prompt in the CLIP Text Encode box; remember to refer to the keyword used when training the LoRA.
![Alt text](files/ComfyUI.png?raw=true "ComfyUI Lora workflow")

## Installing SimpleTuner

```
git clone --branch=release https://github.com/bghira/SimpleTuner.git
```
Copy config/config.json.example to config/config.json.

Then execute the training with:

```
./train.sh
```

Parallel training is possible using Accelerate (the DeepSpeed implementation for Flux is buggy at the time of writing).
When more GPUs are used, the batch size is increased automatically, so the number of steps required to process one full epoch is reduced proportionally.


If present, the Accelerate configuration will be taken from the config file in

~/.cache/huggingface/accelerate/default_config.yaml

```
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
enable_cpu_affinity: true
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

If this file is not present, you can create a file config/config.env and use it to set this environment variable:

```
TRAINING_NUM_PROCESSES=4
```
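For example, a minimal sketch to run on 4 GPUs (the variable mirrors the num_processes value from the Accelerate config above):

```
echo "TRAINING_NUM_PROCESSES=4" > config/config.env
./train.sh
```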






@@ -0,0 +1,173 @@
name: aitoolkit
channels:
- defaults
- https://repo.anaconda.com/pkgs/main
- https://repo.anaconda.com/pkgs/r
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h5eee18b_6
- ca-certificates=2024.11.26=h06a4308_0
- ld_impl_linux-64=2.40=h12ee557_0
- libffi=3.4.4=h6a678d5_1
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.15=h5eee18b_0
- pip=24.2=py311h06a4308_0
- python=3.11.10=he870216_0
- readline=8.2=h5eee18b_0
- setuptools=75.1.0=py311h06a4308_0
- sqlite=3.45.3=h5eee18b_0
- tk=8.6.14=h39e8969_0
- wheel=0.44.0=py311h06a4308_0
- xz=5.4.6=h5eee18b_1
- zlib=1.2.13=h5eee18b_1
- pip:
  - absl-py==2.1.0
  - accelerate==1.2.1
  - aiofiles==23.2.1
  - albucore==0.0.16
  - albumentations==1.4.15
  - annotated-types==0.7.0
  - antlr4-python3-runtime==4.9.3
  - anyio==4.7.0
  - attrs==24.3.0
  - bitsandbytes==0.45.0
  - certifi==2024.12.14
  - charset-normalizer==3.4.0
  - clean-fid==0.1.35
  - click==8.1.7
  - clip-anytorch==2.6.0
  - controlnet-aux==0.0.7
  - dctorch==0.1.2
  - diffusers==0.32.0.dev0
  - docker-pycreds==0.4.0
  - einops==0.8.0
  - eval-type-backport==0.2.0
  - fastapi==0.115.6
  - ffmpy==0.5.0
  - filelock==3.16.1
  - flatten-json==0.1.14
  - fsspec==2024.12.0
  - ftfy==6.3.1
  - gitdb==4.0.11
  - gitpython==3.1.43
  - gradio==5.9.1
  - gradio-client==1.5.2
  - grpcio==1.68.1
  - h11==0.14.0
  - hf-transfer==0.1.8
  - httpcore==1.0.7
  - httpx==0.28.1
  - huggingface-hub==0.27.0
  - idna==3.10
  - imageio==2.36.1
  - importlib-metadata==8.5.0
  - invisible-watermark==0.2.0
  - jinja2==3.1.4
  - jsonmerge==1.9.2
  - jsonschema==4.23.0
  - jsonschema-specifications==2024.10.1
  - k-diffusion==0.1.1.post1
  - kornia==0.7.4
  - kornia-rs==0.1.7
  - lazy-loader==0.4
  - lpips==0.1.4
  - lycoris-lora==1.8.3
  - markdown==3.7
  - markdown-it-py==3.0.0
  - markupsafe==2.1.5
  - mdurl==0.1.2
  - mpmath==1.3.0
  - networkx==3.4.2
  - ninja==1.11.1.3
  - numpy==1.26.4
  - nvidia-cublas-cu12==12.4.5.8
  - nvidia-cuda-cupti-cu12==12.4.127
  - nvidia-cuda-nvrtc-cu12==12.4.127
  - nvidia-cuda-runtime-cu12==12.4.127
  - nvidia-cudnn-cu12==9.1.0.70
  - nvidia-cufft-cu12==11.2.1.3
  - nvidia-curand-cu12==10.3.5.147
  - nvidia-cusolver-cu12==11.6.1.9
  - nvidia-cusparse-cu12==12.3.1.170
  - nvidia-nccl-cu12==2.21.5
  - nvidia-nvjitlink-cu12==12.4.127
  - nvidia-nvtx-cu12==12.4.127
  - omegaconf==2.3.0
  - open-clip-torch==2.29.0
  - opencv-python==4.10.0.84
  - opencv-python-headless==4.10.0.84
  - optimum-quanto==0.2.4
  - orjson==3.10.12
  - oyaml==1.0
  - packaging==24.2
  - pandas==2.2.3
  - peft==0.14.0
  - pillow==11.0.0
  - platformdirs==4.3.6
  - prodigyopt==1.1.1
  - protobuf==5.29.2
  - psutil==6.1.1
  - pydantic==2.10.4
  - pydantic-core==2.27.2
  - pydub==0.25.1
  - pygments==2.18.0
  - python-dateutil==2.9.0.post0
  - python-dotenv==1.0.1
  - python-multipart==0.0.20
  - python-slugify==8.0.4
  - pytorch-fid==0.3.0
  - pytz==2024.2
  - pywavelets==1.8.0
  - pyyaml==6.0.2
  - referencing==0.35.1
  - regex==2024.11.6
  - requests==2.32.3
  - rich==13.9.4
  - rpds-py==0.22.3
  - ruff==0.8.4
  - safehttpx==0.1.6
  - safetensors==0.4.5
  - scikit-image==0.25.0
  - scipy==1.14.1
  - semantic-version==2.10.0
  - sentencepiece==0.2.0
  - sentry-sdk==2.19.2
  - setproctitle==1.3.4
  - shellingham==1.5.4
  - six==1.17.0
  - smmap==5.0.1
  - sniffio==1.3.1
  - starlette==0.41.3
  - sympy==1.13.1
  - tensorboard==2.18.0
  - tensorboard-data-server==0.7.2
  - text-unidecode==1.3
  - tifffile==2024.12.12
  - timm==1.0.12
  - tokenizers==0.21.0
  - toml==0.10.2
  - tomlkit==0.13.2
  - torch==2.5.1
  - torchdiffeq==0.2.5
  - torchsde==0.2.6
  - torchvision==0.20.1
  - tqdm==4.67.1
  - trampoline==0.1.2
  - transformers==4.47.1
  - triton==3.1.0
  - typer==0.15.1
  - typing-extensions==4.12.2
  - tzdata==2024.2
  - urllib3==2.2.3
  - uvicorn==0.34.0
  - wandb==0.19.1
  - wcwidth==0.2.13
  - websockets==14.1
  - werkzeug==3.1.3
  - zipp==3.21.0
prefix: /home/ubuntu/anaconda3/envs/aitoolkit2