# Fine-Tuning Diffusion Models with Olive

*Author: Xiaoyu Zhang*
*Created: 2026-01-26*

This guide shows you how to fine-tune Stable Diffusion and Flux models with LoRA adapters using Olive. You can use either:

- **CLI**: Quick start with the `olive diffusion-lora` command
- **JSON Configuration**: Full control over data preprocessing and training options

## Overview

Olive provides a simple CLI command to train LoRA (Low-Rank Adaptation) adapters for diffusion models. This allows you to:

- Teach your model new artistic styles
- Train it to generate specific subjects (DreamBooth)
- Customize image generation without modifying the full model weights

### Supported Models

| Model Type | Example Models | Default Resolution |
|------------|----------------|--------------------|
| SD 1.5 | `runwayml/stable-diffusion-v1-5` | 512x512 |
| SDXL | `stabilityai/stable-diffusion-xl-base-1.0` | 1024x1024 |
| Flux | `black-forest-labs/FLUX.1-dev` | 1024x1024 |

## Quick Start

### Basic LoRA Training

Train a LoRA adapter on your own images:

```bash
# Using a local image folder
olive diffusion-lora \
  -m runwayml/stable-diffusion-v1-5 \
  -d /path/to/your/images \
  -o my-style-lora

# Using a HuggingFace dataset
olive diffusion-lora \
  -m runwayml/stable-diffusion-v1-5 \
  --data_name linoyts/Tuxemon \
  --caption_column prompt \
  -o tuxemon-lora
```

### DreamBooth Training

Train the model to generate a specific subject (person, pet, object):

```bash
olive diffusion-lora \
  -m stabilityai/stable-diffusion-xl-base-1.0 \
  --model_variant sdxl \
  -d /path/to/subject/images \
  --dreambooth \
  --instance_prompt "a photo of sks dog" \
  --with_prior_preservation \
  --class_prompt "a photo of a dog" \
  -o my-dog-lora
```

## Data Sources

Olive supports two ways to provide training data:

### 1. Local Image Folder

Organize your images in a folder with optional caption files:

```
my_training_data/
├── image1.jpg
├── image1.txt     # Caption: "a beautiful sunset over mountains"
├── image2.png
├── image2.txt     # Caption: "a cat sitting on a couch"
└── subfolder/
    ├── image3.jpg
    └── image3.txt
```

Each `.txt` file contains the caption/prompt for the corresponding image.
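
If you want to sanity-check the pairing before training, here is a minimal sketch (the helper name and extension list are illustrative, not part of Olive) that walks the folder and lists any image missing a sidecar caption:

```python
from pathlib import Path

# Illustrative helper, not part of Olive: find images without a .txt caption.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def find_uncaptioned(root: str) -> list[Path]:
    missing = []
    for path in Path(root).rglob("*"):
        if path.suffix.lower() in IMAGE_EXTS and not path.with_suffix(".txt").exists():
            missing.append(path)
    return missing

for image in find_uncaptioned("my_training_data"):
    print(f"No caption for: {image}")
```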

**No captions?** No problem! Use the `auto_caption` preprocessing step to automatically generate captions with BLIP-2 or Florence-2 models. See the [Data Preprocessing](#data-preprocessing) section for details.

### 2. HuggingFace Dataset

Use any image dataset from the HuggingFace Hub. Specify `--data_name`, with optional `--image_column` and `--caption_column` parameters.
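
If you're not sure which columns to pass, a quick look with the `datasets` library (assuming the dataset has a `train` split) shows what's available:

```python
from datasets import load_dataset

# Inspect column names to find the image and caption columns.
ds = load_dataset("linoyts/Tuxemon", split="train")
print(ds.column_names)  # expect something like ["image", "prompt"]
print(ds[0]["prompt"])  # preview one caption
```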

## Command Reference

For the complete list of CLI options, see the [Diffusion LoRA CLI Reference](../../reference/cli.rst#diffusion-lora) or run:

```bash
olive diffusion-lora --help
```

## Using the Trained LoRA

After training, load your LoRA adapter with diffusers:

```python
from diffusers import DiffusionPipeline
import torch

# Load the base model (works for SD, SDXL, and Flux)
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Load the LoRA adapter
pipe.load_lora_weights("./my-lora-output/adapter")

# Generate images
image = pipe("a beautiful landscape").images[0]
image.save("output.png")
```
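
If you trained more than one adapter, diffusers can blend them. This snippet assumes `peft` is installed, which the multi-adapter API requires:

```python
# Name the adapter on load, then set its blend weight (requires peft).
pipe.load_lora_weights("./my-lora-output/adapter", adapter_name="style")
pipe.set_adapters(["style"], adapter_weights=[0.8])
image = pipe("a beautiful landscape").images[0]
```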

## Tips and Best Practices

### Dataset Preparation

1. **Image Quality**: Use high-quality, consistent images. Aim for 10-50 images for style transfer and 5-20 for DreamBooth.

2. **Captions**: Write descriptive captions that include the key elements you want the model to learn. For DreamBooth, use a unique trigger word (e.g., "sks") that doesn't conflict with existing concepts.

3. **Resolution**: Images don't need to match the training resolution exactly; Olive handles aspect-ratio bucketing and resizing automatically. Remember to set `--model_variant sdxl/flux` or `--base_resolution 1024` when training SDXL or Flux so preprocessing runs at the correct size.

### Training Parameters

1. **LoRA Rank (`-r`)**:
   - SD 1.5/SDXL: 4-16 is usually sufficient
   - Flux: use 16-64 for better quality

2. **Training Steps**:
   - Style transfer: 1000-3000 steps
   - DreamBooth: 500-1500 steps

3. **Learning Rate**:
   - Start with `1e-4` and adjust based on results
   - Go lower (e.g., `5e-5`) if overfitting, higher (e.g., `2e-4`) if underfitting

4. **Prior Preservation**: Always use `--with_prior_preservation` for DreamBooth to prevent the model from forgetting general concepts. A sketch combining these settings follows this list.
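
Putting these knobs together, a hedged sketch of a full training command might look like this; `-r` appears in the option list above, but treat `--max_train_steps` and `--learning_rate` as assumed flag names and confirm them with `olive diffusion-lora --help`:

```bash
# Flag names for steps and learning rate are assumptions -- verify with --help.
olive diffusion-lora \
  -m runwayml/stable-diffusion-v1-5 \
  -d /path/to/your/images \
  -r 8 \
  --max_train_steps 2000 \
  --learning_rate 1e-4 \
  -o my-style-lora
```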

### Hardware Requirements (Guidelines)

| Model | Minimum VRAM | Recommended VRAM |
|-------|--------------|------------------|
| SD 1.5 | 8 GB | 12+ GB |
| SDXL | 16 GB | 24+ GB |
| Flux | 24 GB | 40+ GB |
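
If you're near the minimum, the `training_args` shown in the JSON configuration below can trade speed for memory. A fragment such as this (values are starting points, not tuned recommendations) keeps the effective batch size at 4 while holding only one sample in memory:

```json
"training_args": {
  "train_batch_size": 1,
  "gradient_accumulation_steps": 4,
  "mixed_precision": "bf16"
}
```
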
## Advanced: Custom Configuration

For more control, you can use Olive's configuration file instead of CLI options:

```json
{
  "input_model": {
    "type": "DiffusersModel",
    "model_path": "stabilityai/stable-diffusion-xl-base-1.0"
  },
  "data_configs": [{
    "name": "train_data",
    "type": "ImageDataContainer",
    "load_dataset_config": {
      "type": "huggingface_dataset",
      "params": {
        "data_name": "linoyts/Tuxemon",
        "split": "train",
        "image_column": "image",
        "caption_column": "prompt"
      }
    },
    "pre_process_data_config": {
      "type": "image_lora_preprocess",
      "params": {
        "base_resolution": 1024,
        "steps": {
          "auto_caption": {"model_type": "florence2"},
          "aspect_ratio_bucketing": {}
        }
      }
    }
  }],
  "passes": {
    "sd_lora": {
      "type": "SDLoRA",
      "train_data_config": "train_data",
      "r": 16,
      "alpha": 16,
      "training_args": {
        "max_train_steps": 2000,
        "learning_rate": 1e-4,
        "train_batch_size": 1,
        "gradient_accumulation_steps": 4,
        "mixed_precision": "bf16"
      }
    }
  },
  "systems": {
    "local_system": {
      "type": "LocalSystem",
      "accelerators": [{"device": "gpu"}]
    }
  },
  "host": "local_system",
  "target": "local_system",
  "output_dir": "my-lora-output"
}
```

Run with:

```bash
olive run --config my_lora_config.json
```

## Data Preprocessing

Olive supports automatic data preprocessing, including image filtering, auto-captioning, tagging, and aspect-ratio bucketing.

The **CLI** only supports basic aspect-ratio bucketing via `--base_resolution`. For advanced preprocessing (auto-captioning, filtering, tagging), use a JSON configuration file.
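
For example, to run the built-in bucketing at a non-default size from the CLI:

```bash
# Resize/bucket preprocessing at 768px instead of the model default.
olive diffusion-lora \
  -m runwayml/stable-diffusion-v1-5 \
  -d /path/to/your/images \
  --base_resolution 768 \
  -o my-style-lora
```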

For detailed preprocessing options and examples, see the [SD LoRA Feature Documentation](../../features/sd-lora.md#data-configuration).

## Export to ONNX and Run Inference

After fine-tuning, you can merge the LoRA adapter into the base model and export the pipeline to ONNX with Olive's CLI, then run inference with ONNX Runtime.

### 1. Export with the CLI

Use `capture-onnx-graph` to export the base components together with your LoRA adapter:

```bash
olive capture-onnx-graph \
  -m stabilityai/stable-diffusion-xl-base-1.0 \
  -a my-lora-output/adapter \
  --output_path sdxl-lora-onnx
```
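
### 2. Run Inference with ONNX Runtime

One way to run the exported pipeline is through Optimum's ONNX Runtime pipelines. This is a sketch that assumes the exported folder follows the diffusers ONNX layout Optimum expects; if it doesn't, see the notebook linked below:

```python
from optimum.onnxruntime import ORTStableDiffusionXLPipeline

# Load the ONNX-exported SDXL pipeline and generate with the trained subject.
pipe = ORTStableDiffusionXLPipeline.from_pretrained("sdxl-lora-onnx")
image = pipe("a photo of sks dog on a beach").images[0]
image.save("onnx_output.png")
```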

### Multi-LoRA and Inference

Want to combine multiple adapters or see a full inference notebook? Check [sd_multilora.ipynb](https://github.com/microsoft/Olive/blob/main/notebooks/sd_multilora/sd_multilora.ipynb) for an end-to-end example covering multi-LoRA composition and ONNX Runtime inference.

## Related Resources

- [DreamBooth Paper](https://arxiv.org/abs/2208.12242)
- [LoRA Paper](https://arxiv.org/abs/2106.09685)