A CLI tool to train LoRA adapters for text-to-image models using a folder of images with intelligent content-aware cropping.
flowchart TD
Start(["📁 Input Images"]) --> Load["Load Images"]
Load --> Check{"Crop Focus - Specified?"}
Check -->|Yes| YOLO["🔍 YOLO11 Detection"]
Check -->|No| Center["Center Crop"]
YOLO --> Found{"Target - Found?"}
Found -->|Yes| Crop["Smart Crop + Padding"]
Found -->|No| Skip["⏭️ Skip Image"]
Center --> Resize["Resize to target resolution"]
Crop --> Resize
Resize --> Train["🎯 LoRA Training (Diffusers + PEFT)"]
Skip --> Log
Train --> Save["💾 Save LoRA weights"]
Save --> Log["📊 Generate training_log.json"]
Log --> End(["✅ Trained LoRA + Logs"])
style Start fill:#E8F4F8,stroke:#2C5F7C,stroke-width:3px,color:#1a1a1a
style Load fill:#FFF4E6,stroke:#8B6914,stroke-width:2px,color:#1a1a1a
style Check fill:#F0E6FF,stroke:#6B46C1,stroke-width:2px,color:#1a1a1a
style YOLO fill:#E6F7FF,stroke:#1E5A8E,stroke-width:2px,color:#1a1a1a
style Center fill:#FFF0F5,stroke:#8B4789,stroke-width:2px,color:#1a1a1a
style Found fill:#F0E6FF,stroke:#6B46C1,stroke-width:2px,color:#1a1a1a
style Crop fill:#E6FFE6,stroke:#2D5F2D,stroke-width:2px,color:#1a1a1a
style Skip fill:#FFE6E6,stroke:#8B2E2E,stroke-width:2px,color:#1a1a1a
style Resize fill:#FFF4E6,stroke:#8B6914,stroke-width:2px,color:#1a1a1a
style Train fill:#E6F7FF,stroke:#1E5A8E,stroke-width:2px,color:#1a1a1a
style Save fill:#E6FFE6,stroke:#2D5F2D,stroke-width:2px,color:#1a1a1a
style Log fill:#FFF4E6,stroke:#8B6914,stroke-width:2px,color:#1a1a1a
style End fill:#E8F4F8,stroke:#2C5F7C,stroke-width:3px,color:#1a1a1a
- Content-Aware Cropping: Uses YOLO11 segmentation to automatically detect and crop to specific objects from the COCO dataset (faces, people, animals, etc.)
- Smart Filtering: Automatically skips images that don't contain the target feature
- Training Logs: Generates detailed JSON logs of processed, skipped, and failed images
- LoRA/QLoRA Training: Full training pipeline using diffusers and peft, with optional quantization (see the sketch after this list)
- Multiple Model Support: Works with Stable Diffusion 1.5, SDXL, and Z-Image-Turbo (fast 8-step DiT model)
- Visual Verification: Includes a generation script to test your trained LoRA
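As a rough illustration of the LoRA part, here is a minimal PEFT sketch (not this tool's actual training code; the target module names are the usual diffusers UNet attention projections):

```python
import torch
from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

# Load the UNet of a Stable Diffusion checkpoint (model ID taken from the examples below)
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
)

# Attach low-rank adapters to the attention projections; only these small
# matrices are trained while the base weights stay frozen
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
unet = get_peft_model(unet, lora_config)
unet.print_trainable_parameters()  # shows how small the trainable fraction is
```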
Requires Python 3.13+ and uv for dependency management.
# Clone the repository
git clone git@github.com:paazmaya/image-lora-trainer.git
cd image-lora-trainer
# Install dependencies
uv sync

This tool requires a CUDA-capable GPU. Training on CPU is impractically slow for diffusion models.
1. Verify you have a CUDA-capable NVIDIA GPU

2. Install CUDA 13 drivers from NVIDIA's website

3. Install PyTorch with CUDA support:

   uv pip uninstall torch torchvision -y
   uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130

   It's about 1.73 GB to download.

4. Verify the GPU is detected:

   uv run python -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('GPU:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'No GPU')"

   You should see something like:

   CUDA available: True
   GPU: NVIDIA GeForce RTX 4070 Ti

   If you see CUDA available: False, the CPU-only version of PyTorch is installed. Follow step 3 above.
Place your training images in a folder:
my_images/
├── photo1.jpg
├── photo2.png
└── photo3.jpg
Basic training:
uv run python src/main.py --input-dir my_images --base-model runwayml/stable-diffusion-v1-5

With content-aware cropping (only trains on images that contain a person):
uv run python src/main.py \
--input-dir my_images \
--base-model runwayml/stable-diffusion-v1-5 \
--crop-focus person \
--resolution 512 \
--steps 1000

With QLoRA (4-bit quantization for lower memory usage):
uv run python src/main.py \
--input-dir my_images \
--base-model stabilityai/stable-diffusion-xl-base-1.0 \
--use-qlora \
--resolution 1024
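Conceptually, QLoRA keeps the frozen base weights in 4-bit while the LoRA adapters are trained in higher precision. A minimal sketch of that idea with bitsandbytes through diffusers (an assumption about the mechanism, not this tool's exact code):

```python
import torch
from diffusers import BitsAndBytesConfig, UNet2DConditionModel
from peft import LoraConfig, get_peft_model

# Quantize the frozen base weights to 4-bit NF4 to reduce memory use
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
    quantization_config=bnb_config,
)

# LoRA adapters stay in higher precision and are the only trained parameters
unet = get_peft_model(
    unet,
    LoraConfig(r=16, lora_alpha=16, target_modules=["to_q", "to_k", "to_v", "to_out.0"]),
)
```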
Z-Image-Turbo is a fast 6B parameter diffusion transformer that produces high-quality images in just 8 steps.

Basic Z-Image training:
uv run python src/main.py train-zimage \
--input-dir my_images \
--instance-prompt "a photo of sks person"With 8-bit quantization (for lower VRAM usage, ~12GB instead of ~24GB):
uv run python src/main.py train-zimage \
--input-dir my_images \
--instance-prompt "a photo of sks person" \
--use-8bit \
--steps 500

With all options:
uv run python src/main.py train-zimage \
--input-dir my_images \
--instance-prompt "a photo of sks karate practitioner" \
--crop-focus person \
--use-8bit \
--lr 1e-5 \
--lora-rank 16 \
--steps 1000

With locally available model:
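One way to get a local copy of the base model is huggingface_hub's snapshot_download; pointing --base-model at the downloaded directory is an assumption here, shown only as a sketch:

```python
from huggingface_hub import snapshot_download

# Download the Z-Image-Turbo checkpoint once into a local directory;
# the idea (assumed, not confirmed above) is to pass this path as --base-model
local_dir = snapshot_download(
    repo_id="Tongyi-MAI/Z-Image-Turbo",
    local_dir="models/Z-Image-Turbo",
)
print(local_dir)
```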
Suggested Z-Image-Turbo starting values:

- Learning rate: 1e-5 (lower than SD/SDXL)
- LoRA rank: 16 (can go higher for more capacity)
- Steps: 500-1500 for characters/styles
Note: Z-Image-Turbo uses a training adapter by default to prevent the distillation from breaking during training. This is recommended for short training runs (styles, concepts, characters).
After training, check the training_log.json in your output directory:
{
"base_folder": "/absolute/path/to/my_images",
"trained": ["image1.png", "image2.png"],
"skipped": ["image3.png"],
"failed": []
}
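To check the result programmatically, a small sketch that assumes the log layout shown above and the output directory naming used in the examples below:

```python
import json
from pathlib import Path

# Adjust to your actual output directory
log_path = Path("stable-diffusion-v1-5_my_images") / "training_log.json"
log = json.loads(log_path.read_text())

print(f"trained: {len(log['trained'])}, skipped: {len(log['skipped'])}, failed: {len(log['failed'])}")
for name in log["skipped"]:
    print("skipped (target object not found):", name)
```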
For Stable Diffusion:

uv run python src/generate.py sd \
--base-model runwayml/stable-diffusion-v1-5 \
--lora-path stable-diffusion-v1-5_my_images \
--prompt "a photo of a sks person" \
--output result.png

For Z-Image-Turbo:
uv run python src/generate.py zimage \
--lora-path zimage-turbo_my_images \
--prompt "a photo of sks person, professional studio lighting" \
--output result.png

Important: Use the same trigger word ("sks" in this example) that you specified in --instance-prompt during training.
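If you prefer to load the adapter directly with diffusers instead of generate.py, a minimal sketch (assuming the saved adapter_model.safetensors is in a diffusers/PEFT-compatible LoRA format):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Point this at the training output directory
pipe.load_lora_weights("stable-diffusion-v1-5_my_images")

# Keep the trigger word from --instance-prompt in the prompt
image = pipe(
    "a photo of a sks person, professional studio lighting",
    num_inference_steps=30,
).images[0]
image.save("result.png")
```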
More generation examples:
# Portrait with different styling
uv run python src/generate.py sd --lora-path <path> --prompt "portrait of sks person, oil painting"
# Different context
uv run python src/generate.py sd --lora-path <path> --prompt "sks person in a futuristic city"

SD/SDXL Training Options:

| Option | Description | Default |
|---|---|---|
| `--input-dir` | Path to training images | Required |
| `--output-dir` | Output directory for LoRA | Current directory |
| `--base-model` | Hugging Face model ID or local path | runwayml/stable-diffusion-v1-5 |
| `--resolution` | Training image resolution | 512 |
| `--crop-focus` | Object to focus on (e.g., "person", "face", "dog") | None (center crop) |
| `--use-qlora` | Enable 4-bit quantization | False |
| `--instance-prompt` | Training prompt with trigger word | "a photo of a sks person" |
| `--steps` | Number of training steps | 1000 |
| `--epochs` | Number of epochs (overrides steps) | None |
Z-Image Training Options:

| Option | Description | Default |
|---|---|---|
| `--input-dir` | Path to training images | Required |
| `--output-dir` | Output directory for LoRA | Current directory |
| `--base-model` | Z-Image model ID | Tongyi-MAI/Z-Image-Turbo |
| `--resolution` | Training image resolution | 1024 |
| `--crop-focus` | Object to focus on | None (center crop) |
| `--use-8bit` | Enable 8-bit quantization | False |
| `--no-training-adapter` | Disable de-distillation adapter | False (adapter enabled) |
| `--instance-prompt` | Training prompt with trigger word | "a photo of a sks person" |
| `--steps` | Number of training steps | 1000 |
| `--lr` | Learning rate | 1e-5 |
| `--lora-rank` | LoRA rank | 16 |
| `--lora-alpha` | LoRA alpha | 16 |
| `--save-steps` | Save checkpoint every N steps | 500 |
About --instance-prompt:
The instance prompt contains a trigger word (like "sks") that the model learns to associate with your training images. This trigger word is what you'll use later when generating images with the LoRA.
- Use a unique, uncommon token (e.g., "sks", "xyz", "abc123")
- Include the class name (e.g., "person", "dog", "style")
- Example: "a photo of sks person" → use "sks person" in generation prompts
SD Generation Options:
| Option | Description | Default |
|---|---|---|
| `--base-model` | Base model ID or path | runwayml/stable-diffusion-v1-5 |
| `--lora-path` | Path to trained LoRA | Required |
| `--prompt` | Generation prompt | Required |
| `--output` | Output filename | output.png |
| `--steps` | Inference steps | 30 |
Z-Image Generation Options:
| Option | Description | Default |
|---|---|---|
| `--base-model` | Z-Image model ID | Tongyi-MAI/Z-Image-Turbo |
| `--lora-path` | Path to trained LoRA | Required |
| `--prompt` | Generation prompt | Required |
| `--output` | Output filename | output.png |
| `--width` | Image width | 1024 |
| `--height` | Image height | 1024 |
| `--steps` | Inference steps | 8 |
| `--seed` | Random seed | None (random) |
| `--lora-scale` | LoRA weight scale | 1.0 |
When you specify --crop-focus, the tool uses YOLO11 to detect objects in your images:
- Supported objects: Any object in the COCO dataset (person, dog, cat, car, etc.)
- Behavior: Images without the target object are automatically skipped
- Fallback: If no focus is specified, images are center-cropped
Example crop focuses:
- person - Crops to people
- face - Crops to faces (use "person" for full body)
- dog, cat - Crops to animals
- car, truck - Crops to vehicles
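The detection-and-crop idea looks roughly like this (a sketch of the concept using the ultralytics API, not this tool's exact implementation):

```python
from PIL import Image
from ultralytics import YOLO

model = YOLO("yolo11n-seg.pt")  # a small YOLO11 segmentation checkpoint
image_path = "my_images/photo1.jpg"
result = model(image_path)[0]

# Find a detection whose COCO class name matches the requested crop focus
target = "person"
match = next((b for b in result.boxes if result.names[int(b.cls)] == target), None)

if match is None:
    print("skip: no", target, "found")  # such images are skipped and logged
else:
    x1, y1, x2, y2 = match.xyxy[0].tolist()
    # The real tool also adds padding around the detection before resizing
    Image.open(image_path).crop((x1, y1, x2, y2)).save("cropped.jpg")
```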
Convert diffusers models or safetensor files to reduced precision formats:
# Convert a model directory to bfloat16 (default)
uv run python scripts/convert_floats.py --input H:/my-model
# Output: H:/my-model-bf16
# Convert to float8 e4m3fn (higher precision, good for inference)
uv run python scripts/convert_floats.py --input H:/my-model --dtype e4m3fn
# Output: H:/my-model-e4m3fn
# Convert to float8 e5m2 (wider range)
uv run python scripts/convert_floats.py --input H:/my-model --dtype e5m2
# Output: H:/my-model-e5m2
# Convert a single safetensors file
uv run python scripts/convert_floats.py --input H:/models/model.safetensors
# Output: H:/models/model-bf16.safetensors

Supported formats:

- bf16 - bfloat16 (16-bit, ~50% size reduction from fp32)
- e4m3fn - float8 with 4-bit exponent, 3-bit mantissa (higher precision)
- e5m2 - float8 with 5-bit exponent, 2-bit mantissa (wider dynamic range)
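The core of such a conversion (a sketch of the idea, not the script itself) is loading each tensor, casting it, and re-saving:

```python
import torch
from safetensors.torch import load_file, save_file

# Hypothetical file names; the real script derives the output name from --dtype
src = "model.safetensors"
dst = "model-bf16.safetensors"

state = load_file(src)
# Cast floating-point tensors to bfloat16 and leave integer tensors untouched
converted = {
    k: v.to(torch.bfloat16) if v.is_floating_point() else v
    for k, v in state.items()
}
save_file(converted, dst)
```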
Run tests and linting:

uv run pytest tests/
uv run ruff check --fix
uv run ruff format

After training, your output directory will contain:
stable-diffusion-v1-5_my_images/
├── adapter_config.json # LoRA configuration
├── adapter_model.safetensors # LoRA weights
├── training_log.json # Processing log
├── logs/ # Training logs
└── processed_images/ # Preprocessed images
MIT
- Built with Ultralytics YOLO11
- Uses Hugging Face Diffusers
- LoRA implementation via PEFT
- Z-Image-Turbo by Tongyi-MAI
- Z-Image training adapter by ostris
