For users who wish to use Docker or another container orchestration platform, see this document first.
For users operating on Windows 10 or newer, an installation guide based on Docker and WSL is available in this document.
You can simply install SimpleTuner using pip, which is recommended for most users:
```bash
# for CUDA
pip install 'simpletuner[cuda]'
# for CUDA 13 / Blackwell (NVIDIA B-series GPUs)
pip install 'simpletuner[cuda13]' --extra-index-url https://download.pytorch.org/whl/cu130
# for ROCm
pip install 'simpletuner[rocm]' --extra-index-url https://download.pytorch.org/whl/rocm7.1
# for Apple Silicon
pip install 'simpletuner[apple]'
# for CPU-only (not recommended)
pip install 'simpletuner[cpu]'
# for JPEG XL support (optional)
pip install 'simpletuner[jxl]'
# development requirements (optional, only for submitting PRs or running tests)
pip install 'simpletuner[dev]'
```

For local development or testing, you can clone the SimpleTuner repository and set up the Python venv:
```bash
git clone --branch=release https://github.com/bghira/SimpleTuner.git
cd SimpleTuner
# if python --version shows 3.11 or 3.12, you may want to upgrade to 3.13.
python3.13 -m venv .venv
source .venv/bin/activate
```

ℹ️ You can use your own custom venv path by setting `export VENV_PATH=/path/to/.venv` in your `config/config.env` file.
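A quick sanity check that the venv is active (assuming the standard `.venv` layout created above):

```bash
# The interpreter should resolve inside SimpleTuner/.venv
python --version    # expect Python 3.13.x per the guidance above
which python        # expect .../SimpleTuner/.venv/bin/python
```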
Note: We're installing the release branch here; the main branch may contain experimental features that could offer better results or lower memory use.
Install SimpleTuner with automatic platform detection:
```bash
# Basic installation (auto-detects CUDA/ROCm/Apple)
pip install -e .
# With JPEG XL support
pip install -e .[jxl]
```

Note: The setup.py automatically detects your platform (CUDA/ROCm/Apple) and installs the appropriate dependencies.
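To confirm the auto-detected backend actually works, you can query PyTorch's standard device APIs (nothing SimpleTuner-specific is assumed here):

```bash
python -c "
import torch
print('torch', torch.__version__)
print('cuda/rocm:', torch.cuda.is_available())          # True on NVIDIA, and on HIP builds too
print('mps (Apple):', torch.backends.mps.is_available())
"
```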
Optionally, Hopper (or newer) hardware can use FlashAttention3 for improved inference and training performance when using `torch.compile`.
You'll need to run the following sequence of commands from your SimpleTuner directory, with your venv active:
```bash
git clone https://github.com/Dao-AILab/flash-attention
pushd flash-attention
pushd hopper
python setup.py install
popd
popd
```
⚠️ Managing the flash_attn build is currently poorly supported in SimpleTuner. It can break on updates, requiring you to re-run this build procedure manually from time to time.
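One way to check that the build took effect is to import the module the Hopper build installs; `flash_attn_interface` is the module name recent FlashAttention3 releases use, though it may differ between versions:

```bash
# Succeeds only if the FA3 kernels were built and installed into the active venv
python -c "import flash_attn_interface; print('FlashAttention3 OK')"
```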
The following must be executed for an AMD MI300X to be usable:

```bash
apt install amd-smi-lib
pushd /opt/rocm/share/amd_smi
python3 -m pip install --upgrade pip
python3 -m pip install .
popd
```

ℹ️ ROCm acceleration defaults: When SimpleTuner detects a HIP-enabled PyTorch build, it automatically exports `PYTORCH_TUNABLEOP_ENABLED=1` (unless you already set it) so TunableOp kernels are available. On MI300/gfx94x devices we also set `HIPBLASLT_ALLOW_TF32=1` by default, enabling hipBLASLt's TF32 paths without requiring manual environment tweaks.
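Because these defaults are only applied when the variables are unset, you can opt out from your shell before launching; the values below simply invert the documented defaults:

```bash
export PYTORCH_TUNABLEOP_ENABLED=0   # skip TunableOp kernel tuning
export HIPBLASLT_ALLOW_TF32=0        # keep hipBLASLt's TF32 paths off on MI300/gfx94x
```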
- 2a. Option One (Recommended): Run `simpletuner configure`
- 2b. Option Two: Copy `config/config.json.example` to `config/config.json` and then fill in the details.
⚠️ For users located in countries where Hugging Face Hub is not readily accessible, you should add `HF_ENDPOINT=https://hf-mirror.com` to your `~/.bashrc` or `~/.zshrc`, depending on which `$SHELL` your system uses.
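For example, on a bash shell (substitute `~/.zshrc` for zsh):

```bash
echo 'export HF_ENDPOINT=https://hf-mirror.com' >> ~/.bashrc
source ~/.bashrc
```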
SimpleTuner now includes automatic GPU detection and configuration through the WebUI. Upon first load, you'll be guided through an onboarding step that detects your GPUs and configures Accelerate automatically.
When you first launch the WebUI or use `simpletuner configure`, you'll encounter an "Accelerate GPU Defaults" onboarding step that:

- Automatically detects all available GPUs on your system
- Shows GPU details including name, memory, and device IDs
- Recommends optimal settings for multi-GPU training
- Offers three configuration modes:
  - Auto Mode (Recommended): Uses all detected GPUs with optimal process count
  - Manual Mode: Select specific GPUs or set a custom process count
  - Disabled Mode: Single GPU training only
How it works:

- The system detects your GPU hardware via CUDA/ROCm
- Calculates optimal `--num_processes` based on available devices
- Sets `CUDA_VISIBLE_DEVICES` automatically when specific GPUs are selected
- Saves your preferences for future training runs
If not using the WebUI, you can control GPU visibility directly in your `config.json`:

```json
{
  "accelerate_visible_devices": [0, 1, 2],
  "num_processes": 3
}
```

This will restrict training to GPUs 0, 1, and 2, launching 3 processes.
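Since this ultimately just sets `CUDA_VISIBLE_DEVICES` for the launcher, a one-off shell equivalent looks like this (sketch only; the process count still comes from your config, so the JSON route above is the supported path):

```bash
# Restrict a single run to GPUs 0-2 without editing config.json
CUDA_VISIBLE_DEVICES=0,1,2 simpletuner train
```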
- If you are using `--report_to='wandb'` (the default), the following will help you report your statistics:

```bash
wandb login
```

Follow the printed instructions to locate your API key and configure it.
Once that is done, any of your training sessions and validation data will be available on Weights & Biases.

ℹ️ If you would like to disable Weights & Biases or Tensorboard reporting entirely, use `--report-to=none`.
- Launch training with `simpletuner`; logs will be written to `debug.log`:

```bash
simpletuner train
```

⚠️ At this point, if you used `simpletuner configure`, you are done! If not, these commands will work, but further configuration is required. See the tutorial for more information.
To run unit tests to ensure that installation has completed successfully:

```bash
python -m unittest discover tests/
```

For users who train multiple models or need to quickly switch between different datasets or settings, two environment variables are inspected at startup.
To use them:

```bash
simpletuner train env=default config_backend=env
```

- `env` will default to `default`, which points to the typical `SimpleTuner/config/` directory that this guide helped you configure
  - Using `simpletuner train env=pixart` would use the `SimpleTuner/config/pixart` directory to find `config.env`
- `config_backend` will default to `env`, which uses the typical `config.env` file this guide helped you configure
  - Supported options: `env`, `json`, `toml`, or `cmd` if you rely on running `train.py` manually
  - Using `simpletuner train config_backend=json` would search for `SimpleTuner/config/config.json` instead of `config.env`
  - Similarly, `config_backend=toml` will use `config.toml`
You can create `config/config.env` that contains one or both of these values:

```bash
ENV=default
CONFIG_BACKEND=json
```

They will be remembered upon subsequent runs. Note that these can be added in addition to the multi-GPU options described above.
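Combining both variables, a run that reads a JSON config from a named environment would look like this (assuming you have created `config/pixart/config.json`):

```bash
simpletuner train env=pixart config_backend=json
```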
A public dataset with approximately 10k images, captioned via their filenames, is available on Hugging Face Hub, ready for use with SimpleTuner.
You can keep images in a single folder or neatly organize them into subdirectories.
Quality Requirements:
- No JPEG artifacts or blurry images - modern models will pick these up
- Avoid grainy CMOS sensor noise (will appear in all generated images)
- No watermarks, badges, or signatures (these will be learned)
- Movie frames generally don't work due to compression (use production stills instead)
Technical Specifications:
- Images optimally divisible by 64 (allows reuse without resizing)
- Mix square and non-square images for balanced capabilities
- Use varied, high-quality datasets for best results
SimpleTuner provides captioning scripts for mass-renaming files. Caption formats supported:
- Filename as caption (default)
- Text files with `--caption_strategy=textfile` (see the example layout below)
- JSONL, CSV, or advanced metadata files
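For the textfile strategy, each image is paired with a `.txt` caption sharing its basename; an illustrative layout (directory and file names are examples):

```bash
ls datasets/my-data/
# img_0001.png  img_0001.txt  img_0002.png  img_0002.txt
```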
Recommended captioning tools:
- InternVL2: Best quality but slow (small datasets)
- BLIP3: Best lightweight option with good instruction following
- Florence2: Fastest, though some dislike its outputs
Your maximum batch size depends on VRAM and resolution:
```
vram_use = batch_size * resolution + base_requirements
```
Key principles:
- Use highest batch size possible without VRAM issues
- Higher resolution = more VRAM = lower batch size
- If batch size 1 at 128x128 doesn't work, your hardware is insufficient
When training with multiple GPUs, your dataset must be large enough for the effective batch size:
```
effective_batch_size = train_batch_size × num_gpus × gradient_accumulation_steps
```
Example: With 4 GPUs and train_batch_size=4, you need at least 16 samples per aspect bucket.
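A quick shell arithmetic check of the formula, using the numbers from the example above (gradient accumulation of 1 assumed):

```bash
train_batch_size=4; num_gpus=4; grad_accum=1
echo $(( train_batch_size * num_gpus * grad_accum ))   # 16 -> minimum samples per aspect bucket
```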
Solutions for small datasets:
- Use `--allow_dataset_oversubscription` to auto-adjust repeats
- Manually set `repeats` in your dataloader config
- Reduce batch size or GPU count
See DATALOADER.md for complete details.
To automatically push models to Hub upon completion, add to `config/config.json`:

```json
{
  "push_to_hub": true,
  "hub_model_name": "your-model-name"
}
```

Login before training:

```bash
huggingface-cli login
```

Enable detailed logging by adding to `config/config.env`:

```bash
export SIMPLETUNER_LOG_LEVEL=DEBUG
export SIMPLETUNER_TRAINING_LOOP_LOG_LEVEL=DEBUG
```

A `debug.log` file will be created in the project root with all log entries.
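To follow the log live during a run:

```bash
tail -f debug.log
```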