ACE-Step models are organized as subdirectories under a root checkpoints/ folder. Each model variant has its own folder with a config.json and weight files:
```
checkpoints/
├── acestep-v15-turbo/        # Turbo (8-step accelerated)
│   ├── config.json
│   ├── modeling.py
│   ├── model.safetensors
│   └── silence_latent.pt
├── acestep-v15-base/         # Base (full diffusion)
│   ├── config.json
│   └── ...
├── acestep-v15-sft/          # SFT (supervised fine-tune)
│   ├── config.json
│   └── ...
├── vae/                      # VAE encoder/decoder (shared)
└── Qwen3-Embedding-0.6B/     # Text encoder (shared)
```
The model loader uses `AutoModel.from_pretrained()` with `trust_remote_code=True`. This means:
- The `config.json` tells HuggingFace Transformers which Python class to load
- The `modeling.py` file in each folder defines the actual model architecture
- Renaming folders breaks the loading mechanism

If you downloaded weights manually, make sure the folder names match what the model expects.
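To make the link between `config.json` and `modeling.py` concrete, here is a small illustrative helper (hypothetical, not Side-Step's code; it assumes the standard HuggingFace config format, where the `architectures` field names the class the folder's `modeling.py` must define):

```python
import json
from pathlib import Path

def declared_architecture(checkpoint_dir: str) -> str:
    """Read which Python class config.json tells Transformers to load.

    Hypothetical helper for illustration. Follows the standard
    HuggingFace config format: "architectures" lists the class that
    the checkpoint folder's modeling.py must define.
    """
    cfg = json.loads((Path(checkpoint_dir) / "config.json").read_text())
    return cfg["architectures"][0]
```

With `trust_remote_code=True`, `AutoModel.from_pretrained()` imports that class from the folder's own `modeling.py`, which is why renaming the folder or removing `modeling.py` breaks loading.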
Side-Step's model discovery automatically classifies models:
- Official models: folder names starting with `acestep-v15-` (e.g., `acestep-v15-turbo`). These are auto-detected with correct timestep parameters.
- Custom models / fine-tunes: any other folder containing a `config.json`. Side-Step will ask which base model they descend from.
When you start training or preprocessing, the wizard:
- Scans your checkpoint directory for subfolders with a `config.json`
- Filters out non-model directories (VAE, text encoder, etc.)
- Labels each as official or custom
- Presents a numbered list for selection
- Offers fuzzy search if you have many models
You can also type a name (or part of one) to filter the list.
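The scan-and-classify steps above can be sketched roughly as follows. This is a simplification, not Side-Step's actual code; the official prefix and the shared-directory names are taken from the layout described earlier:

```python
from pathlib import Path

OFFICIAL_PREFIX = "acestep-v15-"
# Shared components from the checkpoint layout; filtered out of the list.
NON_MODEL_DIRS = {"vae", "Qwen3-Embedding-0.6B"}

def discover_models(checkpoint_dir: str) -> list[dict]:
    """Classify each config.json-bearing subfolder as official or custom.

    Illustrative sketch only; the real discovery logic may differ.
    """
    models = []
    for sub in sorted(Path(checkpoint_dir).iterdir()):
        if not sub.is_dir() or sub.name in NON_MODEL_DIRS:
            continue
        if not (sub / "config.json").is_file():
            continue  # not a model folder
        kind = "official" if sub.name.startswith(OFFICIAL_PREFIX) else "custom"
        models.append({"name": sub.name, "kind": kind})
    return models
```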
Side-Step supports training on community fine-tunes, not just the official turbo/base/sft.
- You MUST have the original base model that the fine-tune was built from. The training loop needs it for correct timestep conditioning.
- The fine-tune folder must contain a valid `config.json` (same format as official models).
- Do not rename the fine-tune folder.

When you select a fine-tune:
- The model picker lists your fine-tune alongside official models
- If the fine-tune's `config.json` doesn't specify `is_turbo` or timestep parameters, Side-Step asks: "Which base model was this fine-tune trained from?"
- Your answer conditions the training: turbo uses discrete scheduling, base/sft use continuous
- Training proceeds normally
```bash
uv run python train.py fixed \
  --checkpoint-dir ./checkpoints \
  --model-variant my-custom-finetune \
  --base-model turbo \
  --dataset-dir ./my_tensors \
  --output-dir ./output/my_lora
```

The `--model-variant` flag accepts any folder name under `--checkpoint-dir`. The `--base-model` flag tells Side-Step which timestep parameters to use.
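As a hypothetical sketch of how these flags could be wired up (the real CLI lives in `train.py` and may define them differently):

```python
import argparse

# Hypothetical reconstruction of the flags shown above; not the real parser.
parser = argparse.ArgumentParser(prog="train.py")
sub = parser.add_subparsers(dest="mode", required=True)
fixed = sub.add_parser("fixed", help="corrected training mode")
fixed.add_argument("--checkpoint-dir", required=True)
fixed.add_argument("--model-variant", required=True,
                   help="any folder name under --checkpoint-dir")
fixed.add_argument("--base-model", choices=["turbo", "base", "sft"],
                   help="selects timestep parameters")
fixed.add_argument("--dataset-dir", required=True)
fixed.add_argument("--output-dir", required=True)

args = parser.parse_args([
    "fixed",
    "--checkpoint-dir", "./checkpoints",
    "--model-variant", "my-custom-finetune",
    "--base-model", "turbo",
    "--dataset-dir", "./my_tensors",
    "--output-dir", "./output/my_lora",
])
```

Note that `--model-variant` names the folder to train, while `--base-model` supplies the timestep-conditioning behavior the fine-tune inherited.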
Understanding which base model you're working with matters for training quality:
| Base | `is_turbo` | Timestep Sampling | CFG | Best For |
|---|---|---|---|---|
| Turbo | Yes | Discrete (8-step) | Not trained with CFG | Fast generation; the original training script was built for this |
| Base | No | Continuous (logit-normal) | Trained with CFG dropout | Full quality; benefits most from corrected training |
| SFT | No | Continuous (logit-normal) | Trained with CFG dropout | Instruction-following generation |
Side-Step's corrected (fixed) training mode uses continuous timestep sampling and CFG dropout -- matching how base and SFT models were actually trained. For turbo, the original discrete schedule is appropriate.
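To make the two regimes concrete, here is a rough plain-Python sketch of drawing a training timestep each way. The exact 8-step grid values and logit-normal parameters are assumptions for illustration, not Side-Step's actual settings:

```python
import math
import random

def sample_timestep(is_turbo: bool, rng: random.Random) -> float:
    """Draw one training timestep in (0, 1].

    Turbo: pick uniformly from a fixed 8-step grid (discrete schedule).
    Base/SFT: logit-normal draw -- the sigmoid of a standard normal
    sample, which concentrates timesteps toward the middle of (0, 1).
    """
    if is_turbo:
        grid = [i / 8 for i in range(1, 9)]  # assumed evenly spaced grid
        return rng.choice(grid)
    u = rng.gauss(0.0, 1.0)
    return 1.0 / (1.0 + math.exp(-u))  # sigmoid(u), always in (0, 1)
```

The practical point: a LoRA trained with the wrong regime sees timesteps the base model never saw during its own training.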
Some checkpoint components are shared across all model variants:
- `vae/` -- The audio VAE (AutoencoderOobleck). Encodes raw audio into latent space and decodes back.
- `Qwen3-Embedding-0.6B/` -- The text encoder. Converts text prompts into embeddings for conditioning.
- `silence_latent.pt` -- Pre-computed silence latent used for LoRA preprocessing context.
These are loaded separately during preprocessing and training. They don't need to be inside each model variant's folder.
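A small sketch of resolving these shared components relative to the checkpoint root (hypothetical helper; the directory names come from the layout above, and Side-Step's actual resolution logic may differ):

```python
from pathlib import Path

def resolve_shared(checkpoint_dir: str) -> dict[str, Path]:
    """Locate shared components under the checkpoint root.

    Hypothetical helper for illustration only.
    """
    root = Path(checkpoint_dir)
    shared = {
        "vae": root / "vae",                            # audio VAE
        "text_encoder": root / "Qwen3-Embedding-0.6B",  # prompt embeddings
    }
    missing = [name for name, p in shared.items() if not p.is_dir()]
    if missing:
        raise FileNotFoundError(f"Missing shared components: {missing}")
    return shared
```

Failing early with a clear error here is useful because a missing `vae/` or text-encoder folder otherwise surfaces much later, mid-preprocessing.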
- [[Getting Started]] -- Installation and first-run setup
- [[Training Guide]] -- Start training adapters