This guide explains how to use cli.py, the interactive wizard and config-driven CLI for ACE-Step inference.
The CLI is wizard/config only: you either run the wizard to build a config, or load a .toml config and generate.
Generate via wizard (interactive):
python cli.py
Generate from a saved config:
python cli.py --config config.toml
Create or edit a config without generating:
python cli.py --configure
python cli.py --configure --config config.toml
Options:
- -c / --config: Path to a .toml configuration file to load.
- --configure: Run the wizard to save a configuration without generating.
- --log-level: Logging level for internal modules. One of TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL. Default: INFO.
The wizard walks through the following steps:
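The three options above could be declared with `argparse` roughly as follows. This is a minimal sketch for illustration, not the actual parser in cli.py: the helper names `build_parser` and `describe_mode` are hypothetical, and the real script may wire its flags differently.

```python
import argparse

# Log levels listed in this guide.
LOG_LEVELS = ["TRACE", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="cli.py")
    parser.add_argument(
        "-c", "--config", default=None,
        help="Path to a .toml configuration file to load.",
    )
    parser.add_argument(
        "--configure", action="store_true",
        help="Run wizard to save configuration without generating.",
    )
    parser.add_argument(
        "--log-level", default="INFO", choices=LOG_LEVELS,
        help="Logging level for internal modules.",
    )
    return parser


def describe_mode(args: argparse.Namespace) -> str:
    """Map parsed flags onto the three ways of running the CLI."""
    if args.configure:
        return "configure"             # wizard runs, config is saved, no generation
    if args.config:
        return "generate-from-config"  # load the .toml and generate
    return "wizard"                    # interactive wizard, then generate
```

For example, `describe_mode(build_parser().parse_args(["--configure", "--config", "config.toml"]))` returns `"configure"`, because `--configure` takes precedence and the loaded file only seeds the wizard.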
- Choose one of 6 tasks.
- Select a DiT model (from locally available models, or auto-download).
- Select an LM model (from locally available models, or auto-download).
- Provide task-specific inputs (source audio, tracks, etc.).
- For text2music: choose between Simple Mode (auto-generate caption/lyrics via LM) or manual input.
- Provide caption / description.
- Choose lyrics mode (instrumental / auto-generate / file / paste).
- Set number of outputs.
- Optionally configure advanced parameters (metadata, DiT settings, LM settings, output settings).
- Review summary and confirm generation.
- Save configuration to a .toml file.
If you skip advanced parameters, the wizard fills all optional parameters with defaults from GenerationParams and GenerationConfig.
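One way to picture this default-filling is with dataclasses. The sketch below uses a hypothetical stand-in class, `ParamsStandIn`, whose fields and default values are invented for illustration; it does not reproduce the real GenerationParams or GenerationConfig definitions.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class ParamsStandIn:
    """Hypothetical stand-in for GenerationParams; fields and defaults are made up."""
    caption: str = ""
    inference_steps: int = 8
    seed: int = -1


def fill_defaults(user_values: dict, cls) -> dict:
    """Keep the values the user supplied and let the dataclass supply the rest."""
    valid = {f.name for f in fields(cls)}
    known = {k: v for k, v in user_values.items() if k in valid}
    return asdict(cls(**known))


# Only the caption was entered; the other fields fall back to class defaults.
filled = fill_defaults({"caption": "lo-fi piano"}, ParamsStandIn)
```

Here `filled` contains the user's caption plus the stand-in defaults for `inference_steps` and `seed`.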
--configure runs the wizard without generation and always saves a config.
Behavior:
- If --config is provided, the file is loaded and used as the wizard's starting values.
- After the wizard, you choose a filename to save (overwriting or new).
- The program exits without generation.
The wizard saves a .toml file containing all parameters. These keys map directly to the fields used in cli.py.
When you load a config with --config, all keys are applied to the runtime settings.
When thinking=True and a config file is loaded via --config, the CLI looks for an instruction.txt file in the project root. If found, its contents are used as the pre-loaded formatted prompt for LM audio-token generation, bypassing the interactive editing step.
When running without a config file (wizard mode), the CLI writes the LM's formatted prompt to instruction.txt and pauses so you can edit it before audio-token generation proceeds.
This allows fine-tuning the exact prompt (caption, lyrics, metadata) that the LM sees before generating audio codes.
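The two instruction.txt paths described above can be summarized in a small sketch. `resolve_prompt` and `pause_for_edit` are hypothetical names, not functions from cli.py. Two behaviors are assumptions because the guide does not specify them: falling back to the generated prompt when the file is missing in config mode, and skipping the file entirely when thinking is off.

```python
import tempfile
from pathlib import Path
from typing import Callable


def pause_for_edit(path: Path) -> None:
    """Default pause: wait for the user to edit the file, then press Enter."""
    input(f"Edit {path} and press Enter to continue...")


def resolve_prompt(
    formatted_prompt: str,
    *,
    thinking: bool,
    config_loaded: bool,
    root: str,
    wait_for_edit: Callable[[Path], None] = pause_for_edit,
) -> str:
    """Return the prompt the LM should see before audio-token generation."""
    path = Path(root) / "instruction.txt"
    if not thinking:
        return formatted_prompt  # assumption: the file is only used with thinking=True
    if config_loaded:
        if path.is_file():
            return path.read_text(encoding="utf-8")  # pre-loaded prompt, no editing step
        return formatted_prompt  # assumption: fall back when the file is missing
    # Wizard mode: write the prompt, pause for edits, then read the result back.
    path.write_text(formatted_prompt, encoding="utf-8")
    wait_for_edit(path)
    return path.read_text(encoding="utf-8")


# Demo in a temporary directory, with a fake "edit" instead of a real pause.
with tempfile.TemporaryDirectory() as root:
    def fake_edit(path: Path) -> None:
        path.write_text("edited prompt", encoding="utf-8")

    wizard_prompt = resolve_prompt(
        "draft prompt", thinking=True, config_loaded=False,
        root=root, wait_for_edit=fake_edit,
    )
    config_prompt = resolve_prompt(
        "draft prompt", thinking=True, config_loaded=True, root=root,
    )
```

In the demo, the wizard-mode call picks up the edited text, and the later config-mode call reuses the same instruction.txt without pausing.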