Generate clean, structured audio metadata.
*Extracts lyrics, tags & BPM from audio files – fully automated*
- 🧠 LLM-powered Tag Generator – (genre, moods, bpm, key, instruments, vocals and rap style)
- 🎙️ Lyric Detection – automatically via Genius.com
- 🕺 BPM Analysis – via Librosa
- 🖥️ Modern WebUI – with mood slider, genre presets & custom prompt field
- 🗂️ Export to ACE-Step training format
- 🔁 Retry logic & logging built-in
| Component | Recommended |
|------------|---------------|
| OS | Windows 10 Pro|
| GPU | 12 GB VRAM |
| RAM | 32 GB |
| Python | 3.11 |
| CUDA | 12.9 |
| Model | `Qwen2-Audio-7B-Instruct`|
- Install NVIDIA Video Driver:
  - You should install the latest version of your GPU's driver. Download drivers here: NVIDIA GPU Drive.
- Install CUDA Toolkit:
  - Follow the instructions to install CUDA Toolkit.
- Install PyTorch:
  - Install `torch` and `triton`. Go to https://pytorch.org to install it. For example: `pip install torch torchvision torchaudio triton`
  - You will need the version of PyTorch that is compatible with your CUDA drivers, so select it carefully.
  - Confirm that CUDA is installed correctly. Try `nvcc`. If that fails, you need to install `cudatoolkit` or CUDA drivers.
- Install BitsandBytes:
  - Install `bitsandbytes` and check it with `python -m bitsandbytes`
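
Once PyTorch is installed, a quick check from Python can confirm that it actually sees the GPU. The snippet below is a minimal sketch and not part of this repository; the helper name `cuda_summary` is made up for illustration. It imports `torch` lazily so it also runs (and says so) when PyTorch is missing.

```python
def cuda_summary() -> str:
    """Return a one-line description of the local PyTorch/CUDA setup."""
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return "PyTorch is not installed"

    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} found, but no usable CUDA device"

    props = torch.cuda.get_device_properties(0)  # first visible GPU
    vram_gb = props.total_memory / 1024**3
    return (
        f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}, "
        f"{props.name} with {vram_gb:.1f} GB VRAM"
    )


if __name__ == "__main__":
    print(cuda_summary())
```

If the reported VRAM is well below the 12 GB recommended above, the model may not fit without further adjustments.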
Conda Installation (recommended)

```bash
conda create --name acedata python=3.11
conda activate acedata
```

Install PyTorch

```bash
pip install torch==2.7.1+cu126 torchvision==0.22.1+cu126 torchaudio==2.7.1+cu126 --index-url https://download.pytorch.org/whl/cu126
```

Clone the repository

```bash
git clone https://github.com/methmx83/Ace-Step_Data-Tool.git
cd Ace-Step_Data-Tool
```

Install dependencies

```bash
pip install -e .
```

Launch the WebUI

```bash
conda activate acedata
acedata
```

Alternative

```bash
conda activate acedata
python start.py
```

Open the WebUI at http://localhost:7860
Content of a `_prompt.txt`

When the pipeline processes an audio file, a `_prompt.txt` is created next to the file. It contains a simple, comma-separated list of tags. Example:

```
pop, 114 bpm, electronic, minor, sad, piano, synth pad, female vocal
```
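
Because the file is a single comma-separated line, it is easy to read back in your own scripts. The following is a small sketch, not code from this repository: `prompt_path_for` and `parse_prompt` are hypothetical helpers, and the `<stem>_prompt.txt` naming is an assumption based on the description above.

```python
from pathlib import Path


def prompt_path_for(audio_path: str) -> Path:
    """Assumed naming: 'song.mp3' -> 'song_prompt.txt' in the same folder."""
    audio = Path(audio_path)
    return audio.with_name(f"{audio.stem}_prompt.txt")


def parse_prompt(text: str) -> list[str]:
    """Split a comma-separated tag line into clean, non-empty tags."""
    return [tag.strip() for tag in text.split(",") if tag.strip()]


line = "pop, 114 bpm, electronic, minor, sad, piano, synth pad, female vocal"
tags = parse_prompt(line)
print(tags[:3])                                      # ['pop', '114 bpm', 'electronic']
print(prompt_path_for("data/audio/song.mp3").name)   # song_prompt.txt
```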
- Multi-category tagging: `genre`, `key` (major/minor), `mood`, `instruments`, `vocal`, and `vocal_fx` (e.g., `autotune`, `harmony`, `pitch-up`).
- Configurable prompts in `config/prompts.json`
- Content-based retry per category (configurable) and audio caching / multi-segment processing.
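
To illustrate what a content-based retry means here: the model is asked again when its answer for a category is unusable, not only when the call itself fails. The sketch below shows the general idea under stated assumptions. `run_with_retry` and `fake_mood_model` are hypothetical names, and the real retry conditions in this tool are configurable and may differ.

```python
from typing import Callable, Optional


def run_with_retry(
    generate: Callable[[int], list[str]],
    is_acceptable: Callable[[list[str]], bool],
    max_attempts: int = 3,
) -> Optional[list[str]]:
    """Call generate(attempt) until is_acceptable(tags) holds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        tags = generate(attempt)
        if is_acceptable(tags):
            return tags
    return None  # caller decides how to handle a category that never succeeded


# Stand-in for a model call: empty on the first attempt, usable on the second.
def fake_mood_model(attempt: int) -> list[str]:
    return [] if attempt == 1 else ["sad", "melancholic"]


moods = run_with_retry(fake_mood_model, lambda tags: len(tags) >= 1)
print(moods)  # ['sad', 'melancholic']
```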
- Function: `detect_tempo(audio_path: str) -> Optional[float]` detects the tempo and returns a number on success.
- Integration: The pipeline calls the detection before prompt/tag generation and adds a normalized tag in the format `XXX bpm` to the generated `_prompt.txt` files.
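
A function with this signature could look roughly like the sketch below. It is not the implementation shipped in this repository: it assumes Librosa's `beat_track` as the estimator, returns `None` on any failure, and uses a hypothetical helper `format_bpm_tag` to produce the `XXX bpm` tag.

```python
from typing import Optional


def detect_tempo(audio_path: str) -> Optional[float]:
    """Estimate the tempo in BPM with librosa; return None if anything fails."""
    try:
        import librosa  # lazy imports: the module loads even without librosa
        import numpy as np

        y, sr = librosa.load(audio_path, sr=None, mono=True)
        tempo, _beats = librosa.beat.beat_track(y=y, sr=sr)
        # Depending on the librosa version, tempo is a float or a small array.
        value = float(np.atleast_1d(tempo)[0])
        return value if value > 0 else None
    except Exception:
        return None


def format_bpm_tag(tempo: Optional[float]) -> Optional[str]:
    """Turn 113.6 into '114 bpm'; pass None through when detection failed."""
    if tempo is None or tempo <= 0:
        return None
    return f"{round(tempo)} bpm"


print(format_bpm_tag(113.6))  # 114 bpm
```

Rounding to a whole number keeps the tag stable across runs that differ only by a fraction of a beat.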
- The lyrics are extracted as plain text (using `Requests` + `BeautifulSoup4`) and saved in a file `<Name>_lyrics.txt`.
- By default, the tool expects a folder (`data/audio` in the project directory) containing audio files (supported: `.mp3`, `.wav`, `.flac`, `.m4a`).
- All files (recursively in subfolders) are read and processed one after another. Intermediate results and logs are displayed for each track.
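
The recursive scan described above can be approximated in a few lines. This is a sketch rather than the tool's own code: `find_audio_files` is a hypothetical helper, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a"}


def find_audio_files(root: str = "data/audio") -> list[Path]:
    """Return all supported audio files below root, sorted for a stable order."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        path
        for path in base.rglob("*")  # rglob walks subfolders recursively
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


for audio_file in find_audio_files():
    print(audio_file)
```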
- `ContextExtractor` reads Artist/Title from filename.
- `SegmentPlanner` plans segments according to `workflow_config` and caches the union via `AudioProcessor`.
- `PromptBuilder` generates system+user prompts per category.
- `InferenceRunner` calls the model (multiple audio paths per category possible), including technical and content-based retries.
- `TagPipeline` extracts raw tags per category, normalizes against the whitelist (in `presets/moods.md`), applies Min/Max/Order/Overall limits, and resolves conflicts.
- Orchestrator writes final tags as `*_prompt.txt` next to the audio file.
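
The normalization step is the part of this flow that most affects the final tags, so here is a simplified sketch of the idea for a single category. It is an illustration under assumptions, not the actual `TagPipeline`: `normalize_tags` is a hypothetical function, the whitelist is passed in as a plain list instead of being parsed from `presets/moods.md`, and conflict resolution and overall limits are left out.

```python
from typing import Optional


def normalize_tags(
    raw_tags: list[str],
    whitelist: list[str],
    min_tags: int = 1,
    max_tags: int = 3,
) -> Optional[list[str]]:
    """Keep whitelisted tags (case-insensitive, de-duplicated), then apply limits."""
    allowed = {tag.lower(): tag for tag in whitelist}
    seen: set[str] = set()
    result: list[str] = []
    for raw in raw_tags:
        key = raw.strip().lower()
        if key in allowed and key not in seen:
            seen.add(key)
            result.append(allowed[key])  # use the whitelist's spelling
    result = result[:max_tags]
    # Too few valid tags: signal the caller, which may trigger a retry.
    return result if len(result) >= min_tags else None


mood_whitelist = ["sad", "dark", "happy", "calm"]
raw = [" Sad", "sad", "gloomy", "Dark", "happy", "calm"]
print(normalize_tags(raw, mood_whitelist))         # ['sad', 'dark', 'happy']
print(normalize_tags(["gloomy"], mood_whitelist))  # None
```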
- `config/prompts.json`: prompt templates and `workflow_config.default_categories` (the default now includes `key` and `vocal_fx`).
- `workflow_config.audio_segments`: e.g., `["best","middle"]` (segments are cached).
- `output_format.min_tags_per_category` / `max_tags_per_category`: Min/Max per category.
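
The dotted names above refer to nested keys. The sketch below shows one way to read them with a fallback. Note the assumptions: the example JSON only mirrors the key names mentioned in this README, its nesting and value types are guesses (the real `config/prompts.json` may, for instance, store limits per category), and `get_setting` is a hypothetical helper.

```python
import json
from typing import Any

# Illustrative structure only; compare with the real config/prompts.json.
EXAMPLE_CONFIG = """
{
  "workflow_config": {
    "default_categories": ["genre", "key", "mood", "instruments", "vocal", "vocal_fx"],
    "audio_segments": ["best", "middle"]
  },
  "output_format": {
    "min_tags_per_category": 1,
    "max_tags_per_category": 3
  }
}
"""


def get_setting(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Follow 'a.b.c' through nested dicts; return default if any part is missing."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


config = json.loads(EXAMPLE_CONFIG)
print(get_setting(config, "workflow_config.audio_segments"))            # ['best', 'middle']
print(get_setting(config, "output_format.max_tags_per_category", 5))    # 3
print(get_setting(config, "workflow_config.missing_key", "n/a"))        # n/a
```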
- `presets/moods.md` contains the allowed tags for `genres`, `moods`, `instruments`, `vocal types`, `keys`, and `vocal_fx`.
- New: `presets/hiphop/moods.md` is an example preset for Hip-Hop-specific tags. Select it in the UI or via `--moods_file presets/hiphop/moods.md` in the CLI run.
- Missing tags: Check logs (console + log file). The parser attempts several fallbacks: JSON objects, arrays, code blocks, quoted JSON, and heuristic text search.
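
To make the fallback order easier to picture, here is a compact sketch of such a parser. It is not the parser used in this repository: `extract_tags` is a hypothetical function, the `"tags"` key for JSON objects is an assumption, and the heuristic text search is reduced to splitting the last line on commas.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence marker, built at runtime


def _tags_from_json(value) -> list[str]:
    """Pull a list of tag strings out of a decoded JSON value."""
    if isinstance(value, str):
        # Quoted JSON: a JSON string whose content is itself JSON.
        try:
            return _tags_from_json(json.loads(value))
        except json.JSONDecodeError:
            return []
    if isinstance(value, dict):
        value = value.get("tags", [])  # assumed key name
    if isinstance(value, list):
        return [str(item).strip() for item in value if str(item).strip()]
    return []


def extract_tags(response: str) -> list[str]:
    text = response.strip()
    candidates = [text]  # 1. the whole response as JSON (object, array, quoted)
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1))  # 2. content of a code block
    for pattern in (r"\{.*\}", r"\[.*\]"):
        match = re.search(pattern, text, re.DOTALL)
        if match:
            candidates.append(match.group(0))  # 3. JSON embedded in prose
    for candidate in candidates:
        try:
            tags = _tags_from_json(json.loads(candidate))
        except json.JSONDecodeError:
            continue
        if tags:
            return tags
    # 4. Heuristic: treat the last line as "label: tag, tag, tag".
    last_line = text.splitlines()[-1] if text else ""
    parts = last_line.split(":", 1)[-1].split(",")
    return [part.strip(" .\"'") for part in parts if part.strip(" .\"'")]


print(extract_tags('{"tags": ["sad", "dark"]}'))   # ['sad', 'dark']
print(extract_tags("Tags: sad, dark."))            # ['sad', 'dark']
```

If tags are still missing after all fallbacks, the raw model response in the log file is usually the quickest way to see which format the model actually produced.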
- Web scraping of sites like Genius may be subject to restrictions by their Terms of Service. Please check the legal situation before running automated scrapes on a large scale.
Apache-2.0 license
This repository and the included code are distributed under the Apache License, Version 2.0. The full license text is included in the `LICENSE` file at the repository root.
Third-party components included in this project are documented in `third_party/THIRD_PARTY_LICENSES.md` and `NOTICE`. Several files and modules were derived from or inspired by other projects that are themselves licensed under Apache-2.0. Those original copyright notices and license headers are retained in the copied files where present.
- [ACE-Step](https://github.com/ace-step/ACE-Step)
- [woctordho](https://github.com/woct0rdho/ACE-Step)