Commit 6ae0852: Merge pull request ace-step#282 from WebChatAppAi/main ("Fix Incorrect Data Processing Flow in TRAIN_INSTRUCTION.md"). Parents: d612523, 6e93273.

1 file changed: TRAIN_INSTRUCTION.md (+88, −35 lines)
# Training Instruction

## 1. Data Preparation
### Required File Format

For each audio sample, you need **exactly 3 files** in the `data` directory:

1. **`filename.mp3`** - the audio file
2. **`filename_prompt.txt`** - audio characteristics (comma-separated tags)
3. **`filename_lyrics.txt`** - song lyrics (optional, but recommended)

### Example Data Structure

```
data/
├── test_track_001.mp3
├── test_track_001_prompt.txt
└── test_track_001_lyrics.txt
```
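Before converting, it can help to verify that every MP3 has its companion files. The sketch below is a hypothetical helper (`find_samples` is not part of the repository's tooling) that pairs each audio file with its prompt and lyrics files:

```python
from pathlib import Path

def find_samples(data_dir):
    """Collect (mp3, prompt, lyrics) file triples from a data directory.

    A hypothetical sanity-check helper; not part of the repository.
    """
    samples = []
    for mp3 in sorted(Path(data_dir).glob("*.mp3")):
        prompt = mp3.with_name(mp3.stem + "_prompt.txt")
        lyrics = mp3.with_name(mp3.stem + "_lyrics.txt")
        if prompt.exists():  # the prompt file is required
            samples.append((mp3, prompt, lyrics if lyrics.exists() else None))
    return samples
```

Samples whose prompt file is missing are skipped, and a missing lyrics file is reported as `None` rather than treated as an error, matching the "optional, but recommended" note above.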
### File Content Format

#### `*_prompt.txt` - Audio Tags

Simple comma-separated audio characteristics describing the sound, instruments, genre, mood, etc.

**Example:**
```
melodic techno, male vocal, electronic, emotional, minor key, 124 bpm, synthesizer, driving, atmospheric
```

**Guidelines for creating prompt tags:**
- Include the **genre** (e.g., "rap", "pop", "rock", "electronic")
- Include the **vocal type** (e.g., "male vocal", "female vocal", "spoken word")
- Include **instruments** actually heard (e.g., "guitar", "piano", "synthesizer", "drums")
- Include the **mood/energy** (e.g., "energetic", "calm", "aggressive", "melancholic")
- Include the **tempo** if known (e.g., "120 bpm", "fast tempo", "slow tempo")
- Include the **key** if known (e.g., "major key", "minor key", "C major")
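The prompt line is later parsed into a tag list. Assuming the converter does a plain comma split (an assumption, not its verified code), the parsing amounts to:

```python
def parse_prompt_tags(text):
    # Split a comma-separated prompt line into a clean tag list.
    # Assumes a plain comma split with whitespace stripping; illustrative only.
    return [tag.strip() for tag in text.split(",") if tag.strip()]

parse_prompt_tags("melodic techno, male vocal, 124 bpm")
# → ['melodic techno', 'male vocal', '124 bpm']
```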
#### `*_lyrics.txt` - Song Lyrics

Standard song lyrics with verse/chorus structure.

**Example:**
```
[Verse]
Lately I've been wondering
Why do I do this to myself
I should be over it

[Chorus]
It makes me want to cry
If you knew what you meant to me
I wonder if you'd come back
```
### ⚠️ Important Notes

- **File naming is strict**: files must follow the `filename.mp3`, `filename_prompt.txt`, `filename_lyrics.txt` pattern
- **JSON files are NOT supported**: the converter only reads the simple text files above
- **Complex multi-variant descriptions are NOT used**: only the simple comma-separated prompt format works
## 2. Convert to Huggingface Dataset Format

Run the following command to convert your data to the training format:

```bash
python convert2hf_dataset.py --data_dir "./data" --repeat_count 2000 --output_name "zh_lora_dataset"
```
**Parameters:**

- `--data_dir`: path to your data directory containing the MP3, prompt, and lyrics files
- `--repeat_count`: number of times to repeat your data (use higher values for small datasets)
- `--output_name`: name of the output dataset directory
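With a single sample and `--repeat_count 2000`, the output dataset simply contains 2000 copies of that row, so one training epoch has enough steps. A rough sketch of the assumed behavior:

```python
def repeat_rows(rows, repeat_count):
    # Duplicate every row `repeat_count` times, mirroring what
    # --repeat_count is assumed to do for small datasets.
    return rows * repeat_count

rows = repeat_rows([{"keys": "test_track_001"}], 2000)
len(rows)  # → 2000
```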
### What the Converter Creates

The converter processes your files and creates a Huggingface dataset with these features:

```python
Dataset Features:
{
    'keys': string,              # filename (e.g., "test_track_001")
    'filename': string,          # path to MP3 file
    'tags': list[string],        # parsed prompt tags as an array
    'speaker_emb_path': string,  # (empty, not used)
    'norm_lyrics': string,       # full lyrics text
    'recaption': dict            # (empty, not used)
}
```
**Example processed sample:**

```python
{
    'keys': 'test_track_001',
    'filename': 'data/test_track_001.mp3',
    'tags': ['melodic techno', 'male vocal', 'electronic', 'emotional', 'minor key', '124 bpm', 'synthesizer', 'driving', 'atmospheric'],
    'speaker_emb_path': '',
    'norm_lyrics': '[Verse]\nLately I\'ve been wondering\nWhy do I do this to myself...',
    'recaption': {}
}
```
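Putting the pieces together, a record like the one above could be assembled roughly as follows. The field names come from the feature table, but the assembly logic is an assumption for illustration, not the converter's actual code:

```python
import os

def build_record(mp3_path, prompt_text, lyrics_text):
    # Assemble one dataset row from a sample's files (sketch; the real
    # converter's logic may differ).
    key = os.path.splitext(os.path.basename(mp3_path))[0]
    return {
        "keys": key,
        "filename": mp3_path,
        "tags": [t.strip() for t in prompt_text.split(",") if t.strip()],
        "speaker_emb_path": "",  # empty, not used
        "norm_lyrics": lyrics_text,
        "recaption": {},         # empty, not used
    }
```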

## 3. Configure LoRA Parameters

Refer to `config/zh_rap_lora_config.json` for configuring the LoRA parameters.
