
Commit 6547e2a

Adding some refactoring through folder adds/directories

1 parent 004c75a · commit 6547e2a

File tree

7 files changed: +341 −51 lines

README.md

Lines changed: 70 additions & 9 deletions
@@ -80,8 +80,8 @@ pytest tests/ -v
 
 For the training step, I used the 16kHz inference model from the [audioset_tagging_cnn repo](https://github.com/qiuqiangkong/audioset_tagging_cnn/blob/master/README.md) to finetune against.
 
-For the example here, I set up a data folder at the top level with /data/train/ai and /data/train/real
-and would .mp3 and .wav files that I want to fintune against. I got the real data from
+For the example here, I set up a data folder at the top level with `/data/train/ai` and `/data/train/real`
+and added the .mp3 and .wav files that I want to finetune against. I got the real data from
 [FMA](https://github.com/mdeff/fma) for testing, and the AI-generated data from
 Facebook's MusicGen. There needs to be the word "ai" in the path of the ai folders and "real" in the
 path to the real songs.
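For example, I set this up from the project root roughly like so (the source paths here are just placeholders for wherever your files live):

```bash
mkdir -p data/train/ai data/train/real
cp ~/musicgen_outputs/*.wav data/train/ai/   # AI-generated clips (placeholder source path)
cp ~/fma_small/*/*.mp3 data/train/real/      # FMA tracks (placeholder source path)
```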
@@ -90,8 +90,8 @@ path to the real songs.
 so gzip your file beforehand**
 
 Steps:
-1. First place files in audio-processing-ai/data/train (if you are going to finetune data against your model)
-**All AI Files should go in the /data/train/ai and all of the real files goes in /data/train/real. This is because we need to do supervised learning befor training the classfier which file is AI music and which is Real**
+1. First place files in `data/train/` (if you are going to finetune the model)
+**All AI files should go in `/data/train/ai` and all real files in `/data/train/real`; the classifier is trained with supervised learning, so it needs labels for which files are AI music and which are real.**
 2. Figure out the model you are going to finetune against
 3. Update this line (`PRETRAINED_MODEL_PATH = 'model/pretrained/pretrained_models/Cnn14_16k_mAP=0.438.pth.gz'`) in cnn14.py to the .pth.gz file location of your choice
 
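For step 3, if the checkpoint you downloaded is a plain `.pth`, one way to produce the expected `.pth.gz` (using the default filename from the path above) is:

```bash
# Compress the downloaded checkpoint and move it to the default pretrained path
gzip Cnn14_16k_mAP=0.438.pth
mv Cnn14_16k_mAP=0.438.pth.gz model/pretrained/pretrained_models/
```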
@@ -114,28 +114,86 @@ Optional arguments:
 
 ### Inference
 
-To run predictions on audio files:
+Place your audio files in the `data/predict/` folder, then run predictions:
+
 ```bash
 python predict.py \
-  --folder path/to/audio/files \
-  --model model/saved_models/your_model.pth
+  --folder data/predict \
+  --model model/saved_models/your_model.pth \
+  --output results
+```
+
+Or specify any folder containing audio files:
+
+```bash
+python predict.py \
+  --folder path/to/your/audio/files \
+  --model model/saved_models/your_model.pth \
+  --output results
 ```
 
 Required arguments:
-- `--folder`: Directory containing .mp3/.wav files to analyze
+- `--folder`: Directory containing .mp3/.wav/.flac/.m4a files to analyze
 - `--model`: Path to your trained model (.pth file)
+- `--output`: Output directory for CSV and Excel results
 
 The script will:
 1. Process each audio file in the specified folder
 2. Generate predictions for AI-generated content and audio scene tags
-3. Save results to a CSV file named `predictions_YYYYMMDD_HHMM.csv`
+3. Save results to a CSV file named `predictions_YYYYMMDD_HHMM.csv` and an Excel file with summary statistics
+
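To sanity-check a run, you can list the output folder and peek at the CSV (assuming `--output results` as above; the exact columns depend on the script):

```bash
ls results/                           # expect predictions_YYYYMMDD_HHMM.csv plus an Excel summary
head -n 3 results/predictions_*.csv   # first rows of the predictions table
```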
+### Evaluation
+
+To evaluate your trained model on a test set, use the `evaluation_pipeline.py` script. This will generate comprehensive metrics including ROC curves, confusion matrices, and threshold analysis.
+
+**Prerequisites:**
+- Your data should be split into train/val/test folders (see the layout sketch below)
+- The test folder should contain `real/` and `ai/` subfolders with audio files
+
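A minimal sketch of that layout, assuming `train/` and `val/` follow the same `ai`/`real` convention as the test folder:

```bash
# Create the split folder structure evaluation_pipeline.py expects
mkdir -p data/split/{train,val,test}/{ai,real}
```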
+Example evaluation:
+```bash
+python evaluation_pipeline.py \
+  --model model/saved_models/your_model.pth \
+  --data-split data/split/ \
+  --output-dir evaluation_results
+```
+
+The evaluation will generate:
+- `evaluation_plots.png`: ROC curve, logit distributions, threshold analysis, and confusion matrix
+- `evaluation_thresholds.csv`: Performance metrics across different thresholds
+- `evaluation_report.txt`: Detailed text report with all metrics
+
+**Example Output:**
+See `sample_runs/eval_sample_run/` for an example of evaluation results. This directory contains:
+- `evaluation_plots.png`: Visual performance metrics
+- `evaluation_report.txt`: Detailed evaluation report
+- `evaluation_thresholds.csv`: Threshold analysis data
+
+Required arguments:
+- `--model`: Path to trained model (.pth file)
+- `--data-split`: Path to split data folder (containing train/val/test subdirs)
+
+Optional arguments:
+- `--output-dir`: Output directory for results (default: auto-generated with timestamp)
+- `--seed`: Random seed for reproducibility (default: 42)
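To skim the threshold table from a finished run (path assuming `--output-dir evaluation_results` as in the example above):

```bash
column -s, -t evaluation_results/evaluation_thresholds.csv | head
```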
 
 ## Project Structure
 
 ```
 audio-processing-ai/
 ├── .github/
 │   └── workflows/            # GitHub Actions CI/CD workflows
+├── data/
+│   ├── train/                # Training data
+│   │   ├── ai/               # AI-generated audio files
+│   │   └── real/             # Real audio files
+│   └── predict/              # User audio files for prediction
+├── model/
+│   ├── pretrained/           # Pretrained model weights
+│   │   └── pretrained_models/
+│   └── saved_models/         # Trained model checkpoints
+├── sample_runs/              # Example evaluation outputs
+│   └── eval_sample_run/      # Sample evaluation results
 ├── src/
 │   └── audio_processing_ai/  # Main package
 │       ├── dataset/          # Dataset loading and processing utilities
@@ -145,6 +203,7 @@ audio-processing-ai/
 ├── tests/                    # Test files
 ├── train.py                  # Training script
 ├── predict.py                # Prediction script
+├── evaluation_pipeline.py    # Model evaluation script
 ├── pyproject.toml            # Package configuration
 ├── uv.lock                   # uv lock file (if using uv)
 ├── .pre-commit-config.yaml   # Pre-commit hooks configuration
@@ -160,7 +219,9 @@ audio-processing-ai/
 - Audio processing is done using torchaudio and librosa
 - Model architecture is based on CNN14 with dual-head classification
 - Training data should be organized in the `data/train/` directory
+- Prediction files can be placed in `data/predict/` for convenience
 - Model checkpoints are saved in `model/saved_models/`
+- Evaluation results are saved with timestamps for easy tracking
 - The project is structured as a proper Python package following modern packaging standards
 - All modules are organized under `src/audio_processing_ai/` for better code organization
 - Uses `uv` for fast dependency management (recommended) or `pip` as an alternative

data/.gitkeep

Whitespace-only changes.

data/predict/.gitkeep

Whitespace-only changes.

eval_output_dir/evaluation_thresholds.csv

Lines changed: 0 additions & 42 deletions
This file was deleted.

File renamed without changes.

File renamed without changes.
