4.3 Integrate our Segmentation model and instrument classifier to automatically label film clips #9

@hughmancoder

Description

Adopt a CRNN (Convolutional Recurrent Neural Network) architecture from the following paper and reproduce its experiment:

Onset Detection for String Instruments Using Bidirectional Temporal and Convolutional Recurrent Networks

Training Process

  • Pre-train the model on the QTDS/Böck dataset.
  • Fine-tune on the curated martial-arts + Jingju dataset.
  • Convert audio to logarithmic spectrograms using a Hanning window with an 11.6 ms hop size [29].
  • Use Librosa for spectrogram generation.

Training

  • Train the CRNN for frame-wise onset classification using the specifications from “Onset Detection for String Instruments Using Bidirectional Temporal and Convolutional Recurrent Networks” [29].
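As a rough illustration of the frame-wise setup, the sketch below shows a generic CRNN in PyTorch: a convolutional front-end over the spectrogram, a bidirectional GRU over time, and a per-frame onset probability. The layer sizes, 80 input bins, and use of a GRU are all assumptions for illustration, not the hyperparameters from [29].

```python
import torch
import torch.nn as nn

class CRNNOnset(nn.Module):
    """Hypothetical CRNN sketch: conv front-end -> BiGRU -> per-frame sigmoid."""

    def __init__(self, n_bins=80, conv_ch=16, rnn_hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, conv_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((2, 1)),  # pool frequency only, preserving time resolution
        )
        self.rnn = nn.GRU(conv_ch * (n_bins // 2), rnn_hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * rnn_hidden, 1)

    def forward(self, spec):                    # spec: (batch, freq, time)
        x = self.conv(spec.unsqueeze(1))        # (B, C, F/2, T)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # one feature vector per frame
        x, _ = self.rnn(x)
        return torch.sigmoid(self.head(x)).squeeze(-1)  # onset probability per frame

model = CRNNOnset()
probs = model(torch.randn(2, 80, 100))          # 2 clips, 80 bins, 100 frames
```

Training would then minimize a per-frame binary cross-entropy against onset/non-onset labels; the bidirectional recurrence lets each frame's prediction use both past and future context.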

Model Evaluation

  • Use the F1-score as the primary evaluation metric.
  • Apply a 70% train / 15% validation / 15% test data split.
  • Merge detected onsets that fall within 30 ms of each other into a single event.
  • Count a detected onset/offset as correct if it falls within ±50 ms of the ground truth.

Metadata

Labels

No labels

Projects

Status

Backlog

Milestone

No milestone
