This research aims to encourage wider use of symbolic music representations by showing their potential to improve the accuracy and efficiency of recognizing and handling musical information. Classification, whether by genre, style, era, or composer, is a relatively common task in the music domain. To the best of our knowledge, most previous work has relied on audio (wav) features such as MFCCs and mel-spectrograms. However, using only the audio format to detect, analyze, and maintain music has its limitations. In this research, we show that symbolic representations (MIDI, music scores, etc.) can strengthen certain aspects of music detection and classification, which underlie many tasks in the music industry, and we suggest symbolic representation as a possible solution to the limitations of working with audio alone. Furthermore, we tested several adversarial attack techniques on this format of music.
- Python >= 3.6.9
- PyTorch = 1.4.0
- torchvision =
- py_midicsv = 1.10.0
- numpy = 1.18.1
- sklearn =
- matplotlib = 3.1.3
- music21 = 5.7.2
- tqdm
python main.py --gpu [gpu to use] \
    --mode basetrain \
    --model_name [model to use] \
    --epochs [epoch #] \
    --optim [optimizer to use] \
    --transform [Transpose / Tempo] \
    --load_path ['/PATH/TO/TRAIN.TXT_AND_VALID.TXT/'] \
    --save_path ['/PATH/TO/SAVE/MODEL_AND_LOADER/']
python main.py --gpu [gpu to use] \
    --mode advtrain \
    --model_name [model to use] \
    --epochs [epoch #] \
    --optim [optimizer to use] \
    --transform [Transpose / Tempo] \
    --input_path ['/PATH/TO/ATTACKED/INPUT/'] \
    --save_path ['/PATH/TO/SAVE_MODEL_AND_LOADER/']
python main.py --gpu [gpu to use] \
    --mode attack \
    --load_path [/PATH/TO/SAVED_LOADER/] \
    --save_path [/PATH/TO/SAVE/ATTACK_EXAMPLES/] \
    --epsilons ['ep0, ep1, ep2, ... epn'] \
    --save_atk [True/False]
tensorboard --logdir=trainlog
python main.py --gpu 0 \
    --mode basetrain \
    --model_name resnet50 \
    --epochs 100 --optim SGD \
    --load_path '/data/split/' \
    --save_path '/data/drum/dataset/'
python main.py --gpu 3 \
    --mode attack \
    --load_path '/data/drum/bestmodel/' \
    --save_path '/data/attacks/' \
    --epsilons '0.05, 0.1, 0.2, 0.4, 0.6' \
    --save_atk True
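The `--epsilons` value is passed as one quoted, comma-separated string. How `main.py` parses it is not shown here; the sketch below is only one plausible way to turn such a string into a list of floats, and the helper name `parse_epsilons` is hypothetical.

```python
def parse_epsilons(text):
    """Split a comma-separated string such as '0.05, 0.1' into floats."""
    # float() tolerates surrounding whitespace; empty fragments are skipped.
    return [float(part) for part in text.split(",") if part.strip()]


epsilons = parse_epsilons("0.05, 0.1, 0.2, 0.4, 0.6")
print(epsilons)  # [0.05, 0.1, 0.2, 0.4, 0.6]
```

Each value would then presumably drive one attack run, from the weakest perturbation to the strongest.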
MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) is a dataset composed of over 200 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms.
Specifically, we used v2.0.0 of the dataset. Although its main advantage is the fine alignment between MIDI and audio, we use only the MIDI data in this experiment, since the audio is unnecessary for classifying symbolic music.
For usage details, please refer to the "Download" section of the official website.
The downloaded MAESTRO MIDI dataset was preprocessed using music21, a toolkit for computer-aided musicology developed at MIT. Preprocessing takes the following steps:
1. Remove composers with too few pieces.
2. Extract the notes from each track.
3. Divide each track into 0.05-second units.
4. Mark the note information on a 3D matrix.
Uneven Distribution of data
The 14 composers remaining after step 1
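Step 1 can be sketched as a simple count-and-filter pass. The threshold `MIN_PIECES`, the `(composer, path)` record layout, and the function name `filter_composers` are assumptions made for illustration; the cutoff actually used to arrive at 14 composers is not stated here.

```python
from collections import Counter

MIN_PIECES = 2  # hypothetical threshold, not the value used in this project


def filter_composers(records, min_pieces=MIN_PIECES):
    """Keep only records whose composer has at least min_pieces entries."""
    counts = Counter(composer for composer, _ in records)
    return [(c, path) for c, path in records if counts[c] >= min_pieces]


# Toy records: (composer, MIDI file path)
records = [
    ("Chopin", "a.midi"),
    ("Chopin", "b.midi"),
    ("Liszt", "c.midi"),
]
kept = filter_composers(records)
print(sorted({c for c, _ in kept}))  # ['Chopin']
```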
Each generated input has the shape (2, 400, 128), where:
- 2 channels = onset + note
- channel[0] (onset) = binary
- channel[1] (note) = velocity (0-127)
- 400 (x-dim) = time steps of 0.05 sec each (400 × 0.05 s = 20 s per input)
- 128 (y-dim) = pitch (0-127)
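The layout above can be illustrated with a minimal NumPy sketch. It assumes notes have already been extracted as `(onset_sec, duration_sec, pitch, velocity)` tuples, that times are floored to 0.05-second steps, and that channel 1 holds the velocity for as long as the note sounds. The function names and these quantization choices are assumptions, not the project's actual preprocessing code.

```python
import numpy as np

STEP = 0.05      # seconds per time step
N_STEPS = 400    # time steps per input (400 * 0.05 s = 20 s)
N_PITCH = 128    # MIDI pitches 0-127


def to_step(seconds):
    """Map a time in seconds to a step index (floor, with a small float guard)."""
    return int(seconds / STEP + 1e-9)


def notes_to_matrix(notes):
    """Build a (2, 400, 128) array from (onset, duration, pitch, velocity) tuples."""
    roll = np.zeros((2, N_STEPS, N_PITCH), dtype=np.float32)
    for onset, duration, pitch, velocity in notes:
        start = to_step(onset)
        if not 0 <= start < N_STEPS or not 0 <= pitch < N_PITCH:
            continue  # note lies outside the time window or pitch range
        # A note fills at least one step and is clipped at the window end.
        end = min(max(to_step(onset + duration), start + 1), N_STEPS)
        roll[0, start, pitch] = 1.0           # channel 0: binary onset
        roll[1, start:end, pitch] = velocity  # channel 1: velocity while sounding
    return roll


example = notes_to_matrix([(0.0, 0.5, 60, 80), (0.5, 0.25, 64, 100)])
print(example.shape)  # (2, 400, 128)
```

Keeping onsets in their own channel lets the model tell a repeated note apart from a single held note, which a velocity-only matrix cannot express.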
Model: ResNet-50




