- Author : Aayush Agrawal (aaagraw@microsoft.com)
Author: Bruno Grande
The following instructions describe how I was able to install the dependencies for these Jupyter notebooks. I was running into a version conflict with pip install. I was able to resolve dependencies using uv pip install.
# Navigate to ModelTraining subdirectory
cd aifororcas-livesystem/ModelTraining
# Create a new conda environment with Python 3.8
conda create -n <env-name> python=3.8
conda activate <env-name>
# Install uv (better at resolving package version conflicts)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies using `uv pip` (instead of plain `pip`)
uv pip install -r requirements.txtI'm also including a full list of installed packages and versions in requirements.lock.txt, which was generated using pip freeze.
The base data used here is hosted on Current test set for evaluation is hosted on Orca Sound website . The model was trained with the following dataset -
- WHOIS09222019_PodCastRound1 (~6hrs, open data source – Orca call around the planet, Good for generic models)
- OrcasoundLab07052019_PodCastRound2 (~1.2hrs, live hydrophone data - SRKW)
- OrcasoundLab09272017_PodCastRound3 (~1.6hrs, live hydrophone data - SRKW)
And testing in evaluation section of FastAI start script is done on this data -
- OrcasoundLab07052019_Test (~30min, very call-rich, good SNR conditions)
- OrcasoundLab09272017_Test (~21min, more typical distribution, good SNR conditions)
fastai/data
- train
- train_data_09222019
- wav
- train.tsv
- Round2_OS_07_05
- wav
- train.tsv
- Round3_OS_09_27_2017
- wav
- train.tsv
- mldata
- all
- models
- negative
- positive
- test
- all
- OrcasoundLab07052019_Test
- test2Sec
- wav
- test.tsv
- OrcasoundLab09272017_Test
- wav
- test.csv
Step 1 - OPTIONAL: Creating a more ML-ready dataset inline with other popular sound dataset - 1_DataCreation.ipynb notebook
- NOTE: For Quick Start, you may see *.wav files in train/mldata/all/[negative|positive] and in test/all/[negative|positive]; in this case, there's no need to run this script (i.e. no need to generate new samples)
- Extracting small audio segments for positive and negative label and store them in positive and negative folder for training (Filtering any call with less than 1 second duration)
- Also create 2 sec audio sample from testing data for formal ML evaluation
Step2: Transfer leaning a ResNet18 model with on the fly spectogram creation with frequency masking for data augmentation - 2_FastAI_StarterScript.ipynb
- Step 1: Create a Dataloader using FastAI library (Defining parameters to create a spectogram)
- Step 2: Create a DataBunch by splitting training data in 80:20% for validation using training phase, also defining augmentation technique (Frequency transformation)
- Step 3: Training a model using ResNet18 (pre-trained on ImageNet Data)
- Step 4: Running evaluation on Test set
- Step 5: Running scoring for official evaluation
Step3: Inference Testing3_InferenceTesting.ipynb
Notebook showing how to use the inference.py to load the model and do predictions by defining a clip path. The inference.py returns a dictionary -
- local_predictions: A list of predictions (1/0) for each second of wav file (list of integers)
- local_confidences: A list of probability of the local prediction being a whale call or not (list of float)
- global_predictions: A global prediction(1/0) for the clip containing a valid whale call (integer)
- global_confidences: Probability of the prediction being a whale call or not(float)
All the files generated or used during training are here:. NOTE: You must hold "Contributor" role assignment in the OrcaSound Azure tenant to access.
- Models/stg2-rn18.pkl - Trained ResNet18 model
- mldata.7z is data used for training the model
- All.zip is testing data for early evaluations
- test2Sec.zip is data used for scoring for official evaluations
- FastAIAudio
- FastAI v1
- [Librosa] (https://github.com/librosa/librosa)
- [Pandas] (https://github.com/pandas-dev/pandas)
- [Pydub] (https://github.com/jiaaro/pydub/)
- [tqdm] (https://github.com/tqdm/tqdm)
- sklearn