A repository for comparing potential speaker diarization tools to be used in the MEXCA pipeline.
The repository contains subdirectories and top-level files for different parts of the experiment:
- speaker-diarization\: Contains all files for the speaker diarization part
  - embeddings\: Contains the encoded speaker embeddings as .pt files
  - results\: Contains the .rttm files with speaker annotations
  - clustering.py: Script for clustering the speaker embeddings and assigning the speaker labels to speaker segments
  - sd_*.py: Scripts for applying the respective speaker encoding models
  - compare_sd.ipynb: Notebook for comparing the speaker diarization approaches
  - speaker_diarization.py: Script to run all speaker encoding scripts after each other
  - speaker_representation.py: Helper functions for performing speaker diarization
- voice-activity-detection\: Contains all files for the voice activity detection part
  - results\: Contains the .rttm files with speech segments
  - compare_vad.ipynb: Notebook for comparing the voice activity detection approaches
  - custom.conf: Configuration file for the opensmile feature extractor
  - opensmile_helper_functions: Helper functions for extracting opensmile voice activity features
  - vad_*.py: Scripts for applying the voice activity detection models
- explore_ami_corpus.ipynb: Notebook for exploring the properties of the AMI corpus
- rttm.py: Functions for creating, reading, modifying, and writing .rttm files and objects
- rttm_test.py: Preliminary test suite for rttm.py
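
For readers unfamiliar with the .rttm files mentioned above, the following is a minimal sketch of reading and writing speaker segments in the standard ten-field RTTM layout. It is not the API of rttm.py: the names `Segment`, `parse_rttm`, and `format_rttm` are hypothetical and chosen only for illustration, and the channel field is fixed to 1 as a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One speaker turn from an RTTM file (hypothetical helper type)."""
    file_id: str
    onset: float      # start time in seconds
    duration: float   # length in seconds
    speaker: str

    @property
    def end(self) -> float:
        return self.onset + self.duration


def parse_rttm(text: str) -> List[Segment]:
    """Parse SPEAKER lines; other record types and short lines are skipped."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        # Fields: type, file id, channel, onset, duration, <NA>, <NA>, speaker, ...
        segments.append(
            Segment(fields[1], float(fields[3]), float(fields[4]), fields[7])
        )
    return segments


def format_rttm(segments: List[Segment]) -> str:
    """Write segments back as RTTM lines (channel assumed to be 1)."""
    return "\n".join(
        f"SPEAKER {s.file_id} 1 {s.onset:.3f} {s.duration:.3f} "
        f"<NA> <NA> {s.speaker} <NA> <NA>"
        for s in segments
    )
```

A short usage example, with a made-up file id and speaker labels:

```python
example = "SPEAKER meeting1 1 0.500 1.250 <NA> <NA> A <NA> <NA>"
segments = parse_rttm(example)
print(segments[0].speaker, segments[0].end)
print(format_rttm(segments) == example)
```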