This document guides you through the process of preparing the datasets required for training and evaluating the SEED model.
cd ./datasets # Current path: ./datasets

First, download and organize the speech datasets and supplementary data (noise, reverberation, etc.) needed for model training.
# 1.1. Download Libri-Light (https://github.com/facebookresearch/libri-light/blob/main/data_preparation/README.md)
mkdir -p ./Libri-Light
cd ./Libri-Light
wget https://dl.fbaipublicfiles.com/librilight/data/small.tar
tar -xvf small.tar
rm -rf small.tar
cd ../
# 1.2. Download LibriTTS-R (https://www.openslr.org/141/)
mkdir -p ./LibriTTS-R
cd ./LibriTTS-R
wget https://www.openslr.org/resources/141/train_clean_100.tar.gz
wget https://www.openslr.org/resources/141/train_clean_360.tar.gz
tar -xvf train_clean_100.tar.gz
rm -rf train_clean_100.tar.gz
tar -xvf train_clean_360.tar.gz # Adjust if the filename is train-clean-360.tar.gz
rm -rf train_clean_360.tar.gz # Adjust if the filename is train-clean-360.tar.gz
cd ../
# 1.3. Download MUSAN (Music, Noise, Speech sound effects) (https://www.openslr.org/17/)
wget https://www.openslr.org/resources/17/musan.tar.gz
tar -xvf musan.tar.gz
rm -rf musan.tar.gz
# 1.4. Download RIRs (Room Impulse Responses) (https://www.openslr.org/28/)
wget https://www.openslr.org/resources/28/rirs_noises.zip
unzip rirs_noises.zip -d rirs_noises
rm -rf rirs_noises.zip
mkdir -p ./RIRS_NOISES
mv rirs_noises/RIRS_NOISES/simulated_rirs/* ./RIRS_NOISES/ # The zip extracts a top-level RIRS_NOISES/ directory; move the simulated RIRs into ./RIRS_NOISES
rm -rf rirs_noises

Expected directory structure after downloading training datasets:
datasets/
├── Libri-Light/small/
├── LibriTTS-R/
│ ├── train_clean_100/
│ └── train_clean_360/
├── musan/
│ ├── music/
│ ├── noise/
│ └── speech/
└── RIRS_NOISES/
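A quick way to confirm this layout before moving on (a small helper sketch, not part of the repo; directory names taken from the tree above):

```shell
# Report OK/MISSING for each expected directory.
check_dirs() {
  for d in "$@"; do
    [ -d "$d" ] && echo "OK $d" || echo "MISSING $d"
  done
}

check_dirs \
  ./Libri-Light/small \
  ./LibriTTS-R/train_clean_100 \
  ./LibriTTS-R/train_clean_360 \
  ./musan/music ./musan/noise ./musan/speech \
  ./RIRS_NOISES
```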
Generate the data list file required for SEED model training using the downloaded training datasets.
# 2.1. Create data list for SEED training
python ./make_seed_trainset.py \
--target_dirs ./Libri-Light/small/ \
./LibriTTS-R/train_clean_100/ \
./LibriTTS-R/train_clean_360/ \
--output_filename ./manifests/train_libritts+light_1000h.txt \
--extensions .wav

Note: You may need to convert the training audio to 16 kHz, mono. This README does not provide a script execution command for this conversion; if necessary, use a separate script or prepare one yourself (e.g., using sox or ffmpeg). We use the command ffmpeg -y -i "input_path" -ac 1 -ar 16000 -acodec pcm_s16le "output_path" to convert the audio format.
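The conversion described in the note can be batch-applied with a small script. This is only a sketch (the helper name and dry-run flag are ours; the ffmpeg flags match the command above):

```shell
# Hypothetical helper (not part of the repo): convert every .wav/.flac under
# src_dir to 16 kHz mono 16-bit PCM under dst_dir, mirroring the tree.
# Set DRY_RUN=1 to print the ffmpeg commands instead of running them.
convert_to_16k_mono() {
  src_dir="$1"
  dst_dir="$2"
  find "$src_dir" -type f \( -name '*.wav' -o -name '*.flac' \) | while read -r in_path; do
    rel_path="${in_path#"$src_dir"/}"
    out_path="$dst_dir/${rel_path%.*}.wav"
    mkdir -p "$(dirname "$out_path")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo ffmpeg -y -i "$in_path" -ac 1 -ar 16000 -acodec pcm_s16le "$out_path"
    else
      ffmpeg -y -i "$in_path" -ac 1 -ar 16000 -acodec pcm_s16le "$out_path"
    fi
  done
}
```

For example, `DRY_RUN=1 convert_to_16k_mono ./LibriTTS-R ./LibriTTS-R-16k` previews the commands without running ffmpeg.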
Prepare the VoxCeleb1 and VoxConverse test datasets that will be used for model evaluation.
# 3.1. Download VoxCeleb1
# Download and prepare the VoxCeleb1 dataset according to the instructions at the following link:
# https://github.com/clovaai/voxceleb_trainer
# After downloading, it should be located in the ./voxceleb1/ directory.
# 3.2. Create VCMix Test Set (using VoxCeleb1 and VoxConverse)
# The VCMix dataset combines the VoxCeleb1 and VoxConverse test SV datasets.
# Related paper: "Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification" (https://arxiv.org/abs/2309.14741)
# The script below uses the VCMix dataset generation code provided by the first author (Hee-Soo Heo) of the paper.
python ./make_vcmix_testset.py \
--voxceleb1_path ./voxceleb1/ \
--voxconverse_path ./voxconverse_test_SV/ \
--voxconverse_test ./voxconverse_test_SV/trials_wo_overlap.txt \
--output_filename ./manifests/vcmix_test.txt \
--download_voxconverse_test_SV # This option automatically downloads and prepares the VoxConverse test SV dataset.

Note: The --download_voxconverse_test_SV option is optional. If you do not have the VoxConverse dataset, using this option to download the re-prepared VoxConverse test SV dataset is the fastest way to generate the vcmix_test.txt file.
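After the script finishes, it may be worth confirming the generated trial list is non-empty (a small helper sketch, not part of the repo):

```shell
# Print the line count of a manifest, or warn if it is missing/empty.
check_manifest() {
  if [ -s "$1" ]; then
    printf '%s: %s lines\n' "$1" "$(wc -l < "$1")"
  else
    printf '%s: missing or empty\n' "$1"
  fi
}

check_manifest ./manifests/vcmix_test.txt
```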
Expected directory structure (partial) after preparing evaluation datasets:
datasets/
├── Libri-Light/small/
├── LibriTTS-R/
│ ├── train_clean_100/
│ └── train_clean_360/
├── musan/
│ ├── music/
│ ├── noise/
│ └── speech/
├── RIRS_NOISES/
├── voxceleb1/ # Location of VoxCeleb1 data
├── voxconverse_test_SV/ # Location of VoxConverse data
└── manifests/
├── train_libritts+light_1000h.txt
├── vox1-O.txt # VoxCeleb1 evaluation list (can be generated by make_vcmix_testset.py or a separate script)
└── vcmix_test.txt
To easily use various evaluation datasets within the code, consolidate the evaluation dataset directories into a single common directory (vox1-evals) using symbolic links.
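As a rough sketch of what this consolidation does (the actual merge_eval_directories.sh may differ; the link names here are assumptions based on the directory tree in this README):

```shell
# Link each evaluation corpus into a common vox1-evals/ directory.
mkdir -p ./vox1-evals
ln -sfn "$(pwd)/voxceleb1" ./vox1-evals/voxceleb1
ln -sfn "$(pwd)/voxconverse_test_SV" ./vox1-evals/voxconverse_test_SV
```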
sh ./merge_eval_directories.sh

After completing all steps, your datasets directory should have the following structure:
datasets/
├── Libri-Light/small/ # For training
├── LibriTTS-R/ # For training
│ ├── train_clean_100/
│ └── train_clean_360/
├── musan/ # For training (data augmentation)
│ ├── music/
│ ├── noise/
│ └── speech/
├── RIRS_NOISES/ # For training (data augmentation)
├── voxceleb1/ # For evaluation
├── voxconverse_test_SV/ # For evaluation
├── vox1-evals/ # For evaluation (directory consolidated with symbolic links)
└── manifests/ # Data list files
├── train_libritts+light_1000h.txt
├── vox1-O.txt
└── vcmix_test.txt