SEED Training and Evaluation Dataset Preparation Guide

This guide walks through preparing the datasets required to train and evaluate the SEED model.

cd ./datasets # Current path: ./datasets

1. Download and Prepare Training Datasets

First, download and organize the speech datasets and supplementary data (noise, reverberation, etc.) needed for model training.

# 1.1. Download Libri-Light (https://github.com/facebookresearch/libri-light/blob/main/data_preparation/README.md)
mkdir -p ./Libri-Light
cd ./Libri-Light
wget https://dl.fbaipublicfiles.com/librilight/data/small.tar
tar -xvf small.tar
rm -rf small.tar
cd ../

# 1.2. Download LibriTTS-R (https://www.openslr.org/141/)
mkdir -p ./LibriTTS-R
cd ./LibriTTS-R
wget https://www.openslr.org/resources/141/train_clean_100.tar.gz
wget https://www.openslr.org/resources/141/train_clean_360.tar.gz
tar -xvf train_clean_100.tar.gz
rm -rf train_clean_100.tar.gz
tar -xvf train_clean_360.tar.gz # Adjust if the filename is train-clean-360.tar.gz
rm -rf train_clean_360.tar.gz # Adjust if the filename is train-clean-360.tar.gz
cd ../

# 1.3. Download MUSAN (Music, Noise, Speech sound effects) (https://www.openslr.org/17/)
wget https://www.openslr.org/resources/17/musan.tar.gz
tar -xvf musan.tar.gz
rm -rf musan.tar.gz

# 1.4. Download RIRs (Room Impulse Responses) (https://www.openslr.org/28/)
wget https://www.openslr.org/resources/28/rirs_noises.zip
unzip rirs_noises.zip -d rirs_noises
rm -rf rirs_noises.zip
mkdir -p ./RIRS_NOISES
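mv rirs_noises/simulated_rirs/* ./RIRS_NOISES/ # Move the simulated RIRs into RIRS_NOISES; if the archive unpacks into a nested RIRS_NOISES/ folder, use rirs_noises/RIRS_NOISES/simulated_rirs/* instead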
rm -rf rirs_noises

Expected directory structure after downloading training datasets:

datasets/
├── Libri-Light/small/
├── LibriTTS-R/
│   ├── train_clean_100/
│   └── train_clean_360/
├── musan/
│   ├── music/
│   ├── noise/
│   └── speech/
└── RIRS_NOISES/
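
A quick way to spot an incomplete download or extraction is to count the files in each directory by extension. The sketch below is not part of the repository; the function names are hypothetical, and it assumes it is run from inside ./datasets with the directory names shown in the tree above.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root):
    """Return a Counter mapping lowercase file suffix -> number of files under root."""
    root = Path(root)
    if not root.is_dir():
        # A missing directory yields an empty Counter instead of raising.
        return Counter()
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


def summarize(datasets_root, subdirs):
    """Map each sub-directory name to its per-extension file counts."""
    base = Path(datasets_root)
    return {name: count_by_extension(base / name) for name in subdirs}


if __name__ == "__main__":
    # Directory names are taken from the expected tree; "." assumes the
    # current working directory is ./datasets.
    names = ["Libri-Light/small", "LibriTTS-R", "musan", "RIRS_NOISES"]
    for name, counts in summarize(".", names).items():
        print(name, dict(counts) or "MISSING OR EMPTY")
```

A directory reported as missing or empty, or one whose audio extension differs from the one passed to the list-building script, is worth re-checking before moving on.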

2. Preprocess Training Datasets

Generate the data list file required for SEED model training using the downloaded training datasets.

# 2.1. Create data list for SEED training
python ./make_seed_trainset.py \
    --target_dirs ./Libri-Light/small/ \
                  ./LibriTTS-R/train_clean_100/ \
                  ./LibriTTS-R/train_clean_360/ \
    --output_filename ./manifests/train_libritts+light_1000h.txt \
    --extensions .wav
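
The source of make_seed_trainset.py is not reproduced in this guide, so the following is only a minimal sketch of what a list-building step of this kind could look like. It assumes the list holds one file path per line, which may not match the real script's format; collect_audio_paths and write_list are hypothetical names.

```python
from pathlib import Path


def collect_audio_paths(target_dirs, extensions=(".wav",)):
    """Recursively gather files under target_dirs whose suffix matches, sorted."""
    wanted = {ext.lower() for ext in extensions}
    paths = []
    for d in target_dirs:
        for p in Path(d).rglob("*"):
            if p.is_file() and p.suffix.lower() in wanted:
                paths.append(p.as_posix())
    return sorted(paths)


def write_list(paths, output_filename):
    """Write one path per line (assumed format) and return the number of entries."""
    out = Path(output_filename)
    # Create the output directory (e.g. ./manifests) if it does not exist yet.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return len(paths)
```

Sorting the paths keeps the list reproducible across runs, which makes it easier to compare lists generated on different machines.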

Note: The training audio may need to be converted to 16 kHz, mono. This README does not include a conversion script; if conversion is needed, use a separate tool such as sox or ffmpeg. We convert with ffmpeg -y -i "input_path" -ac 1 -ar 16000 -acodec pcm_s16le "output_path".
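
For converting many files, the ffmpeg flags quoted in the note can be wrapped in a small batch helper. This is a sketch under assumptions, not a script shipped with the repository: the function names, the separate destination root, and the set of input extensions are all hypothetical. Writing into a separate directory avoids asking ffmpeg to overwrite the file it is reading.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src, dst, sample_rate=16000, channels=1):
    """Return the ffmpeg argument list: 16-bit PCM, mono, 16 kHz by default."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-ac", str(channels), "-ar", str(sample_rate),
            "-acodec", "pcm_s16le", str(dst)]


def plan_conversions(src_root, dst_root, extensions=(".flac", ".wav")):
    """Pair every matching source file with a .wav path mirrored under dst_root."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    wanted = {e.lower() for e in extensions}
    pairs = []
    for src in sorted(src_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in wanted:
            dst = (dst_root / src.relative_to(src_root)).with_suffix(".wav")
            pairs.append((src, dst))
    return pairs


def convert_all(src_root, dst_root, dry_run=True):
    """Convert every planned file; with dry_run=True only print the commands."""
    if not dry_run and shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    pairs = plan_conversions(src_root, dst_root)
    for src, dst in pairs:
        cmd = build_ffmpeg_cmd(src, dst)
        if dry_run:
            print(" ".join(cmd))
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
    return len(pairs)
```

Running convert_all with dry_run=True first prints the commands without touching any files, so the planned output paths can be reviewed before the real conversion.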

3. Download and Prepare Evaluation Datasets

Prepare the VoxCeleb1 and VoxConverse test datasets that will be used for model evaluation.

# 3.1. Download VoxCeleb1
# Download and prepare the VoxCeleb1 dataset according to the instructions at the following link:
# https://github.com/clovaai/voxceleb_trainer
# After downloading, it should be located in the ./voxceleb1/ directory.

# 3.2. Create VCMix Test Set (using VoxCeleb1 and VoxConverse)
# The VCMix dataset combines the VoxCeleb1 and VoxConverse test SV datasets.
# Related paper: "Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification" (https://arxiv.org/abs/2309.14741)
# The script below uses the VCMix dataset generation code provided by the first author (Hee-Soo Heo) of the paper.

python ./make_vcmix_testset.py \
    --voxceleb1_path   ./voxceleb1/ \
    --voxconverse_path ./voxconverse_test_SV/ \
    --voxconverse_test ./voxconverse_test_SV/trials_wo_overlap.txt \
    --output_filename  ./manifests/vcmix_test.txt \
    --download_voxconverse_test_SV  # Using this option will automatically download and prepare the VoxConverse test SV dataset.

Note: --download_voxconverse_test_SV is optional. If you do not already have the VoxConverse dataset, this option downloads the re-prepared VoxConverse test SV dataset, which is the quickest way to generate vcmix_test.txt.

Expected directory structure (partial) after preparing evaluation datasets:

datasets/
├── Libri-Light/small/
├── LibriTTS-R/
│   ├── train_clean_100/
│   └── train_clean_360/
├── musan/
│   ├── music/
│   ├── noise/
│   └── speech/
├── RIRS_NOISES/
├── voxceleb1/                     # Location of VoxCeleb1 data
├── voxconverse_test_SV/           # Location of VoxConverse data
└── manifests/
    ├── train_libritts+light_1000h.txt
    ├── vox1-O.txt                 # VoxCeleb1 evaluation list (can be generated by make_vcmix_testset.py or a separate script)
    └── vcmix_test.txt

4. Consolidate Evaluation Datasets (Using Symbolic Links)

So that the code can reach every evaluation dataset through one path, consolidate the evaluation dataset directories into a single common directory (vox1-evals) using symbolic links.

sh ./merge_eval_directories.sh
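
The contents of merge_eval_directories.sh are not shown in this guide. As an illustration of the general idea only, the sketch below links each source directory into a common directory by name. The layout it produces inside vox1-evals is an assumption and may differ from what the real script creates; link_into is a hypothetical helper.

```python
import os
from pathlib import Path


def link_into(common_dir, source_dirs):
    """Create one symlink per source directory inside common_dir.

    Existing symlinks are replaced; a real file or directory with the
    same name is never overwritten and raises FileExistsError instead.
    """
    common = Path(common_dir)
    common.mkdir(parents=True, exist_ok=True)
    created = []
    for src in map(Path, source_dirs):
        link = common / src.name
        if link.is_symlink():
            link.unlink()
        elif link.exists():
            raise FileExistsError(f"{link} exists and is not a symlink")
        # Link to the absolute path so the link stays valid from any working directory.
        os.symlink(src.resolve(), link, target_is_directory=True)
        created.append(link)
    return created


if __name__ == "__main__":
    # Directory names taken from this guide; assumes the current directory is ./datasets.
    existing = [d for d in ("voxceleb1", "voxconverse_test_SV") if Path(d).is_dir()]
    for link in link_into("vox1-evals", existing):
        print(link, "->", link.resolve())
```

Symbolic links keep a single copy of the audio on disk while letting evaluation code address every dataset under one root.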

5. Final Directory Structure Check

After completing all steps, your datasets directory should have the following structure:

datasets/
├── Libri-Light/small/          # For training
├── LibriTTS-R/                 # For training
│   ├── train_clean_100/
│   └── train_clean_360/
├── musan/                      # For training (data augmentation)
│   ├── music/
│   ├── noise/
│   └── speech/
├── RIRS_NOISES/                # For training (data augmentation)
├── voxceleb1/                  # For evaluation
├── voxconverse_test_SV/        # For evaluation
├── vox1-evals/                 # For evaluation (directory consolidated with symbolic links)
└── manifests/                  # Data list files
    ├── train_libritts+light_1000h.txt
    ├── vox1-O.txt
    └── vcmix_test.txt
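
The tree above can also be checked programmatically. The following is a minimal sketch, not a repository script: EXPECTED and missing_entries are hypothetical names, and the list covers only the top-level entries, the musan sub-folders, and the manifest files named in this guide.

```python
from pathlib import Path

# Entries relative to the datasets directory, taken from the final tree.
EXPECTED = [
    "Libri-Light/small",
    "LibriTTS-R",
    "musan/music",
    "musan/noise",
    "musan/speech",
    "RIRS_NOISES",
    "voxceleb1",
    "voxconverse_test_SV",
    "vox1-evals",
    "manifests/train_libritts+light_1000h.txt",
    "manifests/vox1-O.txt",
    "manifests/vcmix_test.txt",
]


def missing_entries(datasets_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under datasets_root."""
    base = Path(datasets_root)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    # "." assumes the current working directory is ./datasets.
    missing = missing_entries(".")
    print("All expected entries found." if not missing else f"Missing: {missing}")
```

The check confirms only that each path exists; it does not validate the contents of the audio files or the list files.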