HH-Codec/dataset at main · opendilab/HH-Codec

Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md
README_zh.md	README_zh.md
eval_prepare.py	eval_prepare.py
hubert_mel.py	hubert_mel.py

Name

Last commit message

Last commit date

Data Preprocessing

To train the HH-Codec, the first step is to download the dataset. We recommend using the following training datasets:

cd HH-Codec/dataset
wget https://openslr.magicdatatech.com/resources/12/train-clean-100.tar.gz
wget https://openslr.magicdatatech.com/resources/12/train-clean-360.tar.gz
wget https://openslr.magicdatatech.com/resources/12/test-clean.tar.gz
wget https://openslr.magicdatatech.com/resources/12/test-other.tar.gz
wget https://datashare.ed.ac.uk/download/DS_10283_3443.zip
wget https://datashare.ed.ac.uk/download/DS_10283_2651.zip

You can extract semantic teacher representations directly from raw audio waveforms to 'HH-Codec/dataset'. We provide a script to demonstrate how to obtain HuBERT representations.

cd /root/code/HH-Codec && conda activate codec
export REP_PATH="dataset/Hubert"
python dataset/hubert_mel.py --dataset_name "libritts_train_clean_360" --dataset_path "dataset/LibriSpeech/train-clean-360"
python dataset/hubert_mel.py --dataset_name "libritts_train_clean_100" --dataset_path "dataset/LibriSpeech/train-clean-100"
python dataset/hubert_mel.py --dataset_name "LJspeech" --dataset_path "dataset/LJSpeech-1.1"
python dataset/hubert_mel.py --dataset_name  "" --dataset_path ""

After running the script, a file will be generated at:

dataset/Hubert/libritts_train_clean_100.txt

Each line maps the original audio file path to its corresponding HuBERT embedding location.

Eval Data Preprocess

cd /root/code/HH-Codec && conda activate codec
python dataset/eval_prepare.py --wavtext "dataset/eval/test_clean.txt" --dataset_path "/mnt/nfs3/zhangjinouwen/dataset/LibriTTS/test-clean"
python dataset/eval_prepare.py --wavtext "dataset/eval/test_other.txt" --dataset_path "/mnt/nfs3/zhangjinouwen/dataset/LibriTTS/test-other"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Data Preprocessing

Eval Data Preprocess

FilesExpand file tree

dataset

Directory actions

More options

Directory actions

More options

Latest commit

History

dataset

Folders and files

parent directory

README.md

Data Preprocessing

Eval Data Preprocess