References

The code contained in this work is an adaptation of: https://github.com/auspicious3000/SpeechSplit and https://github.com/NVIDIA/WaveGlow

License details for each can be found in autovc/LICENSE and Waveglow/LICENSE respectively.

Details of changes are outlined in the report: "Vocal Style-Transfer for Anti-Bias Interviews", Section 3.3

Line-by-line changes are contained in the Differences directory.

I verify that I am the sole author of the programmes contained in this archive, except where explicitly stated to the contrary. Merlin Lindsay, 02.09.2021

To Run Conversion

In terminal type:

chmod +x run_dual_inference.sh
./run_dual_inference.sh

Alternatively:

python dual_inference.py \
  -source=input_wavs/ -target=p270 \
  -output_dir=Converted_wavs \
  -speech_split_conf=SpeechSplit/run/models/Full_C_8_16_R_8_4_P_8_32/hparams.json \
  -ss_g=SpeechSplit/run/models/Full_C_8_16_R_8_4_P_8_32/50000-G.ckpt \
  -waveglow_model=Waveglow/models/waveglow_256channels_universal_v5.pt \
  -waveglow_conf=Waveglow/config.json \
  -sigma=1
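
The flags in the command above could be declared with argparse roughly as follows. This is only a sketch: the parser inside dual_inference.py is not shown in this README, so the help strings, the required/default settings, and the `build_parser` name are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags used in the command above.
    # The real dual_inference.py may declare them differently.
    parser = argparse.ArgumentParser(description="Audio-to-audio voice conversion")
    parser.add_argument("-source", required=True, help="directory of source wavs")
    parser.add_argument("-target", required=True, help="target speaker ID, e.g. p270")
    parser.add_argument("-output_dir", default="Converted_wavs")
    parser.add_argument("-speech_split_conf", required=True, help="SpeechSplit hparams.json")
    parser.add_argument("-ss_g", required=True, help="SpeechSplit generator checkpoint")
    parser.add_argument("-waveglow_model", required=True, help="WaveGlow checkpoint")
    parser.add_argument("-waveglow_conf", required=True, help="WaveGlow config.json")
    parser.add_argument("-sigma", type=float, default=1.0)
    return parser


# Parse the same argument list as the shell command, without touching sys.argv.
example_argv = [
    "-source=input_wavs/",
    "-target=p270",
    "-output_dir=Converted_wavs",
    "-speech_split_conf=SpeechSplit/run/models/Full_C_8_16_R_8_4_P_8_32/hparams.json",
    "-ss_g=SpeechSplit/run/models/Full_C_8_16_R_8_4_P_8_32/50000-G.ckpt",
    "-waveglow_model=Waveglow/models/waveglow_256channels_universal_v5.pt",
    "-waveglow_conf=Waveglow/config.json",
    "-sigma=1",
]
args = build_parser().parse_args(example_argv)
print(vars(args))
```

argparse accepts the `-flag=value` form for single-dash long options, which is why the command can be written without spaces between each flag and its value.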

Dependencies

torch==1.0
numpy==1.13.3
inflect==0.2.5
librosa==0.6.0
scipy==1.0.0
tensorflow==2.5.0
tensorboardX==1.1
Unidecode==1.0.22
tqdm
pillow
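
Because several of these packages are pinned to old releases, it may help to compare the installed versions against the pins before running anything. The following is a minimal sketch using only the standard library; the `PINNED` dictionary is copied from the list above, while `check_pins` and `_normalise` are hypothetical helper names, and the comparison deliberately ignores local version suffixes such as `+cpu`.

```python
from importlib import metadata

# Pinned versions copied from the dependency list above.
PINNED = {
    "torch": "1.0",
    "numpy": "1.13.3",
    "inflect": "0.2.5",
    "librosa": "0.6.0",
    "scipy": "1.0.0",
    "tensorflow": "2.5.0",
    "tensorboardX": "1.1",
    "Unidecode": "1.0.22",
}


def _normalise(version):
    # Drop trailing ".0" segments so that "1.0" and "1.0.0" compare equal.
    parts = version.split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return parts


def check_pins(pins, get_version=metadata.version):
    """Return {name: (pinned, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        matches = installed is not None and _normalise(installed) == _normalise(wanted)
        report[name] = (wanted, installed, matches)
    return report


for name, (wanted, installed, ok) in check_pins(PINNED).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{name}: pinned {wanted}, installed {installed} [{status}]")
```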

Global

File	Description
dual_inference.py	Main script to perform audio-to-audio conversion. Obtains paths to convert and creates target dir. Loads configuration files. Instantiates SpeechSplit and WaveGlow inference models. Performs conversion on each source wav, using the style of the target. Saves converted wavs to output directory.
run_dual_inference.sh	Shell script to call dual_inference.py.
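
The control flow described for dual_inference.py (collect the source wavs, create the output directory, convert each file, save the result) can be sketched as below. This is an illustration under stated assumptions, not the script itself: `convert_directory` and `convert_fn` are hypothetical names, and `convert_fn` stands in for the SpeechSplit and WaveGlow inference steps, which are not reproduced here.

```python
from pathlib import Path


def convert_directory(source_dir, output_dir, convert_fn):
    """Convert every .wav in source_dir and write the results to output_dir.

    convert_fn is a hypothetical placeholder: it takes the path of a source
    wav and returns the converted audio as bytes.
    """
    source_dir = Path(source_dir)
    output_dir = Path(output_dir)
    # Create the target directory if it does not exist yet.
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for wav_path in sorted(source_dir.glob("*.wav")):
        out_path = output_dir / wav_path.name
        out_path.write_bytes(convert_fn(wav_path))
        written.append(out_path)
    return written
```

A call such as `convert_directory("input_wavs", "Converted_wavs", my_convert)` would then mirror the `-source` and `-output_dir` flags, with `my_convert` wrapping whatever model objects the real script instantiates.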

SpeechSplit

File	Description
inference.py	Contains the inferencer object, which runs the forward pass through SpeechSplit: loads the saved models, generates spectrograms with WaveGlow's Mel2Samp object, creates an embedding of the target voice, and performs the conversion, returning the converted mel-spectrogram.
audio.py	Utilities for processing audio.
audioRead.py	Contains function for generating speaker IDs.
data_loader.py	Creates batches of training data for training SpeechSplit.
hparams.json	Hyperparameters output with training run.
hparams.py	Hyperparameters to use for the next training run.
LICENSE	MIT License provided by creators of SpeechSplit.
main.py	Generates data loader and training objects to run SpeechSplit training loop.
make_metadata.py	Separates training data into train and validation .pkl files.
make_spect_f0.py	Generates mel-specs using WaveGlow's Mel2Samp object, extracts F0 features from audio, and saves each as .npy format for each training instance.
model.py	Contains SpeechSplit models.
solver.py	The training loop for SpeechSplit.
TFhparams.py	Creates hyperparameter object.
utils.py	Utilities for processing mel-specs and F0 features.
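
As a rough illustration of the train/validation split that make_metadata.py is described as performing, the sketch below shuffles a list of metadata entries and pickles the two subsets. The file names `train.pkl` and `val.pkl`, the 10% validation fraction, the fixed seed, and the function name `split_and_save` are all assumptions; the actual script may structure its metadata and outputs differently.

```python
import pickle
import random
from pathlib import Path


def split_and_save(entries, out_dir, val_fraction=0.1, seed=0):
    """Shuffle entries, split them, and pickle each subset into out_dir."""
    entries = list(entries)
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(entries)

    # Hold out at least one item for validation when there is any data.
    n_val = max(1, int(len(entries) * val_fraction)) if entries else 0
    val, train = entries[:n_val], entries[n_val:]

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, subset in (("train.pkl", train), ("val.pkl", val)):
        with open(out_dir / name, "wb") as f:
            pickle.dump(subset, f)
    return train, val
```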

WaveGlow

Pre-trained WaveGlow vocoder:

File	Description
inference.py	WaveGlow inferencer object, receiving mels from SpeechSplit: denoises the spectrogram, runs inference, converts to audio, and saves the .wav file.
mel2samp.py	Creates mel-spectrograms using Tacotron-2's STFT. This object is used to create training data, as well as extracting mel-specs during inference.
glow.py	Contains WaveGlow model, with loss function, and WaveNet affine coupling layers.
denoiser.py	Denoiser model for removing the bias of WaveGlow from the generated spectrogram.
config.json	Configuration file of WaveGlow, specifying training, data and hyperparameter configs.
make_train_files.py	Creates two text files containing paths to wav files, split into training and test sets.
train.py	Training loop for WaveGlow.
distributed.py	Training loop for distributed training of WaveGlow across multiple GPUs.
models/	Directory containing pre-trained WaveGlow model. Pre-trained vocoder can be found at: https://drive.google.com/file/d/1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF/view
tacotron2/	Directory containing functions and objects from Tacotron-2 for mel-spectrogram generation.
LICENSE	BSD 3-Clause License. Copyright statement from Nvidia.
