A PyTorch implementation of QuartzNet, an end-to-end ASR model, trained on the LJSpeech dataset.
Set your preferred configuration in config.py and run ./run_docker.sh (make sure the volume option mounts the project directory correctly).
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2 # download data
tar xjf LJSpeech-1.1.tar.bz2 # extract data
python train.py
You will need to log in to your wandb.ai account for monitoring the training logs.
Every 10th checkpoint after the 40th epoch will be saved as model{epoch}.pth.
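The checkpoint schedule above can be sketched as follows; should_save_checkpoint and checkpoint_name are hypothetical helpers for illustration, not functions from this repo, and the exact schedule (whether epoch 40 itself is saved) is an assumption:

```python
def should_save_checkpoint(epoch: int) -> bool:
    # Assumed schedule: save every 10th epoch from epoch 40 onward.
    return epoch >= 40 and epoch % 10 == 0

def checkpoint_name(epoch: int) -> str:
    # Matches the model{epoch}.pth naming convention above.
    return f"model{epoch}.pth"
```

For example, checkpoints would be written at epochs 40, 50, 60, ... but not at epoch 45.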
Set path_to_file in config.py to a .wav file, set from_pretrained=True, then run
python inference.py
The result will be saved in path_to_file.txt.
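The output naming convention (same path, .txt extension in place of .wav) can be sketched like this; transcript_path is a hypothetical helper, not part of the repo:

```python
from pathlib import Path

def transcript_path(path_to_file: str) -> str:
    # Swap the .wav suffix for .txt, keeping the directory and stem,
    # e.g. audio/sample.wav -> audio/sample.txt
    return str(Path(path_to_file).with_suffix(".txt"))
```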