tinyVGAN is a lightweight neural vocoder inspired by BigVGANv2 and HiFi-GAN. It is trained on the LJSpeech-1.1 dataset and uses Multi-Period Discriminators along with a Multi-Resolution STFT loss to generate high-quality, natural-sounding speech waveforms.
This repository provides inference code only, allowing you to quickly generate audio from mel-spectrograms using a pre-trained generator.
```
tinyVGAN/
│
├── data/
│   └── ljspeech_100/        # Example dataset subset
│       └── metadata.csv
│
├── models/                  # Pretrained models
│   └── generator.pth        # Pretrained generator checkpoint
│
├── generator.py             # Generator architecture definition
├── workflow.ipynb           # Inference notebook (entry point)
├── requirements.txt         # Dependencies
└── .gitignore
```
Clone the repository and install dependencies:

```bash
git clone https://github.com/moadabdou/tinyVGAN.git
cd tinyVGAN
pip install -r requirements.txt
```

The easiest way to run inference is through the provided Jupyter Notebook:

```bash
jupyter notebook workflow.ipynb
```

Inside the notebook, you can:
- Load the pretrained generator (`generator.pth`)
- Input mel-spectrograms
- Generate speech waveforms
- Save or play the output audio
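The steps above can be sketched as a small helper. This is an illustrative sketch, not code from the repository: it assumes (as in most HiFi-GAN-style vocoders) that the generator maps a mel-spectrogram tensor shaped `[batch, n_mels, frames]` to audio shaped `[batch, 1, samples]`; the actual constructor and checkpoint format are defined in `generator.py` and `models/generator.pth`.

```python
import torch

def vocode(generator: torch.nn.Module, mel: torch.Tensor) -> torch.Tensor:
    """Run a pretrained vocoder generator on a batch of mel-spectrograms.

    mel is assumed to be shaped [batch, n_mels, frames]; the generator is
    assumed to return audio shaped [batch, 1, samples] (HiFi-GAN convention).
    """
    generator.eval()
    with torch.no_grad():
        audio = generator(mel)
    # Drop the channel dimension if the generator emits [batch, 1, samples].
    return audio.squeeze(1) if audio.dim() == 3 else audio
```

In the notebook, the generator would first be restored with something like `model.load_state_dict(torch.load("models/generator.pth", map_location="cpu"))` before calling `vocode(model, mel)`.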
- Generator: Based on HiFi-GAN with architectural modifications inspired by BigVGANv2.
- Losses:
  - Multi-Resolution STFT Loss
  - Multi-Period Discriminator Loss
  - Feature Matching Loss
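As a rough illustration of the Multi-Resolution STFT loss listed above (the repository ships inference code only, so this is not its training code), a common formulation sums a spectral-convergence term and a log-magnitude L1 term over several STFT resolutions. The `(n_fft, hop, win)` triples below are typical choices, not values confirmed by this repo.

```python
import torch
import torch.nn.functional as F

def stft_magnitude(x: torch.Tensor, n_fft: int, hop: int, win: int) -> torch.Tensor:
    """Magnitude spectrogram of a batch of waveforms shaped [batch, samples]."""
    window = torch.hann_window(win, device=x.device)
    spec = torch.stft(x, n_fft, hop_length=hop, win_length=win,
                      window=window, return_complex=True)
    return spec.abs().clamp(min=1e-7)  # floor avoids log(0)

def multi_resolution_stft_loss(fake: torch.Tensor, real: torch.Tensor,
                               resolutions=((512, 128, 512),
                                            (1024, 256, 1024),
                                            (2048, 512, 2048))) -> torch.Tensor:
    """Average spectral-convergence + log-magnitude L1 loss over resolutions."""
    total = fake.new_zeros(())
    for n_fft, hop, win in resolutions:
        fake_mag = stft_magnitude(fake, n_fft, hop, win)
        real_mag = stft_magnitude(real, n_fft, hop, win)
        # Spectral convergence: relative Frobenius-norm error of magnitudes.
        sc = torch.norm(real_mag - fake_mag, p="fro") / torch.norm(real_mag, p="fro")
        # L1 distance between log-magnitude spectrograms.
        log_mag = F.l1_loss(torch.log(fake_mag), torch.log(real_mag))
        total = total + sc + log_mag
    return total / len(resolutions)
```

Evaluating the same waveform at multiple window sizes is what gives the loss both good frequency resolution (long windows) and good time resolution (short windows).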
The model is trained on the LJSpeech-1.1 dataset, a single-speaker English dataset commonly used for text-to-speech research.
This project is inspired by:
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
- BigVGAN v2: Scaling up GAN Vocoders
MIT License. See LICENSE for details.