SparkVox is a training framework focused on speech generation. It also supports a range of related speech tasks, including speaker attribute recognition, emotion recognition, audio codecs, and speech synthesis.

- Speaker Attribute Recognition
  - Age prediction
  - Gender prediction
- Codec
  - BiCodec
  - BigCodec
- Speech Synthesis
  - SparkTTS

The repository is organized as follows:

- `bins`
  - `train_pl`: The main training entry point for all tasks.
- `egs`
  - `task` (e.g. codec, speech_synthesis): Example training scripts for each task.
- `sparkvox`
  - `models`: Model implementations for different tasks.
  - `tools`: Utilities for data processing, model inference, and feature extraction.
  - `utils`: Common utilities for tasks such as reading and processing audio files, as well as general training tools.