Spark-TTS is an optional LLM-based TTS model (0.5B parameters) that provides zero-shot voice cloning using your character .wav files.
- Zero-shot voice cloning - uses character voice files
- Fully local - no API keys needed
- GPU accelerated - fast inference with CUDA
- Cross-platform - Windows, Linux, macOS
- Python: 3.11+ (recommended for CUDA support)
- GPU: CUDA-capable GPU (optional; falls back to CPU)
- Disk: ~5GB for model files
- RAM: 8GB+ (16GB+ recommended)
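Before installing, a quick pre-flight check can compare your machine against these requirements. The script below is a hypothetical helper, not part of the repository. Its thresholds come from the list above. The CUDA check only reports what PyTorch sees, and it returns `False` when PyTorch is not installed yet.

```python
"""Hypothetical pre-flight check for the Spark-TTS requirements (not in the repo)."""
import shutil
import sys

MIN_PYTHON = (3, 11)   # recommended minimum from the requirements list
MIN_FREE_GB = 5        # ~5GB of disk for model files


def python_ok(version=None):
    """True if the (major, minor) version meets the recommended minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= MIN_PYTHON


def free_disk_gb(path="."):
    """Free space, in GB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1e9


def cuda_available():
    """True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False  # PyTorch not installed yet; the app would run on CPU
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {'OK' if python_ok() else 'below 3.11'}")
    print(f"Free disk: {free_disk_gb():.1f} GB (need ~{MIN_FREE_GB} GB)")
    print(f"CUDA available: {cuda_available()} (otherwise the CPU is used)")
```

Run it from the project root so the disk check measures the drive where `pretrained_models/` will live.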
Automated setup (`setup_sparktts.py`):

```bash
# GPU (CUDA) version
python setup_sparktts.py

# CPU-only version
python setup_sparktts.py --cpu-only
```
Or install manually:

1. Install dependencies:

   ```bash
   # For GPU (CUDA 12.4)
   pip install -r requirements_cuda_sparktts.txt

   # For CPU only
   pip install -r requirements_cpu_sparktts.txt
   ```

2. Download the model (~4GB):

   ```bash
   python download_sparktts_model.py
   ```

3. Configure `.env`:

   ```
   TTS_PROVIDER=sparktts
   SPARKTTS_MODEL_DIR=pretrained_models/Spark-TTS-0.5B
   SPARKTTS_MAX_CHARS=1000
   ```

4. Run the app:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000
   ```
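The README does not say how `SPARKTTS_MAX_CHARS` is applied. Assuming it caps how much text is sent to the model in one call, a caller could split longer replies at sentence boundaries, as in this sketch. `load_sparktts_settings` and `chunk_text` are hypothetical names, the defaults mirror the example `.env` values above, and the settings are assumed to already be loaded into the process environment.

```python
"""Hypothetical sketch: read the Spark-TTS .env settings and chunk long text."""
import os
import re

DEFAULT_MAX_CHARS = 1000  # matches the example SPARKTTS_MAX_CHARS value


def load_sparktts_settings(env=os.environ):
    """Read the Spark-TTS settings, falling back to the example .env values."""
    return {
        "provider": env.get("TTS_PROVIDER", ""),
        "model_dir": env.get("SPARKTTS_MODEL_DIR", "pretrained_models/Spark-TTS-0.5B"),
        "max_chars": int(env.get("SPARKTTS_MAX_CHARS", DEFAULT_MAX_CHARS)),
    }


def chunk_text(text, max_chars=DEFAULT_MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence ends rather than at a fixed offset keeps each synthesized clip ending on a natural pause. Only sentences longer than the limit get cut mid-text.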
- Python 3.11+: recommended, with full PyTorch 2.6+ CUDA support
| Hardware | Speed | Quality |
|---|---|---|
| CPU | Slow (~30-60s) | Good |
| GPU (CUDA) | Fast (~2-5s) | Good |
- Ensure `TTS_PROVIDER=sparktts` is set in `.env`
- Restart the application
```bash
# Verify CUDA installation
python -c "import torch; print(torch.cuda.is_available())"

# Reinstall CUDA PyTorch
pip install torch torchaudio torchvision --index-url https://download.pytorch.org/whl/cu124

# Pin NumPy to a compatible version
pip install "numpy>=1.21.6,<1.28.0"
```

If the model download fails:

- Check your internet connection
- Ensure ~5GB of free disk space
- Try downloading the model manually from Hugging Face
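To confirm that the installed NumPy falls inside the pinned range `>=1.21.6,<1.28.0` without reinstalling, a small check like the following could help. It is a hypothetical helper, not part of the repository. It uses a simplified version comparison that ignores pre-release suffixes, whereas pip follows the stricter PEP 440 rules.

```python
"""Hypothetical check that the installed NumPy satisfies the pin above."""
import re
from importlib.metadata import PackageNotFoundError, version

NUMPY_MIN = (1, 21, 6)  # inclusive lower bound from "numpy>=1.21.6,<1.28.0"
NUMPY_MAX = (1, 28, 0)  # exclusive upper bound


def parse_release(ver):
    """Map '1.26.4' to (1, 26, 4); pre-release suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in ver.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad "2.0" to (2, 0, 0)
    return tuple(parts)


def numpy_in_range(ver):
    """True if `ver` falls inside [1.21.6, 1.28.0)."""
    return NUMPY_MIN <= parse_release(ver) < NUMPY_MAX


def installed_numpy_ok():
    """None if NumPy is not installed, otherwise whether it meets the pin."""
    try:
        return numpy_in_range(version("numpy"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("NumPy pin satisfied:", installed_numpy_ok())
```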
Spark-TTS is optional. The app also supports:
- OpenAI TTS - Cloud-based, fast, high quality
- ElevenLabs - Cloud-based, excellent quality
- Kokoro TTS - Local, CPU-friendly, very fast
```
voice-chat-ai/
├── sparktts/                        # Spark-TTS core modules (integrated)
├── cli/                             # SparkTTS class and inference
├── download_sparktts_model.py       # Model download script
├── setup_sparktts.py                # Automated setup script
├── requirements_cpu_sparktts.txt    # CPU dependencies
├── requirements_cuda_sparktts.txt   # CUDA dependencies
└── pretrained_models/               # Model storage (gitignored)
```
To remove Spark-TTS:
1. Delete the `pretrained_models/` folder
2. Delete the `sparktts/` and `cli/` folders
3. Set `TTS_PROVIDER` to another option in `.env`
4. Remove the Spark-TTS packages:

   ```bash
   pip uninstall einx einops omegaconf hydra-core soxr
   ```
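The folder deletions can also be scripted. The helper below is a hypothetical sketch, not part of the repository. It assumes the three folders sit directly under the repository root, as in the layout above. By default it only lists what it would delete. It does not edit `.env` or uninstall packages, so those steps still have to be done by hand.

```python
"""Hypothetical cleanup helper for the Spark-TTS folders (dry run by default)."""
import shutil
from pathlib import Path

# Folders named in the removal steps, relative to the repository root.
SPARKTTS_DIRS = ("pretrained_models", "sparktts", "cli")


def remove_sparktts_dirs(repo_root=".", dry_run=True):
    """Delete the Spark-TTS folders under repo_root; only list them when dry_run."""
    root = Path(repo_root)
    targets = [root / name for name in SPARKTTS_DIRS if (root / name).is_dir()]
    for path in targets:
        if dry_run:
            print(f"would delete {path}")
        else:
            shutil.rmtree(path)  # permanently removes the folder and its contents
    return targets


if __name__ == "__main__":
    remove_sparktts_dirs()  # pass dry_run=False to actually delete
```

Keeping the dry run as the default makes an accidental run from the wrong directory harmless.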
Note: This is an experimental feature. For production use, consider OpenAI or ElevenLabs TTS.