Self-hosted multi-backend text-to-speech platform with voice cloning and a modern web UI.
Full documentation: kevinbonnoron.github.io/sirene
curl -sSL https://raw.githubusercontent.com/KevinBonnoron/sirene/main/install.sh | bash

Then open http://localhost.
- Multi-backend TTS — Route requests to Kokoro, Qwen3-TTS, F5-TTS, Piper, CosyVoice, OpenAudio, or Chatterbox from a single interface
- Voice cloning — Create custom voices by uploading audio samples with zero-shot cloning
- Model management — Download and manage TTS models on demand from the web UI
- Real-time updates — Track downloads and generation progress via Server-Sent Events
- Transcription — Speech-to-text via Whisper models
- Self-hosted — Two lightweight Docker images: one for the web/API, one for inference
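
The real-time updates listed above are delivered as Server-Sent Events, a line-based text format in which each event is a group of `field: value` lines ended by a blank line. The following is a minimal sketch of a parser for that format, not the project's own client code. The event name `download_progress` and the JSON fields in the sample are assumptions made for illustration; the actual endpoint and payloads are described in the full documentation.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per Server-Sent Event from an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; events without data are skipped.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream: the event name and JSON fields are assumptions.
sample = [
    "event: download_progress\n",
    'data: {"model": "kokoro", "percent": 42}\n',
    "\n",
    ": keep-alive\n",
    'data: {"status": "done"}\n',
    "\n",
]

events = [
    {"event": e["event"], "payload": json.loads(e["data"])}
    for e in parse_sse(sample)
]
```

In practice the lines would come from a streaming HTTP response rather than a list; the parser only needs an iterable of text lines.
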
| Backend | Voice Cloning | Streaming | Languages |
|---|---|---|---|
| Kokoro | — | — | EN, FR, JA, KO, ZH |
| Qwen3-TTS | Yes | — | 10+ languages |
| F5-TTS | Yes | Yes | Multilingual |
| Piper | — | — | 26 languages |
| CosyVoice | Yes | Yes | 9 languages |
| OpenAudio S1 | Yes | — | Multilingual |
| Chatterbox | Yes | — | EN + 23 languages |
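
As an illustration of how the table above can guide backend choice, here is a small sketch that encodes the same rows as data and filters them by capability. The `Backend` class and `pick_backends` helper are hypothetical names that are not part of Sirene, and a dash in the table is assumed to mean the feature is not supported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    voice_cloning: bool
    streaming: bool
    languages: str


# Rows copied from the table above; a dash is read as "not supported" (assumption).
BACKENDS = [
    Backend("Kokoro", False, False, "EN, FR, JA, KO, ZH"),
    Backend("Qwen3-TTS", True, False, "10+ languages"),
    Backend("F5-TTS", True, True, "Multilingual"),
    Backend("Piper", False, False, "26 languages"),
    Backend("CosyVoice", True, True, "9 languages"),
    Backend("OpenAudio S1", True, False, "Multilingual"),
    Backend("Chatterbox", True, False, "EN + 23 languages"),
]


def pick_backends(*, voice_cloning: bool = False, streaming: bool = False) -> list[str]:
    """Return names of backends that offer every requested capability."""
    return [
        b.name
        for b in BACKENDS
        if (b.voice_cloning or not voice_cloning)
        and (b.streaming or not streaming)
    ]


if __name__ == "__main__":
    print(pick_backends(voice_cloning=True, streaming=True))
```
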
- Bun >= 1.2.4
- Python >= 3.11
- PocketBase (installed automatically in the devcontainer)
The easiest way is to use the devcontainer — open the project in VS Code or GitHub Codespaces and all dependencies are installed automatically.
For manual setup:
bun install
pip install -e "./inference[cpu]"
mkdir -p data/models
bun run dev

| Service | Port |
|---|---|
| PocketBase | 8090 |
| Hono Server | 3000 |
| Vite Client | 5173 |
| Inference FastAPI | 8000 |
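
To confirm that each development service is listening on the port listed above, a quick check along these lines may help. This is a sketch that assumes the services bind to `127.0.0.1` on the default ports from the table; `is_port_open` and `dev_status` are illustrative helpers, not part of the project.

```python
import socket

# Ports taken from the table above.
DEV_PORTS = {
    "PocketBase": 8090,
    "Hono Server": 3000,
    "Vite Client": 5173,
    "Inference FastAPI": 8000,
}


def is_port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def dev_status(host: str = "127.0.0.1") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(port, host) for name, port in DEV_PORTS.items()}


if __name__ == "__main__":
    for name, up in dev_status().items():
        print(f"{name:<18} {'up' if up else 'down'}")
```

An open port only shows that something is listening there; it does not verify that the service behind it is healthy.
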
bun run dev # All services in dev mode
bun run build # Production build
bun run lint # Biome lint
bun run format # Biome format
bun run type-check # TypeScript check

License: MIT




