Kyutai TTS Server is an OpenAI-compatible Text-to-Speech (TTS) API server built with FastAPI. It provides a simple HTTP interface for generating speech from text using the Kyutai TTS model.
- OpenAI-compatible API for text-to-speech generation
- Supports multiple audio formats (WAV, MP3, FLAC, OGG)
- Multi-voice and multi-speaker dialogue generation
- Health check endpoint
- Docker support for easy deployment
The Kyutai [voices](https://huggingface.co/kyutai/tts-voices) and the [model](https://huggingface.co/kyutai/tts-1.6b-en_fr) itself are downloaded automatically from Hugging Face into the cache directory during startup.
In API requests, reference a voice by its WAV file path within this voice library (for example, expresso/ex03-ex01_happy_001_channel1_334s.wav).
Kyutai-TTS-Server/
├── app/
│ ├── __init__.py
│ ├── server.py
│ ├── config.py
│ ├── models.py
│ ├── tts.py
│ └── utils.py
├── .gitignore
├── docker-compose.yaml
├── Dockerfile
├── requirements.txt
└── README.md
- URL: /v1/audio/speech
- Method: POST
- Request Body (single voice):
{ "model": "optional-model-name", "input": "Text to convert to speech", "voice": "expresso/ex03-ex01_happy_001_channel1_334s.wav", "response_format": "wav|mp3|flac|ogg", "speed": 1.0 }
- Response: Audio file in the requested format
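
As a rough illustration of the single-voice request above, the following client sketch uses only the Python standard library. The base URL `http://localhost:8000` is an assumption taken from the Docker port mapping in this README, and the helper names `build_speech_request` and `synthesize` are hypothetical, not part of the server. The optional `model` field is left out.

```python
import json
import urllib.request

# Assumption: the server listens here (matches `docker run -p 8000:8000`).
BASE_URL = "http://localhost:8000"
SUPPORTED_FORMATS = {"wav", "mp3", "flac", "ogg"}


def build_speech_request(text, voice, response_format="wav", speed=1.0,
                         base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    payload = {
        "input": text,
        "voice": voice,  # WAV path inside the Kyutai voice library
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return out_path
```

With a server running, a call such as `synthesize("Hello there", "expresso/ex03-ex01_happy_001_channel1_334s.wav", "hello.wav")` should write the returned audio to `hello.wav`. Separating request construction from sending keeps the payload easy to inspect without a live server.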
- URL: /v1/audio/speech
- Method: POST
- Request Body (multi-voice dialogue):
{ "model": "optional-model-name", "inputs": [ "Hey there, I'm speaker A", "And I'm speaker B!", "We are having a dialogue" ], "voices": [ "expresso/ex03-ex01_happy_001_channel1_334s.wav", "expresso/ex04-ex02_sarcastic_001_channel2_466s.wav" ], "response_format": "wav|mp3|flac|ogg", "speed": 1.0 }
- Response: Audio file in the requested format
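
The dialogue request body can be assembled programmatically, which avoids hand-written JSON mistakes such as a missing comma between fields. The sketch below is a minimal example; `build_dialogue_payload` is a hypothetical helper, and its validation rules are assumptions rather than the server's actual checks. This README does not state how voices are assigned to the individual turns, so the helper passes both lists through unchanged.

```python
import json

SUPPORTED_FORMATS = {"wav", "mp3", "flac", "ogg"}


def build_dialogue_payload(inputs, voices, response_format="wav", speed=1.0):
    """Return a dict shaped like the multi-voice request body."""
    if not inputs or not voices:
        raise ValueError("inputs and voices must both be non-empty")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return {
        "inputs": list(inputs),   # one string per dialogue turn
        "voices": list(voices),   # WAV paths inside the Kyutai voice library
        "response_format": response_format,
        "speed": speed,
    }


payload = build_dialogue_payload(
    inputs=[
        "Hey there, I'm speaker A",
        "And I'm speaker B!",
        "We are having a dialogue",
    ],
    voices=[
        "expresso/ex03-ex01_happy_001_channel1_334s.wav",
        "expresso/ex04-ex02_sarcastic_001_channel2_466s.wav",
    ],
    response_format="flac",
)

# Serialize to the JSON text that would be sent as the POST body.
body = json.dumps(payload, indent=2)
print(body)
```

The resulting `body` string can be sent to `/v1/audio/speech` with a `Content-Type: application/json` header using any HTTP client.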
- URL: /health
- Method: GET
- Response:
{ "status": "healthy" }
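
Because the voices and model are downloaded during startup, the server may need some time before it can answer requests. One possible way to wait for readiness is to poll the health endpoint, as in the sketch below. The base URL, timeout, and polling interval are assumptions, and `fetch_health` and `wait_until_healthy` are hypothetical helper names.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(base_url):
    """GET /health and return the decoded JSON body as a dict."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
        return json.load(response)


def wait_until_healthy(base_url="http://localhost:8000", timeout=600.0,
                       interval=5.0, fetch=fetch_health, sleep=time.sleep):
    """Poll /health until it reports "healthy" or the timeout elapses.

    Returns True once the server is healthy, False on timeout.
    `fetch` and `sleep` are injectable so the loop can be tested offline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if fetch(base_url).get("status") == "healthy":
                return True
        except (urllib.error.URLError, ConnectionError, TimeoutError, ValueError):
            pass  # server not reachable yet, or body was not valid JSON
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A deployment script could call `wait_until_healthy()` after starting the container and only send speech requests once it returns `True`.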
- Build the Docker image:
  docker build -t kyutai-tts-server .
- Run the Docker container:
  docker run -d -p 8000:8000 kyutai-tts-server
- Install the required dependencies:
  pip install -r requirements.txt
- Run the server:
  python server.py --host 0.0.0.0 --port 8000 --reload
This project is licensed under the MIT License.