Kyutai TTS Server

Kyutai TTS Server is an OpenAI-compatible Text-to-Speech (TTS) API server built with FastAPI. It provides a simple interface for generating speech from text using Kyutai's TTS models.

Features

OpenAI-compatible API for text-to-speech generation
Supports multiple audio formats (WAV, MP3, FLAC, OGG)
Multi-voice, multi-dialogue generation support
Health check endpoint
Docker support for easy deployment

Voice Library

The Kyutai [voices](https://huggingface.co/kyutai/tts-voices) are downloaded automatically from Hugging Face at startup, along with the [model](https://huggingface.co/kyutai/tts-1.6b-en_fr) itself, into the cache directory.

In API requests, specify a voice by its WAV file path within this voice library (for example, `expresso/ex03-ex01_happy_001_channel1_334s.wav`).
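To find valid voice paths, you can list the WAV files in the voice repository. The sketch below is a minimal example, assuming the `huggingface_hub` package is installed and that voices are identified by their `.wav` paths as in the example above; `wav_voice_paths` and `list_voices` are hypothetical helper names, not part of this project.

```python
# Sketch: list the WAV voice paths available in the kyutai/tts-voices repository.
# Assumes `huggingface_hub` is installed and that voices are referenced by .wav path.


def wav_voice_paths(files):
    """Return the sorted .wav paths from a list of repository file names."""
    return sorted(f for f in files if f.endswith(".wav"))


def list_voices(repo_id="kyutai/tts-voices"):
    """Fetch the repository file list from Hugging Face and keep only WAV voices."""
    # Imported lazily so the filtering helper works without the package; needs network access.
    from huggingface_hub import list_repo_files

    return wav_voice_paths(list_repo_files(repo_id))
```

Calling `print(list_voices())` prints every WAV path, any of which can be passed as a `voice` value.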

Project Structure

```
Kyutai-TTS-Server/
├── app/
│   ├── __init__.py
│   ├── server.py
│   ├── config.py
│   ├── models.py
│   ├── tts.py
│   └── utils.py
├── .gitignore
├── docker-compose.yaml
├── Dockerfile
├── requirements.txt
└── README.md
```

API Endpoints

Text-to-Speech Generation

URL: /v1/audio/speech
Method: POST

Request Body:

```
{
  "model": "optional-model-name",
  "input": "Text to convert to speech",
  "voice": "expresso/ex03-ex01_happy_001_channel1_334s.wav",
  "response_format": "wav",
  "speed": 1.0
}
```

`response_format` is one of `wav`, `mp3`, `flac`, or `ogg`.

Response: Audio file in the requested format
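Because the endpoint takes a plain JSON body, any HTTP client can call it. The sketch below uses only the Python standard library; it assumes the server is reachable at `http://localhost:8000` (the port used in the run commands) and that the response body is the raw audio file. `build_speech_request` and `synthesize` are hypothetical helper names, not part of this project.

```python
# Sketch of a standard-library client for the single-voice endpoint.
# Assumptions: server at http://localhost:8000; response body is the raw audio bytes.
import json
import urllib.request

SUPPORTED_FORMATS = {"wav", "mp3", "flac", "ogg"}


def build_speech_request(text, voice, response_format="wav", speed=1.0,
                         base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    payload = {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }  # "model" is optional, so it is omitted here
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice, out_path, response_format="wav", **kwargs):
    """Send the request, write the returned audio to out_path, and return its size in bytes."""
    request = build_speech_request(text, voice, response_format, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With the server running, `synthesize("Hello from Kyutai!", "expresso/ex03-ex01_happy_001_channel1_334s.wav", "hello.wav")` saves the generated speech to `hello.wav`.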

Multi-Voice, Multi-Dialogue Text-to-Speech Generation (extended from OpenAI API)

URL: /v1/audio/speech
Method: POST

Request Body:

```
{
  "model": "optional-model-name",
  "inputs": [
    "Hey there, I'm speaker A",
    "And I'm speaker B!",
    "We are having a dialogue"
  ],
  "voices": [
    "expresso/ex03-ex01_happy_001_channel1_334s.wav",
    "expresso/ex04-ex02_sarcastic_001_channel2_466s.wav"
  ],
  "response_format": "wav",
  "speed": 1.0
}
```

Response: Audio file in the requested format
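The multi-dialogue request differs from the single-voice one only in using the `inputs` and `voices` lists. The sketch below builds and validates that body in Python; it assumes each entry in `inputs` is one dialogue turn and leaves the assignment of turns to voices to the server. `build_dialogue_payload` is a hypothetical helper name.

```python
# Sketch: build the JSON body for the multi-voice, multi-dialogue request.
# Assumption: each "inputs" entry is one dialogue turn; mapping turns to
# "voices" is handled by the server and is not modeled here.
import json

SUPPORTED_FORMATS = ("wav", "mp3", "flac", "ogg")


def build_dialogue_payload(inputs, voices, response_format="wav", speed=1.0):
    """Validate the arguments and return the request body as a JSON string."""
    if not inputs:
        raise ValueError("inputs must contain at least one turn")
    if not voices:
        raise ValueError("voices must contain at least one voice path")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    return json.dumps({
        "inputs": list(inputs),
        "voices": list(voices),
        "response_format": response_format,
        "speed": speed,
    })
```

The returned string can be sent as the body of a `POST` to `/v1/audio/speech` with a `Content-Type: application/json` header.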

Health Check

URL: /health
Method: GET
Response:
```
{
  "status": "healthy"
}
```
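Since the model and voices are downloaded on first startup, a client may want to wait for the health check before sending TTS requests. The sketch below polls `/health` with the Python standard library; the base URL, timeout, and polling interval are assumptions, and `is_healthy` and `wait_until_healthy` are hypothetical helper names.

```python
# Sketch: poll /health until it reports "healthy" or a timeout expires.
# Assumptions: server at http://localhost:8000; timeout and interval values are arbitrary.
import json
import time
import urllib.request


def is_healthy(body):
    """Return True if a /health response body reports status "healthy"."""
    try:
        return json.loads(body).get("status") == "healthy"
    except (ValueError, AttributeError):  # invalid JSON, or JSON that is not an object
        return False


def wait_until_healthy(base_url="http://localhost:8000", timeout=600.0, interval=5.0):
    """Poll /health until it reports healthy (True) or `timeout` seconds elapse (False)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval) as resp:
                if is_healthy(resp.read()):
                    return True
        except OSError:
            pass  # connection refused, HTTP error, or timeout: server not ready yet
        time.sleep(interval)
    return False
```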

Running the Server

Using Docker

Build the Docker image:
```
docker build -t kyutai-tts-server .
```

Run the Docker container:

```
docker run -d -p 8000:8000 kyutai-tts-server
```

Locally

Install the required dependencies:
```
pip install -r requirements.txt
```

Run the server:

```
python server.py --host 0.0.0.0 --port 8000 --reload
```

License

This project is licensed under the MIT License.

NillPointer/Kyutai-TTS-Server

Folders and files

Latest commit

History

Repository files navigation

Kyutai TTS Server

Features

Voice Library

Project Structure

API Endpoints

Text-to-Speech Generation

Multi-Voice, Multi-Dialogue Text-to-Speech Generation (extended from OpenAI API)

Health Check

Running the Server

Using Docker

Locally

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages