This project is proudly developed and maintained by the team at ElevateSells Labs.
This project provides a high-performance Text-to-Speech (TTS) API server built with Python FastAPI, utilizing the Kokoro-TTS library. It includes a user-friendly web interface built with React and TypeScript for easy interaction.
The server can dynamically load different language models, supports multiple voices per language, allows speed adjustment, and can output audio in both WAV and MP3 formats.
Core Technologies:
- Backend: Python, FastAPI, Uvicorn, Kokoro-TTS, Pydub, SoundFile, Numpy
- Frontend: React, TypeScript, Vite, Axios, CSS
- Audio Conversion: FFmpeg (required for MP3 output)
- FastAPI Backend: Asynchronous and optimized for performance.
- Kokoro-TTS Integration: Leverages the Kokoro-TTS library for speech synthesis.
- Multi-Language Support: Dynamically load and switch between supported language models:
  - American English (`a`)
  - British English (`b`)
  - French (`f`)
  - Hindi (`h`)
  - Italian (`i`)
  - Japanese (`j`)
  - Mandarin Chinese (`z`)
- Multiple Voices: Supports various pre-defined voices for each language.
- Adjustable Speed: Control the playback speed of the generated speech.
- WAV & MP3 Output: Generate audio in WAV format, or request MP3 via the `Accept: audio/mpeg` header (requires FFmpeg).
- Admin Endpoints: Check server status and load/unload language models remotely.
- React Frontend: Intuitive web interface to select language, voice, speed, enter text, and play generated audio.
- CORS Enabled: Properly configured for frontend interaction from different origins.
```
.
├── backend/               # FastAPI application
│   ├── app/               # Main application code
│   │   ├── core/          # Core logic (model loading, etc.)
│   │   ├── models.py      # Pydantic models
│   │   ├── routers/       # API endpoint routers (tts, admin)
│   │   └── main.py        # FastAPI app entrypoint & config
│   ├── requirements.txt   # Python dependencies
│   └── run.py             # Script to run the backend server
├── frontend/              # React application
│   ├── public/
│   ├── src/               # Frontend source code
│   │   ├── voices.ts      # Voice definitions
│   │   ├── App.css        # Styling
│   │   └── App.tsx        # Main React component
│   ├── index.html
│   ├── package.json       # Node.js dependencies
│   ├── tsconfig.json      # TypeScript config
│   └── vite.config.ts     # Vite config
└── README.md              # This file
```
Before you begin, ensure you have the following installed:
- Python: Version 3.8 or higher.
- Node.js: Version 16 or higher (includes npm or use yarn).
- FFmpeg: Required for MP3 audio conversion by the backend.
  - Linux (Debian/Ubuntu): `sudo apt update && sudo apt install ffmpeg`
  - macOS (Homebrew): `brew install ffmpeg`
  - Windows: Download from the official FFmpeg website, extract, and add the `bin` directory to your system's PATH environment variable. Verify the installation by running `ffmpeg -version` in your terminal.
- Git: For cloning the repository.
- Clone the Repository:

  ```bash
  git clone <your-repository-url>
  cd <your-repository-directory>
  ```
- Backend Setup (FastAPI):

  ```bash
  cd backend

  # Create a virtual environment (recommended)
  python -m venv venv

  # Activate the virtual environment
  # On Windows: venv\Scripts\activate
  # On macOS/Linux:
  source venv/bin/activate

  # Install Python dependencies
  pip install -r requirements.txt

  # Deactivate when done (optional): deactivate
  cd ..
  ```
- Frontend Setup (React):

  ```bash
  cd frontend

  # Install Node.js dependencies
  npm install  # or: yarn install

  # Create the environment file used for API URL configuration
  # (copy the example file if one exists, or create .env directly)
  cp .env.example .env

  # Edit .env and set the backend API URL (the default is usually correct for local dev):
  # VITE_API_URL=http://localhost:8000/api/v1
  cd ..
  ```
You need to run both the backend and frontend servers.
- Start the Backend Server:

  ```bash
  cd backend

  # Activate the virtual environment (if not already active):
  # source venv/bin/activate   (macOS/Linux)
  # venv\Scripts\activate      (Windows)

  # Run using Uvicorn (with auto-reload for development)
  uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

  # Or use the helper script:
  python run.py
  ```
  The backend API will be available at `http://localhost:8000`. You can access the interactive API documentation at `http://localhost:8000/docs`.

- Start the Frontend Server: Open another terminal window.

  ```bash
  cd frontend

  # Run the development server (usually opens the browser automatically)
  npm run dev  # or: yarn dev
  ```
  The frontend application will typically be available at `http://localhost:5173` (the Vite default). Check the terminal output for the exact URL.
- Open the frontend URL (e.g., `http://localhost:5173`) in your browser.
- The application will automatically check the backend server's status and currently loaded language.
- Select Language: Choose the desired language from the dropdown. The app will request the backend to load the corresponding model (this may take a few seconds, indicated by the status message).
- Select Voice: Once the language model is loaded, choose a voice from the filtered list for that language.
- Enter Text: Type or paste the text you want to convert into the text area. Ensure the text matches the selected language.
- Adjust Speed: Use the slider to set the desired speech speed (0.5x to 2.0x).
- Generate Speech: Click the "Generate Speech" button.
- Play Audio: If successful, an audio player will appear, allowing you to listen to the generated speech. The audio format will be MP3 if requested and FFmpeg is available on the server, otherwise WAV.
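
Under the hood, the "Generate Speech" and "Play Audio" steps above boil down to a single `POST /api/v1/tts` call from the browser. The snippet below is an illustrative TypeScript sketch, not the actual `App.tsx` code; the element id and error handling are assumptions:

```typescript
// Hypothetical sketch of the browser-side request the frontend makes.
// Assumes an <audio id="player"> element exists on the page.
async function speak(text: string, voice: string, speed: number): Promise<void> {
  const response = await fetch('http://localhost:8000/api/v1/tts', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Accept: 'audio/mpeg', // the server answers with WAV if FFmpeg is unavailable
    },
    body: JSON.stringify({ text, voice, speed }),
  });
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }
  // Wrap the raw audio bytes in an object URL the <audio> element can play.
  const blob = await response.blob();
  const player = document.getElementById('player') as HTMLAudioElement;
  player.src = URL.createObjectURL(blob);
  await player.play();
}
```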
You can interact with the API endpoints using tools like `curl`, Postman, or any HTTP client. The primary endpoints are:
- `POST /api/v1/tts`: Generate speech.
  - Body (JSON):

    ```jsonc
    {
      "text": "Your text here, matching the loaded language.",
      "voice": "voice_id_string",  // e.g., "bf_emma", "am_adam", "jf_alpha"
      "speed": 1.2,                // Optional, default usually 1.0-1.3
      "split_pattern": "\\n+"      // Optional regex for splitting text
    }
    ```

  - Headers:
    - `Content-Type: application/json`
    - `Accept: audio/mpeg` (to request MP3 output)
    - `Accept: audio/wav` (to request WAV output; also the default if `Accept` is omitted)
  - Response: Raw audio data (`audio/mpeg` or `audio/wav`).
- `GET /api/v1/admin/status`: Get the current server status.
  - Response (JSON):

    ```jsonc
    {
      "status": "loaded",  // "loaded" | "unloaded" | "unknown"
      "lang_code": "b"     // "a", "b", "f", "h", "i", "j", "z", or null
    }
    ```
- `POST /api/v1/admin/load_model?lang_code={code}`: Request the server to load a specific language model.
  - Replace `{code}` with the desired language code (e.g., `a`, `b`, `f`, `j`).
  - Response (JSON): The new status after the load request is initiated. Check `/admin/status` afterwards to confirm completion; see the polling sketch after this list.
- `POST /api/v1/admin/unload_model`: Unload the currently loaded model to free resources.
  - Response (JSON): Indicates success, or that no model was loaded.
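
Since `load_model` only initiates the load, a client may want to poll `/admin/status` until the model is actually ready. Here is a minimal TypeScript sketch of that pattern using `axios`; the helper name, 1-second interval, and attempt limit are illustrative choices, not part of the API:

```typescript
import axios from 'axios';

// Hypothetical helper: request a model load, then poll /admin/status
// until it reports the requested model as loaded.
async function loadModelAndWait(
  apiUrl: string,
  langCode: string,
  maxAttempts = 30,
): Promise<void> {
  await axios.post(`${apiUrl}/admin/load_model`, null, {
    params: { lang_code: langCode },
  });
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { data } = await axios.get(`${apiUrl}/admin/status`);
    if (data.status === 'loaded' && data.lang_code === langCode) return;
    await new Promise((resolve) => setTimeout(resolve, 1000)); // wait 1s between polls
  }
  throw new Error(`Model '${langCode}' did not finish loading in time.`);
}

// Example: await loadModelAndWait('http://localhost:8000/api/v1', 'j');
```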
Refer to `http://localhost:8000/docs` for detailed interactive API documentation.
Here's a Node.js function using `axios` to call the API and save the audio to a file:
```typescript
// ttsClient.ts
import axios, { AxiosError } from 'axios';
import * as fs from 'fs';
import * as stream from 'stream';
import { promisify } from 'util';
import * as path from 'path';
// Promisify for async/await with streams
const pipeline = promisify(stream.pipeline);
interface GenerateAudioParams {
/** Base URL of the API server, e.g., 'http://localhost:8000/api/v1' */
apiUrl: string;
/** Full path for the output file, e.g., './output/speech.mp3' */
outputFilePath: string;
/** Text to synthesize */
text: string;
/** Voice ID, e.g., 'bm_george', 'jf_alpha' */
voice: string;
/** Speech speed (optional, defaults usually 1.0-1.3) */
speed?: number;
/** Desired output format */
format: 'wav' | 'mp3';
}
async function generateAudioFromApi(params: GenerateAudioParams): Promise<void> {
const { apiUrl, outputFilePath, text, voice, speed, format } = params;
const ttsEndpoint = `${apiUrl.replace(/\/$/, '')}/tts`;
console.log(`Requesting ${format.toUpperCase()} from ${ttsEndpoint} for voice: ${voice}`);
console.log(`Output will be saved to: ${outputFilePath}`);
const requestBody = { text, voice, speed }; // Speed is optional in body
const acceptHeader = format === 'mp3' ? 'audio/mpeg' : 'audio/wav';
try {
const response = await axios.post<stream.Readable>(ttsEndpoint, requestBody, {
responseType: 'stream',
headers: {
'Content-Type': 'application/json',
'Accept': acceptHeader,
},
timeout: 60000, // Example timeout: 60 seconds
});
// Ensure output directory exists
const outputDir = path.dirname(outputFilePath);
if (!fs.existsSync(outputDir)) {
console.log(`Creating output directory: ${outputDir}`);
fs.mkdirSync(outputDir, { recursive: true });
}
// Pipe the stream to a file
const writer = fs.createWriteStream(outputFilePath);
await pipeline(response.data, writer);
console.log(`Audio successfully saved as ${format.toUpperCase()} to ${outputFilePath}`);
} catch (error: unknown) {
console.error(`Error generating or saving audio:`);
if (axios.isAxiosError(error)) {
const axiosError = error as AxiosError;
if (axiosError.response) {
console.error(`API Error Status: ${axiosError.response.status}`);
// Attempt to read error detail if response data is available (might be stream or text)
let errorDetail = 'Could not read error detail.';
try {
// If data is a readable stream, try reading it
if (axiosError.response.data instanceof stream.Readable) {
const chunks: Buffer[] = [];
for await (const chunk of axiosError.response.data) { chunks.push(chunk); }
errorDetail = Buffer.concat(chunks).toString();
// Try parsing if it looks like JSON
if (errorDetail.startsWith('{')) { errorDetail = JSON.parse(errorDetail).detail || errorDetail; }
} else if (typeof axiosError.response.data === 'string') {
errorDetail = axiosError.response.data;
} else if (typeof axiosError.response.data === 'object') {
errorDetail = JSON.stringify(axiosError.response.data);
}
} catch (readError) { console.error("Failed to read/parse error response stream:", readError); }
console.error('API Error Detail:', errorDetail);
} else if (axiosError.request) {
console.error('No response received from API:', axiosError.message);
} else {
console.error('Axios request setup error:', axiosError.message);
}
} else {
console.error('Non-API error:', error);
}
throw new Error(`Failed to generate audio (see logs for details).`); // Re-throw simplified error
}
}
// --- Example Usage ---
async function runExample() {
try {
await generateAudioFromApi({
apiUrl: 'http://localhost:8000/api/v1', // Adjust if needed
outputFilePath: './output/japanese_example.mp3',
text: 'こんにちは世界!', // Requires Japanese model loaded ('j')
voice: 'jf_alpha', // Ensure this voice exists for Japanese
speed: 1.1,
format: 'mp3', // Request MP3
});
await generateAudioFromApi({
apiUrl: 'http://localhost:8000/api/v1',
outputFilePath: './output/french_example.wav',
text: 'Bonjour le monde!', // Requires French model loaded ('f')
voice: 'ff_siwis', // Ensure this voice exists for French
format: 'wav', // Request WAV
});
} catch (err: any) {
console.error("Example execution failed:", err.message);
// Ensure the correct language model is loaded on the server first!
console.error("Did you load the correct language model ('j' then 'f') on the server using the frontend or API before running this?");
}
}
// To run this example:
// 1. Ensure Node.js and axios are installed (`npm install axios` or `yarn add axios`).
// 2. Save this code as `ttsClient.ts` (or `.js`).
// 3. Compile if using TypeScript (`npx tsc ttsClient.ts`).
// 4. Make sure the FastAPI server is running.
// 5. IMPORTANT: Use the frontend or API to load the correct language model ('j' for the first call, 'f' for the second) on the server *before* running this script.
// 6. Run the script: `node ttsClient.js`.
// Uncomment to run directly if file is executed:
// runExample();
// Export if using as a module:
// export { generateAudioFromApi, GenerateAudioParams };
```
The backend API URL is configured in the `frontend/.env` file using the `VITE_API_URL` variable.
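
Vite injects this variable into the frontend at build time through `import.meta.env`. As a sketch of how the app can read it (the exact code in `App.tsx` may differ, and the fallback URL is an assumption):

```typescript
// Read the API base URL provided by Vite; fall back to the local default.
const API_URL: string =
  import.meta.env.VITE_API_URL ?? 'http://localhost:8000/api/v1';
```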
- The default language loaded on startup can be changed in `backend/app/main.py`, within the `lifespan` function.
- Allowed CORS origins are configured in `backend/app/main.py`. Add your frontend deployment URL to the `origins` list for production.
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs, feature requests, or improvements.
- Fork the repository.
- Create a new branch: `git checkout -b feature/your-feature-name`
- Make your changes.
- Commit your changes: `git commit -am 'Add some feature'`
- Push to the branch: `git push origin feature/your-feature-name`
- Create a new Pull Request.
- Kokoro-TTS for the core text-to-speech library.
- FastAPI framework.
- React library.
- Pydub for audio conversion.
- FFmpeg for audio processing.
- The various dataset creators credited within the Kokoro-TTS voice lists.
This project is licensed under the MIT License.