This project is proudly developed and maintained by the team at ElevateSells Labs.
This project provides a high-performance Text-to-Speech (TTS) API server built with Python FastAPI, utilizing the Kokoro-TTS library. It includes a user-friendly web interface built with React and TypeScript for easy interaction.
The server can dynamically load different language models, supports multiple voices per language, allows speed adjustment, and can output audio in both WAV and MP3 formats.
Core Technologies:
- Backend: Python, FastAPI, Uvicorn, Kokoro-TTS, Pydub, SoundFile, Numpy
- Frontend: React, TypeScript, Vite, Axios, CSS
- Audio Conversion: FFmpeg (required for MP3 output)
- FastAPI Backend: Asynchronous and optimized for performance.
- Kokoro-TTS Integration: Leverages the Kokoro-TTS library for speech synthesis.
- Multi-Language Support: Dynamically load and switch between supported language models:
  - American English (`a`)
  - British English (`b`)
  - French (`f`)
  - Hindi (`h`)
  - Italian (`i`)
  - Japanese (`j`)
  - Mandarin Chinese (`z`)
- Multiple Voices: Supports various pre-defined voices for each language.
- Adjustable Speed: Control the playback speed of the generated speech.
- WAV & MP3 Output: Generate audio in WAV format, or request MP3 via the `Accept: audio/mpeg` header (requires FFmpeg).
- Admin Endpoints: Check server status and load/unload language models remotely.
- React Frontend: Intuitive web interface to select language, voice, speed, enter text, and play generated audio.
- CORS Enabled: Properly configured for frontend interaction from different origins.
```
.
├── backend/               # FastAPI application
│   ├── app/               # Main application code
│   │   ├── core/          # Core logic (model loading, etc.)
│   │   ├── models.py      # Pydantic models
│   │   ├── routers/       # API endpoint routers (tts, admin)
│   │   └── main.py        # FastAPI app entrypoint & config
│   ├── requirements.txt   # Python dependencies
│   └── run.py             # Script to run the backend server
├── frontend/              # React application
│   ├── public/
│   ├── src/               # Frontend source code
│   │   ├── voices.ts      # Voice definitions
│   │   ├── App.css        # Styling
│   │   └── App.tsx        # Main React component
│   ├── index.html
│   ├── package.json       # Node.js dependencies
│   ├── tsconfig.json      # TypeScript config
│   └── vite.config.ts     # Vite config
└── README.md              # This file
```
Before you begin, ensure you have the following installed:
- Python: Version 3.8 or higher.
- Node.js: Version 16 or higher (includes npm or use yarn).
- FFmpeg: Required for MP3 audio conversion by the backend.
  - Linux (Debian/Ubuntu): `sudo apt update && sudo apt install ffmpeg`
  - macOS (Homebrew): `brew install ffmpeg`
  - Windows: Download from the official FFmpeg website, extract, and add the `bin` directory to your system's PATH environment variable. Verify the installation by running `ffmpeg -version` in your terminal.
- Git: For cloning the repository.
- Clone the Repository:

  ```bash
  git clone <your-repository-url>
  cd <your-repository-directory>
  ```
- Backend Setup (FastAPI):

  ```bash
  cd backend

  # Create a virtual environment (recommended)
  python -m venv venv

  # Activate the virtual environment
  # On Windows: venv\Scripts\activate
  # On macOS/Linux:
  source venv/bin/activate

  # Install Python dependencies
  pip install -r requirements.txt

  # Deactivate when done (optional): deactivate
  cd ..
  ```
- Frontend Setup (React):

  ```bash
  cd frontend

  # Install Node.js dependencies
  npm install  # or: yarn install

  # Create the environment file used for API URL configuration
  # (copy the example file if one exists, or create .env directly)
  cp .env.example .env

  # Edit .env and set the backend API URL (the default is usually correct for local dev):
  # VITE_API_URL=http://localhost:8000/api/v1
  cd ..
  ```
You need to run both the backend and frontend servers.
- Start the Backend Server:

  ```bash
  cd backend

  # Activate the virtual environment (if not already active):
  # source venv/bin/activate   (macOS/Linux)
  # venv\Scripts\activate      (Windows)

  # Run using Uvicorn (with auto-reload for development)
  uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

  # Or use the helper script:
  python run.py
  ```
  The backend API will be available at `http://localhost:8000`. You can access the interactive API documentation at `http://localhost:8000/docs`.

- Start the Frontend Server: Open another terminal window.

  ```bash
  cd frontend

  # Run the development server (usually opens the browser automatically)
  npm run dev  # or: yarn dev
  ```
  The frontend application will typically be available at `http://localhost:5173` (the Vite default). Check the terminal output for the exact URL.
- Open the frontend URL (e.g., `http://localhost:5173`) in your browser.
- The application will automatically check the backend server's status and currently loaded language.
- Select Language: Choose the desired language from the dropdown. The app will request the backend to load the corresponding model (this may take a few seconds, indicated by the status message).
- Select Voice: Once the language model is loaded, choose a voice from the filtered list for that language.
- Enter Text: Type or paste the text you want to convert into the text area. Ensure the text matches the selected language.
- Adjust Speed: Use the slider to set the desired speech speed (0.5x to 2.0x).
- Generate Speech: Click the "Generate Speech" button.
- Play Audio: If successful, an audio player will appear, allowing you to listen to the generated speech. The audio format will be MP3 if requested and FFmpeg is available on the server, otherwise WAV.
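
Under the hood, the "Generate Speech" and "Play Audio" steps above boil down to a single `POST /api/v1/tts` call from the browser. The snippet below is an illustrative TypeScript sketch, not the actual `App.tsx` code; the element id and error handling are assumptions:

```typescript
// Hypothetical sketch of the browser-side request the frontend makes.
// Assumes an <audio id="player"> element exists on the page.
async function speak(text: string, voice: string, speed: number): Promise<void> {
  const response = await fetch('http://localhost:8000/api/v1/tts', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Accept: 'audio/mpeg', // the server answers with WAV if FFmpeg is unavailable
    },
    body: JSON.stringify({ text, voice, speed }),
  });
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }
  // Wrap the raw audio bytes in an object URL the <audio> element can play.
  const blob = await response.blob();
  const player = document.getElementById('player') as HTMLAudioElement;
  player.src = URL.createObjectURL(blob);
  await player.play();
}
```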
You can interact with the API endpoints using tools like `curl`, Postman, or any HTTP client. The primary endpoints are:
- `POST /api/v1/tts`: Generate speech.
  - Body (JSON):

    ```jsonc
    {
      "text": "Your text here, matching the loaded language.",
      "voice": "voice_id_string",  // e.g., "bf_emma", "am_adam", "jf_alpha"
      "speed": 1.2,                // Optional, default usually 1.0-1.3
      "split_pattern": "\\n+"      // Optional regex for splitting text
    }
    ```

  - Headers:
    - `Content-Type: application/json`
    - `Accept: audio/mpeg` (to request MP3 output)
    - `Accept: audio/wav` (to request WAV output; also the default if `Accept` is omitted)
  - Response: Raw audio data (`audio/mpeg` or `audio/wav`).
- `GET /api/v1/admin/status`: Get the current server status.
  - Response (JSON):

    ```jsonc
    {
      "status": "loaded",  // "loaded" | "unloaded" | "unknown"
      "lang_code": "b"     // "a", "b", "f", "h", "i", "j", "z", or null
    }
    ```
- `POST /api/v1/admin/load_model?lang_code={code}`: Request the server to load a specific language model.
  - Replace `{code}` with the desired language code (e.g., `a`, `b`, `f`, `j`).
  - Response (JSON): The new status after the load request is initiated. Check `/admin/status` afterwards to confirm completion; see the polling sketch after this list.
- `POST /api/v1/admin/unload_model`: Unload the currently loaded model to free resources.
  - Response (JSON): Indicates success, or that no model was loaded.
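
Since `load_model` only initiates the load, a client may want to poll `/admin/status` until the model is actually ready. Here is a minimal TypeScript sketch of that pattern using `axios`; the helper name, 1-second interval, and attempt limit are illustrative choices, not part of the API:

```typescript
import axios from 'axios';

// Hypothetical helper: request a model load, then poll /admin/status
// until it reports the requested model as loaded.
async function loadModelAndWait(
  apiUrl: string,
  langCode: string,
  maxAttempts = 30,
): Promise<void> {
  await axios.post(`${apiUrl}/admin/load_model`, null, {
    params: { lang_code: langCode },
  });
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { data } = await axios.get(`${apiUrl}/admin/status`);
    if (data.status === 'loaded' && data.lang_code === langCode) return;
    await new Promise((resolve) => setTimeout(resolve, 1000)); // wait 1s between polls
  }
  throw new Error(`Model '${langCode}' did not finish loading in time.`);
}

// Example: await loadModelAndWait('http://localhost:8000/api/v1', 'j');
```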
Refer to `http://localhost:8000/docs` for detailed interactive API documentation.
Here's a Node.js function using `axios` to call the API and save the audio to a file:
```typescript
// ttsClient.ts
import axios, { AxiosError } from 'axios';
import * as fs from 'fs';
import * as stream from 'stream';
import { promisify } from 'util';
import * as path from 'path';
// Promisify for async/await with streams
const pipeline = promisify(stream.pipeline);
interface GenerateAudioParams {
/** Base URL of the API server, e.g., 'http://localhost:8000/api/v1' */
apiUrl: string;
/** Full path for the output file, e.g., './output/speech.mp3' */
outputFilePath: string;
/** Text to synthesize */
text: string;
/** Voice ID, e.g., 'bm_george', 'jf_alpha' */
voice: string;
/** Speech speed (optional, defaults usually 1.0-1.3) */
speed?: number;
/** Desired output format */
format: 'wav' | 'mp3';
}
async function generateAudioFromApi(params: GenerateAudioParams): Promise<void> {
const { apiUrl, outputFilePath, text, voice, speed, format } = params;
const ttsEndpoint = `${apiUrl.replace(/\/$/, '')}/tts`;
console.log(`Requesting ${format.toUpperCase()} from ${ttsEndpoint} for voice: ${voice}`);
console.log(`Output will be saved to: ${outputFilePath}`);
const requestBody = { text, voice, speed }; // Speed is optional in body
const acceptHeader = format === 'mp3' ? 'audio/mpeg' : 'audio/wav';
try {
const response = await axios.post<stream.Readable>(ttsEndpoint, requestBody, {
responseType: 'stream',
headers: {
'Content-Type': 'application/json',
'Accept': acceptHeader,
},
timeout: 60000, // Example timeout: 60 seconds
});
// Ensure output directory exists
const outputDir = path.dirname(outputFilePath);
if (!fs.existsSync(outputDir)) {
console.log(`Creating output directory: ${outputDir}`);
fs.mkdirSync(outputDir, { recursive: true });
}
// Pipe the stream to a file
const writer = fs.createWriteStream(outputFilePath);
await pipeline(response.data, writer);
console.log(`Audio successfully saved as ${format.toUpperCase()} to ${outputFilePath}`);
} catch (error: unknown) {
console.error(`Error generating or saving audio:`);
if (axios.isAxiosError(error)) {
const axiosError = error as AxiosError;
if (axiosError.response) {
console.error(`API Error Status: ${axiosError.response.status}`);
// Attempt to read error detail if response data is available (might be stream or text)
let errorDetail = 'Could not read error detail.';
try {
// If data is a readable stream, try reading it
if (axiosError.response.data instanceof stream.Readable) {
const chunks: Buffer[] = [];
for await (const chunk of axiosError.response.data) { chunks.push(chunk); }
errorDetail = Buffer.concat(chunks).toString();
// Try parsing if it looks like JSON
if (errorDetail.startsWith('{')) { errorDetail = JSON.parse(errorDetail).detail || errorDetail; }
} else if (typeof axiosError.response.data === 'string') {
errorDetail = axiosError.response.data;
} else if (typeof axiosError.response.data === 'object') {
errorDetail = JSON.stringify(axiosError.response.data);
}
} catch (readError) { console.error("Failed to read/parse error response stream:", readError); }
console.error('API Error Detail:', errorDetail);
} else if (axiosError.request) {
console.error('No response received from API:', axiosError.message);
} else {
console.error('Axios request setup error:', axiosError.message);
}
} else {
console.error('Non-API error:', error);
}
throw new Error(`Failed to generate audio (see logs for details).`); // Re-throw simplified error
}
}
// --- Example Usage ---
async function runExample() {
try {
await generateAudioFromApi({
apiUrl: 'http://localhost:8000/api/v1', // Adjust if needed
outputFilePath: './output/japanese_example.mp3',
text: 'こんにちは世界!', // Requires Japanese model loaded ('j')
voice: 'jf_alpha', // Ensure this voice exists for Japanese
speed: 1.1,
format: 'mp3', // Request MP3
});
await generateAudioFromApi({
apiUrl: 'http://localhost:8000/api/v1',
outputFilePath: './output/french_example.wav',
text: 'Bonjour le monde!', // Requires French model loaded ('f')
voice: 'ff_siwis', // Ensure this voice exists for French
format: 'wav', // Request WAV
});
} catch (err: any) {
console.error("Example execution failed:", err.message);
// Ensure the correct language model is loaded on the server first!
console.error("Did you load the correct language model ('j' then 'f') on the server using the frontend or API before running this?");
}
}
// To run this example:
// 1. Ensure Node.js and axios are installed (`npm install axios` or `yarn add axios`).
// 2. Save this code as `ttsClient.ts` (or `.js`).
// 3. Compile if using TypeScript (`npx tsc ttsClient.ts`).
// 4. Make sure the FastAPI server is running.
// 5. IMPORTANT: Use the frontend or API to load the correct language model ('j' for the first call, 'f' for the second) on the server *before* running this script.
// 6. Run the script: `node ttsClient.js`.
// Uncomment to run directly if file is executed:
// runExample();
// Export if using as a module:
// export { generateAudioFromApi, GenerateAudioParams };
```
The backend API URL is configured in the `frontend/.env` file using the `VITE_API_URL` variable.
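
Vite injects this variable into the frontend at build time through `import.meta.env`. As a sketch of how the app can read it (the exact code in `App.tsx` may differ, and the fallback URL is an assumption):

```typescript
// Read the API base URL provided by Vite; fall back to the local default.
const API_URL: string =
  import.meta.env.VITE_API_URL ?? 'http://localhost:8000/api/v1';
```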
- The default language loaded on startup can be changed in `backend/app/main.py`, within the `lifespan` function.
- Allowed CORS origins are configured in `backend/app/main.py`. Add your frontend deployment URL to the `origins` list for production.
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs, feature requests, or improvements.
- Fork the repository.
- Create a new branch: `git checkout -b feature/your-feature-name`
- Make your changes.
- Commit your changes: `git commit -am 'Add some feature'`
- Push to the branch: `git push origin feature/your-feature-name`
- Create a new Pull Request.
- Kokoro-TTS for the core text-to-speech library.
- FastAPI framework.
- React library.
- Pydub for audio conversion.
- FFmpeg for audio processing.
- The various dataset creators credited within the Kokoro-TTS voice lists.
This project is licensed under the MIT License.