Indic Parler-TTS API Documentation

Overview

Indic Parler-TTS is a multilingual Indic extension of Parler-TTS Mini. It supports 21 languages, covering Indian regional languages and English. This API provides a simple interface for generating natural-sounding speech in any of these languages.

Installation

To run the API, you'll need to install the following dependencies:

pip install fastapi uvicorn transformers parler-tts soundfile numpy pydantic torch

Running the API

python run_server.py

This will start the API server on port 8000.

API Endpoints

1. Generate Speech

Endpoint: POST /tts
Description: Generate speech from text using the Indic Parler-TTS model
Request Body:
- prompt (string, required): The text to convert to speech
- description (string, optional): A detailed description of how the speech should sound (default: "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch...")
- language (string, optional): Language code (default: "auto", which detects the language from the prompt)
Response:
- audio_base64 (string): Base64 encoded audio data in WAV format
- sampling_rate (int): The sampling rate of the generated audio (44100 Hz - native to the model)
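
Because the response carries the audio as base64 text inside JSON, a client has to decode it before the audio can be played. The following is a minimal Python sketch using only the standard library. The helper names (build_tts_request, decode_tts_response, synthesize) are illustrative and not part of the API, and the URL assumes the server is running locally on port 8000.

```python
import base64
import json
import urllib.request

# Assumption: the server runs locally on port 8000, as in the examples in this document.
API_URL = "http://localhost:8000/tts"


def build_tts_request(prompt, description=None, language=None):
    """Build the JSON body for POST /tts, leaving out optional fields that are unset."""
    body = {"prompt": prompt}
    if description is not None:
        body["description"] = description
    if language is not None:
        body["language"] = language
    return body


def decode_tts_response(response_json):
    """Return (wav_bytes, sampling_rate) from a parsed /tts response."""
    wav_bytes = base64.b64decode(response_json["audio_base64"])
    return wav_bytes, int(response_json["sampling_rate"])


def synthesize(prompt, out_path, description=None, language=None, url=API_URL):
    """POST the prompt, decode the audio, and write it to out_path as a WAV file."""
    payload = json.dumps(build_tts_request(prompt, description, language)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        wav_bytes, sampling_rate = decode_tts_response(json.load(response))
    with open(out_path, "wb") as f:
        f.write(wav_bytes)
    return sampling_rate
```

With the server running, synthesize("Hello, how are you doing today?", "english_output.wav") would write a playable WAV file and return the sampling rate reported by the server.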

2. Get Supported Languages

Endpoint: GET /languages
Description: Get information about all supported languages and their recommended voices
Response: JSON object containing languages with their available and recommended speakers
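
The exact shape of the /languages response is not specified here, so the sketch below assumes a layout in which each language name maps to an object with "available" and "recommended" lists. Those key names, and the helper names fetch_languages and pick_speaker, are assumptions made for illustration; adjust them to match what the server actually returns.

```python
import json
import urllib.request


def fetch_languages(url="http://localhost:8000/languages"):
    """Fetch and parse the /languages response (requires a running server)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def pick_speaker(languages, language):
    """Return the first recommended speaker for a language, falling back to
    the first available one. Returns None if the language or speakers are missing.

    Assumed (hypothetical) response shape:
        {"Hindi": {"available": [...], "recommended": [...]}, ...}
    """
    entry = languages.get(language)
    if entry is None:
        return None
    candidates = entry.get("recommended") or entry.get("available") or []
    return candidates[0] if candidates else None
```

A caller could combine the two as pick_speaker(fetch_languages(), "Hindi") and then mention the returned name in the description field.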

3. Health Check

Endpoint: GET /
Description: Check if the API is running correctly
Response: Simple status message

Supported Languages and Voices

Indic Parler-TTS officially supports 21 languages, with 69 unique voices in total:

Officially Supported Languages

Assamese
Bengali
Bodo
Dogri
English
Gujarati
Hindi
Kannada
Konkani
Maithili
Malayalam
Manipuri
Marathi
Nepali
Odia
Sanskrit
Santali
Sindhi
Tamil
Telugu
Urdu

Language-Specific Voices

Assamese

Available Speakers: Amit, Sita, Poonam, Rakesh
Recommended: Amit, Sita

Bengali

Available Speakers: Arjun, Aditi, Tapan, Rashmi, Arnav, Riya
Recommended: Arjun, Aditi

Bodo

Available Speakers: Bikram, Maya, Kalpana
Recommended: Bikram, Maya

Dogri

Available Speakers: Karan
Recommended: Karan

English

Available Speakers: Thoma, Mary, Swapna, Dinesh, Meera, Jatin, Aakash, Sneha, Kabir, Tisha, Chingkhei, Thoiba, Priya, Tarun, Gauri, Nisha, Raghav, Kavya, Ravi, Vikas, Riya
Recommended: Thoma, Mary

Gujarati

Available Speakers: Yash, Neha
Recommended: Yash, Neha

Hindi

Available Speakers: Rohit, Divya, Aman, Rani
Recommended: Rohit, Divya

Kannada

Available Speakers: Suresh, Anu, Chetan, Vidya
Recommended: Suresh, Anu

Malayalam

Available Speakers: Anjali, Anju, Harish
Recommended: Anjali, Harish

Manipuri

Available Speakers: Laishram, Ranjit
Recommended: Laishram, Ranjit

Marathi

Available Speakers: Sanjay, Sunita, Nikhil, Radha, Varun, Isha
Recommended: Sanjay, Sunita

Nepali

Available Speakers: Amrita
Recommended: Amrita

Odia

Available Speakers: Manas, Debjani
Recommended: Manas, Debjani

Sanskrit

Available Speakers: Aryan
Recommended: Aryan

Tamil

Available Speakers: Kavitha, Jaya
Recommended: Jaya

Telugu

Available Speakers: Prakash, Lalitha, Kiran
Recommended: Prakash, Lalitha
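
A speaker is selected by naming them in the description field. As a minimal sketch, the recommended voices listed above can be kept in a lookup table and turned into a description that opens with the speaker's name. The table copies the recommendations from this section; the function name describe_with_speaker and its parameters are hypothetical.

```python
# Recommended speakers per language, copied from the lists in this document.
RECOMMENDED_SPEAKERS = {
    "Assamese": ["Amit", "Sita"],
    "Bengali": ["Arjun", "Aditi"],
    "Bodo": ["Bikram", "Maya"],
    "Dogri": ["Karan"],
    "English": ["Thoma", "Mary"],
    "Gujarati": ["Yash", "Neha"],
    "Hindi": ["Rohit", "Divya"],
    "Kannada": ["Suresh", "Anu"],
    "Malayalam": ["Anjali", "Harish"],
    "Manipuri": ["Laishram", "Ranjit"],
    "Marathi": ["Sanjay", "Sunita"],
    "Nepali": ["Amrita"],
    "Odia": ["Manas", "Debjani"],
    "Sanskrit": ["Aryan"],
    "Tamil": ["Jaya"],
    "Telugu": ["Prakash", "Lalitha"],
}


def describe_with_speaker(language, style, index=0):
    """Build a description that names a recommended speaker for the language.

    `style` is free text describing the delivery, for example
    "monotone yet slightly fast in delivery".
    """
    if language not in RECOMMENDED_SPEAKERS:
        raise ValueError(f"No recommended speakers listed for {language!r}")
    speaker = RECOMMENDED_SPEAKERS[language][index]
    return f"{speaker}'s voice is {style.rstrip('.')}."
```

The result can be passed directly as the description field of a /tts request.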

Emotion Support

The following languages officially support emotion-specific prompts:

Assamese
Bengali
Bodo
Dogri
Kannada
Malayalam
Marathi
Sanskrit
Nepali
Tamil

Available emotions include: Command, Anger, Narration, Conversation, Disgust, Fear, Happy, Neutral, Proper Noun, News, Sad, and Surprise.
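
Since emotion prompts are only officially supported for some languages, a client may want to check a language and emotion pair before sending a request. The sketch below encodes the two lists above. The function name supports_emotion is hypothetical, and the check reflects only this documentation, not any validation performed by the server.

```python
# Languages and emotions as listed in this document.
EMOTION_LANGUAGES = frozenset({
    "Assamese", "Bengali", "Bodo", "Dogri", "Kannada",
    "Malayalam", "Marathi", "Sanskrit", "Nepali", "Tamil",
})
EMOTIONS = frozenset({
    "Command", "Anger", "Narration", "Conversation", "Disgust", "Fear",
    "Happy", "Neutral", "Proper Noun", "News", "Sad", "Surprise",
})


def _normalize(value):
    """Collapse whitespace and ignore case so that 'proper  noun' matches 'Proper Noun'."""
    return " ".join(value.split()).casefold()


_LANGUAGE_KEYS = {_normalize(name) for name in EMOTION_LANGUAGES}
_EMOTION_KEYS = {_normalize(name) for name in EMOTIONS}


def supports_emotion(language, emotion):
    """Return True if the language officially supports emotion prompts
    and the emotion is one of the listed emotions."""
    return _normalize(language) in _LANGUAGE_KEYS and _normalize(emotion) in _EMOTION_KEYS
```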

Usage Examples

1. Generate English Speech (cURL)

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello, how are you doing today?",
    "description": "A female speaker with a British accent delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker'\''s voice sounding clear and very close up."
  }' -o english_output.json

2. Generate Hindi Speech (cURL)

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "नमस्ते, आप कैसे हैं?",
    "description": "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker'\''s voice sounding clear and very close up."
  }' -o hindi_output.json

3. Generate Hindi Speech with a Specific Speaker (cURL)

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "नमस्कार, आप कैसी हैं?",
    "description": "Divya'\''s voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
  }' -o hindi_specific_speaker.json

4. Generate Tamil Speech (cURL)

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "ஹலோ, நீங்கள் இன்று எப்படி இருக்கிறீர்கள்?",
    "description": "A female speaker with a soft and gentle tone speaks in a moderate pace. The recording is very clear with no background noise."
  }' -o tamil_output.json

5. Get Supported Languages (cURL)

curl -X GET "http://localhost:8000/languages"

Customizing Speech Output

Indic Parler-TTS lets you control several speech characteristics through the description field:

Background Noise

Use "very clear audio" for highest quality
Use "very noisy audio" for high background noise levels

Reverberation

Controls the perceived distance of the voice, from close-up to distant-sounding

Expressivity

From monotone to highly expressive
Use terms like "slightly expressive", "animated", or "monotone"

Pitch

Specify as "high-pitched", "low-pitched", or "moderate pitch"

Speaking Rate

From "slow" to "fast-paced"

Voice Quality

From "basic" to "refined" voice quality

Accent Control

Specify accents like "British accent", "American accent", etc.
Example: "A male British speaker"
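
The controls above are all expressed as plain English in the description field, so a small helper can assemble a description from a few settings. This is a sketch only: the function name build_description, its parameters, and the sentence template are illustrative choices, and the model may respond to other phrasings just as well.

```python
# Pitch wording follows the terms suggested in this document.
PITCH_PHRASES = {
    "high": "The voice is high-pitched.",
    "low": "The voice is low-pitched.",
    "moderate": "The voice has a moderate pitch.",
}


def build_description(gender="female", accent=None, expressivity="slightly expressive",
                      rate="moderate", pitch="moderate", noisy=False):
    """Compose a description string from a few speech settings.

    accent: e.g. "British" or "American"; omitted from the text when None.
    expressivity: e.g. "slightly expressive", "animated", "monotone".
    rate: e.g. "slow", "moderate", "fast".
    pitch: one of "high", "low", "moderate".
    noisy: True for "very noisy audio", False for "very clear audio".
    """
    if pitch not in PITCH_PHRASES:
        raise ValueError(f"pitch must be one of {sorted(PITCH_PHRASES)}")
    accent_part = ""
    if accent:
        # Choose "a" or "an" based on the accent's first letter.
        article = "an" if accent[0].lower() in "aeiou" else "a"
        accent_part = f" with {article} {accent} accent"
    noise = "very noisy audio" if noisy else "very clear audio"
    return (
        f"A {gender} speaker{accent_part} delivers {expressivity} speech at a {rate} pace. "
        f"{PITCH_PHRASES[pitch]} "
        f"The recording has {noise}."
    )
```

For example, build_description(accent="British", rate="slow", pitch="high") yields a three-sentence description that mentions the accent, pace, pitch, and recording quality.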

Tips

When language is left at "auto", the language is detected from the prompt text
For more natural results, use the recommended voices for each language
You can use punctuation to control prosody (e.g., commas for short pauses)
For Indian English accents, use the English voices, which support them natively
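
How the server detects the language is not described in this document. One plausible approach, shown here purely as an illustration and not as the server's actual logic, is to look at the dominant Unicode script of the prompt. The function name dominant_script is hypothetical. Note that script alone cannot separate languages that share a script (Hindi, Marathi, Nepali, and Sanskrit are all commonly written in Devanagari), which is one reason to pass the language field explicitly when it is known.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script among the letters in `text`.

    The script is taken from the first word of each character's Unicode name
    (for example "DEVANAGARI", "TAMIL", "LATIN"). Returns None when the text
    contains no letters.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```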

Model Information

Model: Indic Parler-TTS (fine-tuned from Indic Parler-TTS Pretrained)
Architecture: Based on Parler-TTS with enhancements for multilingual support
Training Data: 1,806 hours of multilingual Indic and English speech data
Languages: 21 officially supported languages
Voices: 69 unique voices across languages
Output Sampling Rate: 44.1 kHz (native to the model)
