Indic Parler-TTS is a multilingual Indic extension of Parler-TTS Mini that supports 21 languages, including many Indian regional languages and English. This API provides a simple interface for generating natural-sounding speech in any of these languages.
To run the API, you'll need to install the following dependencies:

    pip install fastapi uvicorn transformers parler-tts soundfile numpy pydantic torch

Then start the server:

    python run_server.py

This will start the API server on port 8000.
- Endpoint: POST /tts
  - Description: Generate speech from text using the Indic Parler-TTS model
  - Request Body:
    - prompt (string, required): The text to convert to speech
    - description (string, optional): A detailed description of how the speech should sound (default: "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch...")
    - language (string, optional): Language code (default: "auto", which auto-detects the language from the prompt)
  - Response:
    - audio_base64 (string): Base64-encoded audio data in WAV format
    - sampling_rate (int): The sampling rate of the generated audio (44100 Hz, native to the model)
- Endpoint: GET /languages
  - Description: Get information about all supported languages and their recommended voices
  - Response: JSON object containing languages with their available and recommended speakers
- Endpoint: GET /
  - Description: Check if the API is running correctly
  - Response: Simple status message
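
Because the /tts response carries the audio as a Base64 string inside JSON, a client has to decode it before it can be played. The following is a minimal sketch of such a client using only the Python standard library. It assumes the server is reachable at http://localhost:8000 and that the request and response fields are exactly those listed above; the function names (`build_tts_payload`, `decode_tts_response`, `synthesize`) are hypothetical and not part of the API.

```python
import base64
import json
import urllib.request

API_URL = "http://localhost:8000"  # assumption: default port from the setup step


def build_tts_payload(prompt, description=None, language="auto"):
    """Build the JSON body for POST /tts, omitting description when unset."""
    if not prompt:
        raise ValueError("prompt is required")
    payload = {"prompt": prompt, "language": language}
    if description is not None:
        payload["description"] = description
    return payload


def decode_tts_response(response):
    """Return (wav_bytes, sampling_rate) from a parsed /tts response dict."""
    wav_bytes = base64.b64decode(response["audio_base64"])
    return wav_bytes, int(response["sampling_rate"])


def synthesize(prompt, out_path, description=None, language="auto"):
    """Call POST /tts, write the decoded WAV to out_path, return the sampling rate."""
    body = json.dumps(build_tts_payload(prompt, description, language)).encode("utf-8")
    request = urllib.request.Request(
        f"{API_URL}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        wav_bytes, sampling_rate = decode_tts_response(json.load(reply))
    with open(out_path, "wb") as handle:
        handle.write(wav_bytes)
    return sampling_rate
```

With the server running, `synthesize("Hello, how are you doing today?", "english_output.wav")` would write a playable WAV file and return the reported sampling rate.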
Indic Parler-TTS officially supports 21 languages with a total of 69 unique voices across these languages:
- Assamese
- Bengali
- Bodo
- Dogri
- English
- Gujarati
- Hindi
- Kannada
- Konkani
- Maithili
- Malayalam
- Manipuri
- Marathi
- Nepali
- Odia
- Sanskrit
- Santali
- Sindhi
- Tamil
- Telugu
- Urdu
Available and recommended speakers by language:
- Assamese
  - Available Speakers: Amit, Sita, Poonam, Rakesh
  - Recommended: Amit, Sita
- Bengali
  - Available Speakers: Arjun, Aditi, Tapan, Rashmi, Arnav, Riya
  - Recommended: Arjun, Aditi
- Bodo
  - Available Speakers: Bikram, Maya, Kalpana
  - Recommended: Bikram, Maya
- Dogri
  - Available Speakers: Karan
  - Recommended: Karan
- English
  - Available Speakers: Thoma, Mary, Swapna, Dinesh, Meera, Jatin, Aakash, Sneha, Kabir, Tisha, Chingkhei, Thoiba, Priya, Tarun, Gauri, Nisha, Raghav, Kavya, Ravi, Vikas, Riya
  - Recommended: Thoma, Mary
- Gujarati
  - Available Speakers: Yash, Neha
  - Recommended: Yash, Neha
- Hindi
  - Available Speakers: Rohit, Divya, Aman, Rani
  - Recommended: Rohit, Divya
- Kannada
  - Available Speakers: Suresh, Anu, Chetan, Vidya
  - Recommended: Suresh, Anu
- Malayalam
  - Available Speakers: Anjali, Anju, Harish
  - Recommended: Anjali, Harish
- Manipuri
  - Available Speakers: Laishram, Ranjit
  - Recommended: Laishram, Ranjit
- Marathi
  - Available Speakers: Sanjay, Sunita, Nikhil, Radha, Varun, Isha
  - Recommended: Sanjay, Sunita
- Nepali
  - Available Speakers: Amrita
  - Recommended: Amrita
- Odia
  - Available Speakers: Manas, Debjani
  - Recommended: Manas, Debjani
- Sanskrit
  - Available Speakers: Aryan
  - Recommended: Aryan
- Tamil
  - Available Speakers: Kavitha, Jaya
  - Recommended: Jaya
- Telugu
  - Available Speakers: Prakash, Lalitha, Kiran
  - Recommended: Prakash, Lalitha
The following languages officially support emotion-specific prompts:
- Assamese
- Bengali
- Bodo
- Dogri
- Kannada
- Malayalam
- Marathi
- Sanskrit
- Nepali
- Tamil
Available emotions include: Command, Anger, Narration, Conversation, Disgust, Fear, Happy, Neutral, Proper Noun, News, Sad, and Surprise.
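
A client may want to check a language and emotion pair against these lists before sending a request. The sketch below does only that check. The helper name `supports_emotion` is hypothetical, and how an emotion should be worded inside the description field is not specified here, so the helper deliberately does not build a description.

```python
# Languages and emotions copied from the lists above.
EMOTION_LANGUAGES = {
    "Assamese", "Bengali", "Bodo", "Dogri", "Kannada",
    "Malayalam", "Marathi", "Sanskrit", "Nepali", "Tamil",
}
EMOTIONS = {
    "Command", "Anger", "Narration", "Conversation", "Disgust", "Fear",
    "Happy", "Neutral", "Proper Noun", "News", "Sad", "Surprise",
}


def supports_emotion(language, emotion):
    """Return True when both names appear in the lists above (case-insensitive)."""
    languages = {name.lower() for name in EMOTION_LANGUAGES}
    emotions = {name.lower() for name in EMOTIONS}
    return (
        language.strip().lower() in languages
        and emotion.strip().lower() in emotions
    )
```

A `False` result only means the pair is not on the officially supported list; it says nothing about what the model does with unsupported combinations.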
Each /tts response is JSON, so the examples below save it to a .json file; the WAV data is in the audio_base64 field and must be Base64-decoded before playback.

English:

    curl -X POST "http://localhost:8000/tts" \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "Hello, how are you doing today?",
        "description": "A female speaker with a British accent delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker'\''s voice sounding clear and very close up."
      }' -o english_output.json

Hindi:

    curl -X POST "http://localhost:8000/tts" \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "नमस्ते, आप कैसे हैं?",
        "description": "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker'\''s voice sounding clear and very close up."
      }' -o hindi_output.json

Hindi with a specific speaker:

    curl -X POST "http://localhost:8000/tts" \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "नमस्कार, आप कैसी हैं?",
        "description": "Divya'\''s voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
      }' -o hindi_specific_speaker.json

Tamil:

    curl -X POST "http://localhost:8000/tts" \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "ஹலோ, நீங்கள் இன்று எப்படி இருக்கிறீர்கள்?",
        "description": "A female speaker with a soft and gentle tone speaks in a moderate pace. The recording is very clear with no background noise."
      }' -o tamil_output.json

List supported languages:

    curl -X GET "http://localhost:8000/languages"

Indic Parler-TTS offers precise control over various speech characteristics using the description field:
- Background noise:
  - Use "very clear audio" for highest quality
  - Use "very noisy audio" for high background noise levels
- Reverberation: controls the perceived distance of the voice (close to distant sounding)
- Expressivity: from monotone to highly expressive
  - Use terms like "slightly expressive", "animated", or "monotone"
- Pitch: specify as "high-pitched", "low-pitched", or "moderate pitch"
- Speaking rate: from "slow" to "fast-paced"
- Voice quality: from "basic" to "refined"
- Accent: specify accents like "British accent", "American accent", etc.
  - Example: "A male British speaker"
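
As a rough illustration, the controls above can be assembled into a description string by a small helper. The function `build_description`, its parameters, and the sentence template are all hypothetical; they are modeled loosely on the example descriptions in this README, and there is no guarantee the model follows every generated phrase.

```python
def build_description(
    speaker=None,
    gender="female",
    accent=None,
    expressivity="slightly expressive",
    pace="moderate",
    pitch="moderate",
    quality="very clear audio",
):
    """Compose a description string from individual speech controls.

    If `speaker` is given (for example a recommended voice name), it is used
    as the subject; otherwise a generic "A <gender> speaker" subject is used,
    optionally with an accent.
    """
    if speaker:
        subject = speaker
    else:
        subject = f"A {gender} speaker"
        if accent:
            subject += f" with a {accent} accent"
    first = f"{subject} delivers {expressivity} speech at a {pace} pace with a {pitch} pitch."
    second = f"The recording has {quality}."
    return f"{first} {second}"
```

For example, `build_description(gender="male", accent="British", pace="slow")` yields a description that starts with "A male speaker with a British accent", which can then be sent as the description field of a /tts request.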
- The language is automatically detected based on the prompt text
- For better naturalness, use recommended voices for each language
- You can use punctuation to control prosody (e.g., commas for small breaks)
- For Indian English accents, use the English voices which support this natively
- Model: Indic Parler-TTS (fine-tuned from Indic Parler-TTS Pretrained)
- Architecture: Based on Parler-TTS with enhancements for multilingual support
- Training Data: 1,806 hours of multilingual Indic and English dataset
- Languages: 21 officially supported languages
- Voices: 69 unique voices across languages
- Output Sampling Rate: 44.1 kHz (native to the model)
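
To confirm that decoded audio really uses the 44.1 kHz rate, the WAV header can be inspected with Python's standard `wave` module. This is a sketch under one assumption: the server emits PCM-encoded WAV, which is the only encoding `wave` reads. The helper name `wav_info` is hypothetical.

```python
import io
import wave

MODEL_SAMPLING_RATE = 44100  # 44.1 kHz, as listed above


def wav_info(wav_bytes):
    """Return (sampling_rate, duration_in_seconds) read from in-memory WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def matches_model_rate(wav_bytes):
    """Return True when the WAV header reports the model's native sampling rate."""
    return wav_info(wav_bytes)[0] == MODEL_SAMPLING_RATE
```

Here `wav_bytes` is the result of Base64-decoding the audio_base64 field of a /tts response.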