This project provides a FastAPI interface for Indic Parler-TTS, a multilingual text-to-speech model that supports 21 languages (20 Indic languages plus English). The API generates natural-sounding speech from text in any of these languages.
- Supports 21 languages: Assamese, Bengali, Bodo, Dogri, English, Gujarati, Hindi, Kannada, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu
- 69 unique voices across languages
- Full control over voice characteristics (pitch, speed, tone, etc.)
- Support for emotion-specific prompts in 10 languages
- 44.1 kHz output sampling rate (native to the model)
- Install the required dependencies:
pip install -r requirements.txt
- Start the API server:
python run_server.py
The API will be available at http://localhost:8000.
- Endpoint: POST /tts
- Description: Generate speech from text using the Indic Parler-TTS model
Request Body:
- prompt (string, required): The text to convert to speech
- description (string, optional): A detailed description of how the speech should sound (default: "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch...")
Example Request:
curl -X POST "http://localhost:8000/tts" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Hello, how are you doing today?",
"description": "A female speaker with a British accent delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker'\''s voice sounding clear and very close up."
}' -o english_output.wav
Hindi Example:
curl -X POST "http://localhost:8000/tts" \
-H "Content-Type: application/json" \
-d '{
"prompt": "नमस्ते, आप कैसे हैं?",
"description": "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker'\''s voice sounding clear and very close up."
}' -o hindi_output.wav
Specific Speaker Example:
curl -X POST "http://localhost:8000/tts" \
-H "Content-Type: application/json" \
-d '{
"prompt": "नमस्कार, आप कैसी हैं?",
"description": "Divya'\''s voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
}' -o hindi_specific_speaker.wav
Tamil Example:
curl -X POST "http://localhost:8000/tts" \
-H "Content-Type: application/json" \
-d '{
"prompt": "ஹலோ, நீங்கள் இன்று எப்படி இருக்கிறீர்கள்?",
"description": "A female speaker with a soft and gentle tone speaks in a moderate pace. The recording is very clear with no background noise."
}' -o tamil_output.wav
Response:
- Direct WAV audio file (Content-Type: audio/wav)
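The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server is running at the default address above, and the helper names `build_tts_request` and `synthesize` are illustrative rather than part of this project.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/tts"  # default address from this README


def build_tts_request(prompt, description=None, url=API_URL):
    """Build a POST request for /tts.

    `description` is left out of the body when None, so the server's
    default description applies.
    """
    payload = {"prompt": prompt}
    if description is not None:
        payload["description"] = description
    # ensure_ascii=False keeps Indic scripts readable in the request body
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(prompt, description=None, out_path="output.wav", timeout=300):
    """Send the request and write the returned WAV bytes to `out_path`."""
    request = build_tts_request(prompt, description)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the server running, `synthesize("नमस्ते, आप कैसे हैं?", out_path="hindi_output.wav")` should write the audio to disk. The 300-second timeout is an arbitrary choice to allow for slow generation; adjust it to your hardware.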
- Endpoint: GET /languages
- Description: Get information about all supported languages and their recommended voices
- Endpoint: GET /
- Description: Check if the API is running correctly
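Both GET endpoints can be queried with a small helper. This sketch assumes the responses are JSON (the FastAPI default); the response schema is not documented here, so the result is returned as parsed data without interpreting any fields. The names `endpoint_url` and `get_json` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from this README


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path, base_url=BASE_URL, timeout=10):
    """GET an endpoint and parse the body as JSON (assumed format)."""
    with urllib.request.urlopen(endpoint_url(path, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `get_json("/")` performs the health check and `get_json("/languages")` fetches the language and voice information.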
- Start the API server (see above)
- Use the test script to verify functionality:
python test_api.py
This will create several audio files in the working directory:
- english_output.wav - English speech sample
- hindi_output.wav - Hindi speech sample
- tamil_output.wav - Tamil speech sample
- hindi_specific_speaker.wav - Hindi with specific speaker
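To confirm that the generated files are valid audio at the expected sampling rate, their headers can be inspected with Python's standard `wave` module. This is a sketch that assumes the server writes PCM-encoded WAV files; `wave` cannot read floating-point WAV, so a different reader would be needed in that case. The function name `describe_wav` is illustrative.

```python
import wave


def describe_wav(path):
    """Return basic header information for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration follows from frame count divided by frames per second
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav("english_output.wav")` after running the test script should report a `sample_rate_hz` of 44100 if the output matches the model's native rate.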
Available speakers by language (recommended speakers in parentheses):
- Assamese: Amit, Sita, Poonam, Rakesh (recommended: Amit, Sita)
- Bengali: Arjun, Aditi, Tapan, Rashmi, Arnav, Riya (recommended: Arjun, Aditi)
- Bodo: Bikram, Maya, Kalpana (recommended: Bikram, Maya)
- Dogri: Karan (recommended: Karan)
- English: Thoma, Mary, Swapna, Dinesh, Meera, Jatin, Aakash, Sneha, Kabir, Tisha, Chingkhei, Thoiba, Priya, Tarun, Gauri, Nisha, Raghav, Kavya, Ravi, Vikas, Riya (recommended: Thoma, Mary)
- Gujarati: Yash, Neha (recommended: Yash, Neha)
- Hindi: Rohit, Divya, Aman, Rani (recommended: Rohit, Divya)
- Kannada: Suresh, Anu, Chetan, Vidya (recommended: Suresh, Anu)
- Malayalam: Anjali, Anju, Harish (recommended: Anjali, Harish)
- Manipuri: Laishram, Ranjit (recommended: Laishram, Ranjit)
- Marathi: Sanjay, Sunita, Nikhil, Radha, Varun, Isha (recommended: Sanjay, Sunita)
- Nepali: Amrita (recommended: Amrita)
- Odia: Manas, Debjani (recommended: Manas, Debjani)
- Sanskrit: Aryan (recommended: Aryan)
- Tamil: Kavitha, Jaya (recommended: Jaya)
- Telugu: Prakash, Lalitha, Kiran (recommended: Prakash, Lalitha)
To ensure speaker consistency across generations, Indic Parler-TTS has been trained on predetermined speakers for each language. To use a specific speaker, adapt your description to reference the speaker by name.
Simply include the speaker's name in your description field:
- Example:
"Divya's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
- Use speaker names for consistent voice characteristics
- You can combine speaker names with other voice features (pitch, speed, tone, etc.)
- For best results, use the recommended speakers for each language
Indic Parler-TTS offers precise control over various speech characteristics using the description field:
- Background noise: use "very clear audio" for the highest quality, or "very noisy audio" for high background noise levels
- Reverberation: controls the perceived distance of the voice (close to distant sounding)
- Expressivity: ranges from monotone to highly expressive; use terms like "slightly expressive", "animated", or "monotone"
- Pitch: specify as "high-pitched", "low-pitched", or "moderate pitch"
- Speaking rate: ranges from "slow" to "fast-paced"
- Voice quality: ranges from "basic" to "refined"
- Accent: specify accents like "British accent" or "American accent", for example "A male British speaker"
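Since the description is free text, it can help to assemble it from named settings so that requests stay consistent. The sketch below is one hypothetical way to do that; `build_description` and its parameters are not part of this project, and the exact wording the model responds to best is an assumption based on the example descriptions in this README.

```python
# Pitch phrasing taken from the terms listed in this README
PITCH_CLAUSES = {
    "high": "a high-pitched voice",
    "low": "a low-pitched voice",
    "moderate": "a moderate pitch",
}


def build_description(
    speaker=None,
    gender="female",
    accent=None,
    expressivity="slightly expressive",
    pace="moderate",
    pitch="moderate",
    audio="very clear audio",
):
    """Compose a description string from individual voice settings.

    If `speaker` is given, the named speaker is the subject; otherwise a
    generic speaker (optionally with an accent) is described.
    """
    if pitch not in PITCH_CLAUSES:
        raise ValueError(f"pitch must be one of {sorted(PITCH_CLAUSES)}")
    if speaker:
        subject = speaker
    elif accent:
        subject = f"A {gender} speaker with a {accent}"
    else:
        subject = f"A {gender} speaker"
    return (
        f"{subject} delivers {expressivity} speech at a {pace} pace "
        f"with {PITCH_CLAUSES[pitch]}. The recording has {audio}."
    )
```

For example, `build_description(speaker="Divya")` yields "Divya delivers slightly expressive speech at a moderate pace with a moderate pitch. The recording has very clear audio.", which can be passed as the `description` field of a `/tts` request.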
The following languages officially support emotion-specific prompts:
- Assamese
- Bengali
- Bodo
- Dogri
- Kannada
- Malayalam
- Marathi
- Sanskrit
- Nepali
- Tamil
Available emotions include: Command, Anger, Narration, Conversation, Disgust, Fear, Happy, Neutral, Proper Noun, News, Sad, and Surprise.
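Before requesting an emotion, a client may want to check that the target language is on the supported list. The following is a small sketch of such a check built only from the two lists above; the function names are hypothetical, and how an emotion should be worded inside the description is not specified here, so the code does not attempt it.

```python
# Languages with official emotion-prompt support, as listed above
EMOTION_LANGUAGES = frozenset({
    "assamese", "bengali", "bodo", "dogri", "kannada",
    "malayalam", "marathi", "sanskrit", "nepali", "tamil",
})

# Emotion names as listed above
EMOTIONS = (
    "Command", "Anger", "Narration", "Conversation", "Disgust", "Fear",
    "Happy", "Neutral", "Proper Noun", "News", "Sad", "Surprise",
)


def supports_emotion_prompts(language):
    """Return True if the language officially supports emotion prompts."""
    return language.strip().lower() in EMOTION_LANGUAGES


def canonical_emotion(name):
    """Return the listed spelling of an emotion, ignoring case and padding."""
    wanted = name.strip().lower()
    for emotion in EMOTIONS:
        if emotion.lower() == wanted:
            return emotion
    raise ValueError(f"unknown emotion: {name!r}")
```

Languages outside this list may still respond to emotional wording in the description, but that behavior is not officially supported.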
Here are examples of how to use descriptions for specific speakers:
"Aditi speaks with a slightly higher pitch in a close-sounding environment. Her voice is clear, with subtle emotional depth and a normal pace, all captured in high-quality recording."
"Sita speaks at a fast pace with a slightly low-pitched voice, captured clearly in a close-sounding environment with excellent recording quality."
"Tapan speaks at a moderate pace with a slightly monotone tone. The recording is clear, with a close sound and only minimal ambient noise."
"Sunita speaks with a high pitch in a close environment. Her voice is clear, with slight dynamic changes, and the recording is of excellent quality."
"Karan's high-pitched, engaging voice is captured in a clear, close-sounding recording. His slightly slower delivery conveys a positive tone."
"Amrita speaks with a high pitch at a slow pace. Her voice is clear, with excellent recording quality and only moderate background noise."
"A young male speaker with a high-pitched American accent delivers speech at a slightly fast pace in a clear, close-sounding recording."
"Bikram speaks with a higher pitch and fast pace, conveying urgency. The recording is clear and intimate, with great emotional depth."
"Anjali speaks with a high pitch at a normal pace in a clear, close-sounding environment. Her neutral tone is captured with excellent audio quality."
- Model: Indic Parler-TTS (fine-tuned from Indic Parler-TTS Pretrained)
- Architecture: Based on Parler-TTS with enhancements for multilingual support
- Training Data: 1,806 hours of multilingual Indic and English data
- Languages: 21 officially supported languages
- Voices: 69 unique voices across languages
- Output Sampling Rate: 44.1 kHz (native to the model)