Indic Parler-TTS API

This project provides a FastAPI interface for Indic Parler-TTS, a multilingual text-to-speech model that supports 21 languages: 20 Indian languages plus English. The API generates natural-sounding speech from text in any of these languages.

Features

Supports 21 languages: Assamese, Bengali, Bodo, Dogri, English, Gujarati, Hindi, Kannada, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu
69 unique voices across languages
Full control over voice characteristics (pitch, speed, tone, etc.)
Support for emotion-specific prompts in 10 languages
44.1 kHz output sampling rate (native to the model)

Installation

Install the required dependencies:

pip install -r requirements.txt

Running the API

Start the API server:

python run_server.py

The API will be available at http://localhost:8000.

API Usage

Generate Speech

Endpoint: POST /tts
Description: Generate speech from text using the Indic Parler-TTS model

Request Body:

prompt (string, required): The text to convert to speech
description (string, optional): A detailed description of how the speech should sound (default: "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch...")

Example Request:

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello, how are you doing today?",
    "description": "A female speaker with a British accent delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker'\''s voice sounding clear and very close up."
  }' -o english_output.wav

Hindi Example:

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "नमस्ते, आप कैसे हैं?",
    "description": "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker'\''s voice sounding clear and very close up."
  }' -o hindi_output.wav

Specific Speaker Example:

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "नमस्कार, आप कैसी हैं?",
    "description": "Divya'\''s voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
  }' -o hindi_specific_speaker.wav

Tamil Example:

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "ஹலோ, நீங்கள் இன்று எப்படி இருக்கிறீர்கள்?",
    "description": "A female speaker with a soft and gentle tone speaks at a moderate pace. The recording is very clear with no background noise."
  }' -o tamil_output.wav

Response:

Direct WAV audio file (Content-Type: audio/wav)
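
Python Example:

The curl requests above can also be sketched in Python using only the standard library. The helper names (build_tts_request, synthesize) and the 300-second timeout are assumptions made for illustration; they are not part of this project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from "Running the API"


def build_tts_request(prompt, description=None, base_url=BASE_URL):
    """Build a POST /tts request; the optional description is left out when None."""
    payload = {"prompt": prompt}
    if description is not None:
        payload["description"] = description
    # ensure_ascii=False keeps Indic scripts readable in the UTF-8 request body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(prompt, out_path, description=None, base_url=BASE_URL, timeout=300):
    """Send the request and write the returned WAV bytes to out_path.

    The timeout is an assumed value; generation time depends on the hardware.
    """
    request = build_tts_request(prompt, description, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With the server running, synthesize("नमस्ते, आप कैसे हैं?", "hindi_output.wav") should write the same kind of file as the Hindi curl example.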

Get Supported Languages

Endpoint: GET /languages
Description: Get information about all supported languages and their recommended voices

Health Check

Endpoint: GET /
Description: Check if the API is running correctly
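
Python Example:

A minimal sketch for calling the two GET endpoints from Python. It assumes both return a JSON body, which is the usual FastAPI behavior but is not stated in this README; the helper names and the injectable opener parameter are illustrative only.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from "Running the API"


def get_json(path, base_url=BASE_URL, opener=urllib.request.urlopen, timeout=10):
    """GET base_url + path and decode the body as JSON (assumed response format)."""
    with opener(f"{base_url}{path}", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_healthy(base_url=BASE_URL, opener=urllib.request.urlopen):
    """Return True if GET / answers with a JSON body, False on any error."""
    try:
        get_json("/", base_url, opener)
    except (OSError, ValueError):
        # OSError covers connection and HTTP errors; ValueError covers bad JSON.
        return False
    return True
```

Calling get_json("/languages") should then return whatever structure the server uses to describe languages and voices.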

Testing the API

Start the API server (see above)
Use the test script to verify functionality:

python test_api.py

This will create several audio files in the working directory:

english_output.wav - English speech sample
hindi_output.wav - Hindi speech sample
tamil_output.wav - Tamil speech sample
hindi_specific_speaker.wav - Hindi with specific speaker
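
To confirm that the generated files are valid audio, their headers can be inspected with Python's standard wave module. This is a sketch that assumes the server writes PCM-encoded WAV files (the wave module cannot read other encodings); describe_wav is a hypothetical helper, not part of test_api.py.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate (Hz), and duration (s) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

For example, describe_wav("english_output.wav")["sample_rate"] can be compared against the 44.1 kHz rate listed under Features.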

Language-Specific Voices

Assamese

Available Speakers: Amit, Sita, Poonam, Rakesh
Recommended: Amit, Sita

Bengali

Available Speakers: Arjun, Aditi, Tapan, Rashmi, Arnav, Riya
Recommended: Arjun, Aditi

Bodo

Available Speakers: Bikram, Maya, Kalpana
Recommended: Bikram, Maya

Dogri

Available Speakers: Karan
Recommended: Karan

English

Available Speakers: Thoma, Mary, Swapna, Dinesh, Meera, Jatin, Aakash, Sneha, Kabir, Tisha, Chingkhei, Thoiba, Priya, Tarun, Gauri, Nisha, Raghav, Kavya, Ravi, Vikas, Riya
Recommended: Thoma, Mary

Gujarati

Available Speakers: Yash, Neha
Recommended: Yash, Neha

Hindi

Available Speakers: Rohit, Divya, Aman, Rani
Recommended: Rohit, Divya

Kannada

Available Speakers: Suresh, Anu, Chetan, Vidya
Recommended: Suresh, Anu

Malayalam

Available Speakers: Anjali, Anju, Harish
Recommended: Anjali, Harish

Manipuri

Available Speakers: Laishram, Ranjit
Recommended: Laishram, Ranjit

Marathi

Available Speakers: Sanjay, Sunita, Nikhil, Radha, Varun, Isha
Recommended: Sanjay, Sunita

Nepali

Available Speakers: Amrita
Recommended: Amrita

Odia

Available Speakers: Manas, Debjani
Recommended: Manas, Debjani

Sanskrit

Available Speakers: Aryan
Recommended: Aryan

Tamil

Available Speakers: Kavitha, Jaya
Recommended: Jaya

Telugu

Available Speakers: Prakash, Lalitha, Kiran
Recommended: Prakash, Lalitha

Using Specific Speakers

To ensure speaker consistency across generations, Indic Parler-TTS has been trained on predetermined speakers for each language. To use a specific speaker, adapt your description to reference the speaker by name.

How to Use Specific Speakers

Simply include the speaker's name in your description field:

Example: "Divya's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."

Tips for Using Specific Speakers

Use speaker names for consistent voice characteristics
You can combine speaker names with other voice features (pitch, speed, tone, etc.)
For best results, use the recommended speakers for each language
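
The tips above can be sketched as a small lookup that picks a recommended speaker for a language and places the name at the start of the description. The table is copied from the Language-Specific Voices lists; the function name and the sentence template are assumptions for illustration.

```python
# Recommended speakers, copied from the "Language-Specific Voices" section.
RECOMMENDED_SPEAKERS = {
    "Assamese": ["Amit", "Sita"],
    "Bengali": ["Arjun", "Aditi"],
    "Bodo": ["Bikram", "Maya"],
    "Dogri": ["Karan"],
    "English": ["Thoma", "Mary"],
    "Gujarati": ["Yash", "Neha"],
    "Hindi": ["Rohit", "Divya"],
    "Kannada": ["Suresh", "Anu"],
    "Malayalam": ["Anjali", "Harish"],
    "Manipuri": ["Laishram", "Ranjit"],
    "Marathi": ["Sanjay", "Sunita"],
    "Nepali": ["Amrita"],
    "Odia": ["Manas", "Debjani"],
    "Sanskrit": ["Aryan"],
    "Tamil": ["Jaya"],
    "Telugu": ["Prakash", "Lalitha"],
}


def describe_speaker(language, traits, index=0):
    """Build a description that names a recommended speaker for the language.

    traits is free text such as "monotone yet slightly fast in delivery".
    Raises ValueError when no recommended speaker is listed for the language.
    """
    names = RECOMMENDED_SPEAKERS.get(language)
    if not names:
        raise ValueError(f"no recommended speaker listed for {language!r}")
    return f"{names[index]}'s voice is {traits}."
```

The returned string is meant to be sent as the description field of a POST /tts request.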

Customizing Speech Output

Indic Parler-TTS offers precise control over various speech characteristics using the description field:

Background Noise

Use "very clear audio" for highest quality
Use "very noisy audio" for high background noise levels

Reverberation

Controls the perceived distance of the voice (close to distant sounding)

Expressivity

From monotone to highly expressive
Use terms like "slightly expressive", "animated", or "monotone"

Pitch

Specify as "high-pitched", "low-pitched", or "moderate pitch"

Speaking Rate

From "slow" to "fast-paced"

Voice Quality

From "basic" to "refined" voice quality

Accent Control

Specify accents like "British accent", "American accent", etc.
Example: "A male British speaker"
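
Putting It Together

As a rough sketch, the controls above can be combined into one description string programmatically. The quoted terms come from this section; the short option keys, the function name, and the sentence template are assumptions, and this README does not state that this particular wording gives the best results.

```python
# Phrases taken from the sections above, keyed by short hypothetical option names.
PITCH = {
    "low": "a low-pitched voice",
    "moderate": "a moderate pitch",
    "high": "a high-pitched voice",
}
RATE = {"slow": "slow", "moderate": "moderate-speed", "fast": "fast-paced"}
NOISE = {"clean": "very clear audio", "noisy": "very noisy audio"}
EXPRESSIVITY = ("monotone", "slightly expressive", "animated")


def build_description(
    speaker="A female speaker",
    pitch="moderate",
    rate="moderate",
    expressivity="slightly expressive",
    noise="clean",
):
    """Compose a description from the documented controls.

    Raises KeyError for an unknown pitch, rate, or noise key and
    ValueError for an unknown expressivity term.
    """
    if expressivity not in EXPRESSIVITY:
        raise ValueError(f"unknown expressivity: {expressivity!r}")
    return (
        f"{speaker} with {PITCH[pitch]} gives a {RATE[rate]}, {expressivity} delivery. "
        f"The recording has {NOISE[noise]}."
    )
```

A speaker name such as "Divya" or an accent phrase such as "A male British speaker" can be passed as the speaker argument.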

Emotion Support

The following languages officially support emotion-specific prompts:

Assamese
Bengali
Bodo
Dogri
Kannada
Malayalam
Marathi
Sanskrit
Nepali
Tamil

Available emotions include: Command, Anger, Narration, Conversation, Disgust, Fear, Happy, Neutral, Proper Noun, News, Sad, and Surprise.
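
A client may want to check a language and emotion pair against the two lists above before sending a request. The sketch below only validates the pair; this README does not specify how an emotion should be phrased in the description, so that part is left out. The function name is hypothetical.

```python
# Languages and emotions copied from the "Emotion Support" lists above.
EMOTION_LANGUAGES = frozenset({
    "Assamese", "Bengali", "Bodo", "Dogri", "Kannada",
    "Malayalam", "Marathi", "Sanskrit", "Nepali", "Tamil",
})
EMOTIONS = frozenset({
    "Command", "Anger", "Narration", "Conversation", "Disgust", "Fear",
    "Happy", "Neutral", "Proper Noun", "News", "Sad", "Surprise",
})


def supports_emotion(language, emotion):
    """Return True only if both the language and the emotion are listed above."""
    return language in EMOTION_LANGUAGES and emotion in EMOTIONS
```

Pairs that fail the check may still produce audio; the lists only describe what is officially supported.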

Speaker Examples

Here are examples of how to use descriptions for specific speakers:

Aditi - Slightly High-Pitched, Expressive Tone:

"Aditi speaks with a slightly higher pitch in a close-sounding environment. Her voice is clear, with subtle emotional depth and a normal pace, all captured in high-quality recording."

Sita - Rapid, Slightly Monotone:

"Sita speaks at a fast pace with a slightly low-pitched voice, captured clearly in a close-sounding environment with excellent recording quality."

Tapan - Male, Moderate Pace, Slightly Monotone:

"Tapan speaks at a moderate pace with a slightly monotone tone. The recording is clear, with a close sound and only minimal ambient noise."

Sunita - High-Pitched, Happy Tone:

"Sunita speaks with a high pitch in a close environment. Her voice is clear, with slight dynamic changes, and the recording is of excellent quality."

Karan - High-Pitched, Positive Tone:

"Karan's high-pitched, engaging voice is captured in a clear, close-sounding recording. His slightly slower delivery conveys a positive tone."

Amrita - High-Pitched, Flat Tone:

"Amrita speaks with a high pitch at a slow pace. Her voice is clear, with excellent recording quality and only moderate background noise."

Young Male Speaker, American Accent:

"A young male speaker with a high-pitched American accent delivers speech at a slightly fast pace in a clear, close-sounding recording."

Bikram - High-Pitched, Urgent Tone:

"Bikram speaks with a higher pitch and fast pace, conveying urgency. The recording is clear and intimate, with great emotional depth."

Anjali - High-Pitched, Neutral Tone:

"Anjali speaks with a high pitch at a normal pace in a clear, close-sounding environment. Her neutral tone is captured with excellent audio quality."

Model Information

Model: Indic Parler-TTS (fine-tuned from Indic Parler-TTS Pretrained)
Architecture: Based on Parler-TTS with enhancements for multilingual support
Training Data: 1,806 hours of multilingual Indic and English speech data
Languages: 21 officially supported languages
Voices: 69 unique voices across languages
Output Sampling Rate: 44.1 kHz (native to the model)
