
CelestialCreator/indic-parler-tts


Indic Parler-TTS API

This project provides a FastAPI interface for Indic Parler-TTS, a multilingual text-to-speech model that supports 20 Indic languages and English. The API generates natural-sounding speech from text in any of these languages.

Features

  • Supports 21 languages: Assamese, Bengali, Bodo, Dogri, English, Gujarati, Hindi, Kannada, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu
  • 69 unique voices across languages
  • Control over voice characteristics (pitch, speed, tone, etc.) through a natural-language description
  • Support for emotion-specific prompts in 10 languages
  • 44.1 kHz output sampling rate (native to the model)

Installation

  1. Install the required dependencies:
pip install -r requirements.txt

Running the API

  1. Start the API server:
python run_server.py

The API will be available at http://localhost:8000.

API Usage

Generate Speech

  • Endpoint: POST /tts
  • Description: Generate speech from text using the Indic Parler-TTS model

Request Body:

  • prompt (string, required): The text to convert to speech
  • description (string, optional): A detailed description of how the speech should sound (default: "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch...")
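For Python clients, the request body can be mirrored by a small typed structure. The sketch below is an assumption about client-side handling and is not the server's own request model. `TTSRequest` is a hypothetical name. When no description is given, the `description` key is left out so that the server's default applies.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TTSRequest:
    """Hypothetical client-side mirror of the POST /tts request body."""
    prompt: str                        # required: the text to speak
    description: Optional[str] = None  # optional: how the speech should sound

    def __post_init__(self) -> None:
        # Reject empty prompts before they ever reach the server.
        if not isinstance(self.prompt, str) or not self.prompt.strip():
            raise ValueError("prompt must be a non-empty string")

    def to_json(self) -> str:
        body = {"prompt": self.prompt}
        # Omit 'description' entirely so the server-side default is used.
        if self.description is not None:
            body["description"] = self.description
        # ensure_ascii=False keeps Devanagari, Tamil, etc. readable in the payload.
        return json.dumps(body, ensure_ascii=False)
```

For example, `TTSRequest("नमस्ते, आप कैसे हैं?").to_json()` produces a body containing only the `prompt` key.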

English Example:

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello, how are you doing today?",
    "description": "A female speaker with a British accent delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker'\''s voice sounding clear and very close up."
  }' -o english_output.wav
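The same request can also be sent from Python using only the standard library. This is a minimal sketch that assumes the server runs at the default address. `build_tts_request` and `synthesize` are hypothetical helper names. The 300-second timeout is an arbitrary choice that allows for slow generation on CPU.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # default address from "Running the API"


def build_tts_request(prompt: str, description: Optional[str] = None,
                      base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /tts request equivalent to the curl examples."""
    body = {"prompt": prompt}
    if description is not None:
        body["description"] = description
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(prompt: str, out_path: str, description: Optional[str] = None,
               base_url: str = BASE_URL, timeout: float = 300.0) -> int:
    """Send the request, save the WAV response, and return the bytes written."""
    req = build_tts_request(prompt, description, base_url)
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

For example, `synthesize("नमस्ते, आप कैसे हैं?", "hindi_output.wav")` mirrors the Hindi request while relying on the default description. Because `urlopen` raises on error status codes, a failed request is not silently written to disk as a `.wav` file.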

Hindi Example:

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "नमस्ते, आप कैसे हैं?",
    "description": "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker'\''s voice sounding clear and very close up."
  }' -o hindi_output.wav

Specific Speaker Example:

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "नमस्कार, आप कैसी हैं?",
    "description": "Divya'\''s voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
  }' -o hindi_specific_speaker.wav

Tamil Example:

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "ஹலோ, நீங்கள் இன்று எப்படி இருக்கிறீர்கள்?",
    "description": "A female speaker with a soft and gentle tone speaks in a moderate pace. The recording is very clear with no background noise."
  }' -o tamil_output.wav

Response:

  • Direct WAV audio file (Content-Type: audio/wav)
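To confirm what the endpoint returned, you can inspect the WAV header with Python's standard `wave` module. This sketch assumes the server writes integer PCM samples, because `wave` cannot read floating-point WAV files. `describe_wav` is a hypothetical helper.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic properties of a WAV payload such as the /tts response body."""
    with wave.open(io.BytesIO(data), "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "sample_rate": rate,                   # samples per second
            "channels": w.getnchannels(),          # 1 = mono, 2 = stereo
            "sample_width_bytes": w.getsampwidth(),
            "duration_s": frames / rate,
        }
```

`describe_wav(open("english_output.wav", "rb").read())` should report a `sample_rate` of 44100 if the server keeps the model's native rate.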

Get Supported Languages

  • Endpoint: GET /languages
  • Description: Get information about all supported languages and their recommended voices

Health Check

  • Endpoint: GET /
  • Description: Check if the API is running correctly
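Both GET endpoints can also be called from Python. This README does not document the response bodies of `/` or `/languages`. This sketch therefore decodes `/languages` as generic JSON and treats any HTTP 200 from `/` as healthy. The helper names and polling defaults are assumptions. The `opener` parameter exists only so the functions can be tested without a running server.

```python
import json
import time
import urllib.request
from typing import Any, Callable

BASE_URL = "http://localhost:8000"  # default address from "Running the API"


def fetch_json(path: str, base_url: str = BASE_URL,
               opener: Callable = urllib.request.urlopen) -> Any:
    """GET a JSON endpoint (e.g. /languages) and return the decoded body."""
    with opener(f"{base_url}{path}", timeout=10) as resp:
        return json.load(resp)


def is_healthy(base_url: str = BASE_URL,
               opener: Callable = urllib.request.urlopen) -> bool:
    """Return True if GET / answers with HTTP 200; the body is not inspected."""
    try:
        with opener(f"{base_url}/", timeout=5) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


def wait_until_healthy(base_url: str = BASE_URL, attempts: int = 30,
                       delay_s: float = 2.0,
                       opener: Callable = urllib.request.urlopen) -> bool:
    """Poll GET / until the server responds or the attempts run out."""
    for attempt in range(attempts):
        if is_healthy(base_url, opener):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

If the server loads the model at startup, you can call `wait_until_healthy()` right after `python run_server.py` to hold off `/tts` requests until the server is ready.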

Testing the API

  1. Start the API server (see above)
  2. Use the test script to verify functionality:
python test_api.py

This creates four audio files in the working directory:

  • english_output.wav - English speech sample
  • hindi_output.wav - Hindi speech sample
  • tamil_output.wav - Tamil speech sample
  • hindi_specific_speaker.wav - Hindi with specific speaker
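After the script runs, a quick check that each expected file exists and parses as a WAV file can catch failed requests that produced empty or non-audio output. This is a hypothetical post-run check and is not part of `test_api.py`. It relies on Python's `wave` module, so it assumes integer PCM output: a floating-point WAV would be reported as unreadable.

```python
import os
import wave
from typing import Dict, Iterable

# File names listed above; adjust if the test script writes elsewhere.
EXPECTED_FILES = (
    "english_output.wav",
    "hindi_output.wav",
    "tamil_output.wav",
    "hindi_specific_speaker.wav",
)


def check_outputs(names: Iterable[str] = EXPECTED_FILES,
                  directory: str = ".") -> Dict[str, str]:
    """Map each problematic file to a short reason; an empty dict means all passed."""
    problems: Dict[str, str] = {}
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            problems[name] = "missing"
            continue
        try:
            with wave.open(path, "rb") as w:
                if w.getnframes() == 0:
                    problems[name] = "no audio frames"
        except (wave.Error, EOFError) as exc:
            problems[name] = f"unreadable: {exc}"
    return problems
```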

Language-Specific Voices

Assamese

  • Available Speakers: Amit, Sita, Poonam, Rakesh
  • Recommended: Amit, Sita

Bengali

  • Available Speakers: Arjun, Aditi, Tapan, Rashmi, Arnav, Riya
  • Recommended: Arjun, Aditi

Bodo

  • Available Speakers: Bikram, Maya, Kalpana
  • Recommended: Bikram, Maya

Dogri

  • Available Speakers: Karan
  • Recommended: Karan

English

  • Available Speakers: Thoma, Mary, Swapna, Dinesh, Meera, Jatin, Aakash, Sneha, Kabir, Tisha, Chingkhei, Thoiba, Priya, Tarun, Gauri, Nisha, Raghav, Kavya, Ravi, Vikas, Riya
  • Recommended: Thoma, Mary

Gujarati

  • Available Speakers: Yash, Neha
  • Recommended: Yash, Neha

Hindi

  • Available Speakers: Rohit, Divya, Aman, Rani
  • Recommended: Rohit, Divya

Kannada

  • Available Speakers: Suresh, Anu, Chetan, Vidya
  • Recommended: Suresh, Anu

Malayalam

  • Available Speakers: Anjali, Anju, Harish
  • Recommended: Anjali, Harish

Manipuri

  • Available Speakers: Laishram, Ranjit
  • Recommended: Laishram, Ranjit

Marathi

  • Available Speakers: Sanjay, Sunita, Nikhil, Radha, Varun, Isha
  • Recommended: Sanjay, Sunita

Nepali

  • Available Speakers: Amrita
  • Recommended: Amrita

Odia

  • Available Speakers: Manas, Debjani
  • Recommended: Manas, Debjani

Sanskrit

  • Available Speakers: Aryan
  • Recommended: Aryan

Tamil

  • Available Speakers: Kavitha, Jaya
  • Recommended: Jaya

Telugu

  • Available Speakers: Prakash, Lalitha, Kiran
  • Recommended: Prakash, Lalitha
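Storing the speaker table as data makes it easy to pick a default speaker per language. The dictionary below is transcribed from this README. `VOICES`, `pick_speaker`, and `languages_for` are hypothetical names. Supported languages with no listed speaker, such as Konkani or Urdu, return `None`.

```python
from typing import Dict, List, Optional, Tuple

# Transcribed from the table above: language -> (available, recommended).
VOICES: Dict[str, Tuple[List[str], List[str]]] = {
    "Assamese": (["Amit", "Sita", "Poonam", "Rakesh"], ["Amit", "Sita"]),
    "Bengali": (["Arjun", "Aditi", "Tapan", "Rashmi", "Arnav", "Riya"], ["Arjun", "Aditi"]),
    "Bodo": (["Bikram", "Maya", "Kalpana"], ["Bikram", "Maya"]),
    "Dogri": (["Karan"], ["Karan"]),
    "English": (
        ["Thoma", "Mary", "Swapna", "Dinesh", "Meera", "Jatin", "Aakash", "Sneha",
         "Kabir", "Tisha", "Chingkhei", "Thoiba", "Priya", "Tarun", "Gauri", "Nisha",
         "Raghav", "Kavya", "Ravi", "Vikas", "Riya"],
        ["Thoma", "Mary"],
    ),
    "Gujarati": (["Yash", "Neha"], ["Yash", "Neha"]),
    "Hindi": (["Rohit", "Divya", "Aman", "Rani"], ["Rohit", "Divya"]),
    "Kannada": (["Suresh", "Anu", "Chetan", "Vidya"], ["Suresh", "Anu"]),
    "Malayalam": (["Anjali", "Anju", "Harish"], ["Anjali", "Harish"]),
    "Manipuri": (["Laishram", "Ranjit"], ["Laishram", "Ranjit"]),
    "Marathi": (["Sanjay", "Sunita", "Nikhil", "Radha", "Varun", "Isha"], ["Sanjay", "Sunita"]),
    "Nepali": (["Amrita"], ["Amrita"]),
    "Odia": (["Manas", "Debjani"], ["Manas", "Debjani"]),
    "Sanskrit": (["Aryan"], ["Aryan"]),
    "Tamil": (["Kavitha", "Jaya"], ["Jaya"]),
    "Telugu": (["Prakash", "Lalitha", "Kiran"], ["Prakash", "Lalitha"]),
}


def pick_speaker(language: str) -> Optional[str]:
    """Return the first recommended speaker for a language, or None if none is listed."""
    entry = VOICES.get(language)
    return entry[1][0] if entry else None


def languages_for(speaker: str) -> List[str]:
    """List the languages whose roster includes `speaker`; a name can appear in several."""
    return sorted(lang for lang, (available, _) in VOICES.items() if speaker in available)
```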

Using Specific Speakers

To keep voices consistent across generations, Indic Parler-TTS was trained on a set of predetermined, named speakers for each language. To use a specific speaker, mention that speaker by name in your description.

How to Use Specific Speakers

Simply include the speaker's name in your description field:

  • Example: "Divya's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."

Tips for Using Specific Speakers

  • Use speaker names for consistent voice characteristics
  • You can combine speaker names with other voice features (pitch, speed, tone, etc.)
  • For best results, use the recommended speakers for each language
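You can also build a description programmatically: name a speaker, then add other voice features. This sketch assumes a fixed sentence template of the form "<Name> speaks with ...", modeled on the speaker examples in this README. `speaker_description` is a hypothetical helper, and the default recording sentence is an illustrative choice.

```python
from typing import Sequence

DEFAULT_RECORDING = ("The recording is very clear, with a close-sounding voice "
                     "and no background noise.")


def speaker_description(speaker: str, traits: Sequence[str],
                        recording: str = DEFAULT_RECORDING) -> str:
    """Build '<Name> speaks with <t1>, <t2> and <t3>. <recording>'."""
    if not speaker.strip():
        raise ValueError("speaker name is required")
    if not traits:
        raise ValueError("give at least one trait")
    if len(traits) == 1:
        joined = traits[0]
    else:
        # Comma-separate all but the last trait, then join the last with "and".
        joined = ", ".join(traits[:-1]) + " and " + traits[-1]
    return f"{speaker} speaks with {joined}. {recording}"
```

For instance, `speaker_description("Rohit", ["a moderate pitch", "a slightly fast pace"])` returns "Rohit speaks with a moderate pitch and a slightly fast pace." followed by the default recording sentence.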

Customizing Speech Output

Indic Parler-TTS lets you control the following speech characteristics through the description field:

Background Noise

  • Use "very clear audio" for highest quality
  • Use "very noisy audio" for high background noise levels

Reverberation

  • Controls the perceived distance of the voice (close to distant sounding)

Expressivity

  • From monotone to highly expressive
  • Use terms like "slightly expressive", "animated", or "monotone"

Pitch

  • Specify as "high-pitched", "low-pitched", or "moderate pitch"

Speaking Rate

  • From "slow" to "fast-paced"

Voice Quality

  • From "basic" to "refined" voice quality

Accent Control

  • Specify accents like "British accent", "American accent", etc.
  • Example: "A male British speaker"
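To keep descriptions consistent across requests, you can map the controls above to fixed phrases. The phrase tables and sentence template here are illustrative assumptions based on the wording in this section and the request examples. The model is not known to require this vocabulary. Voice quality is omitted for brevity. Unsupported levels raise `ValueError` instead of silently producing odd text.

```python
from typing import Optional

# Illustrative phrase tables based on the attribute descriptions above.
PITCH = {"low": "a low pitch", "moderate": "a moderate pitch", "high": "a high pitch"}
RATE = {"slow": "a slow pace", "moderate": "a moderate pace", "fast": "a fast pace"}
EXPRESSIVITY = {"monotone": "monotone", "slight": "slightly expressive",
                "high": "highly expressive"}
DISTANCE = {"close": "very close-sounding", "distant": "very distant-sounding"}
NOISE = {"clear": "very clear audio", "noisy": "very noisy audio"}


def _phrase(table: dict, key: str, name: str) -> str:
    """Look up a phrase, failing loudly on unsupported levels."""
    try:
        return table[key]
    except KeyError:
        raise ValueError(f"unsupported {name}: {key!r}; choose from {sorted(table)}") from None


def build_description(speaker: str = "A female speaker", accent: Optional[str] = None,
                      expressivity: str = "slight", pitch: str = "moderate",
                      rate: str = "moderate", distance: str = "close",
                      noise: str = "clear") -> str:
    """Compose a description; `accent` is a full phrase such as "a British accent"."""
    who = f"{speaker} with {accent}" if accent else speaker
    return (
        f"{who} delivers {_phrase(EXPRESSIVITY, expressivity, 'expressivity')} speech "
        f"with {_phrase(PITCH, pitch, 'pitch')} and {_phrase(RATE, rate, 'rate')}. "
        f"The recording is {_phrase(DISTANCE, distance, 'distance')}, "
        f"with {_phrase(NOISE, noise, 'noise')}."
    )
```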

Emotion Support

The following languages officially support emotion-specific prompts:

  • Assamese
  • Bengali
  • Bodo
  • Dogri
  • Kannada
  • Malayalam
  • Marathi
  • Sanskrit
  • Nepali
  • Tamil

Available emotions include: Command, Anger, Narration, Conversation, Disgust, Fear, Happy, Neutral, Proper Noun, News, Sad, and Surprise.
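A client can check an emotion request against these lists before sending it. This README does not say how to word an emotion in the description, so this sketch only validates the label and the language. `emotion_supported` is a hypothetical helper.

```python
# Transcribed from the lists above.
EMOTION_LANGUAGES = frozenset({
    "Assamese", "Bengali", "Bodo", "Dogri", "Kannada",
    "Malayalam", "Marathi", "Sanskrit", "Nepali", "Tamil",
})
EMOTIONS = ("Command", "Anger", "Narration", "Conversation", "Disgust", "Fear",
            "Happy", "Neutral", "Proper Noun", "News", "Sad", "Surprise")
_EMOTION_LOOKUP = {e.casefold(): e for e in EMOTIONS}  # case-insensitive matching


def emotion_supported(language: str, emotion: str) -> bool:
    """True if `language` officially supports emotion-specific prompts.

    Raises ValueError when `emotion` is not one of the listed labels,
    so typos surface before a request is sent.
    """
    if emotion.casefold() not in _EMOTION_LOOKUP:
        raise ValueError(f"unknown emotion {emotion!r}; expected one of {EMOTIONS}")
    return language in EMOTION_LANGUAGES
```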

Speaker Examples

Here are examples of how to use descriptions for specific speakers:

Aditi - Slightly High-Pitched, Expressive Tone:

"Aditi speaks with a slightly higher pitch in a close-sounding environment. Her voice is clear, with subtle emotional depth and a normal pace, all captured in high-quality recording."

Sita - Rapid, Slightly Monotone:

"Sita speaks at a fast pace with a slightly low-pitched voice, captured clearly in a close-sounding environment with excellent recording quality."

Tapan - Male, Moderate Pace, Slightly Monotone:

"Tapan speaks at a moderate pace with a slightly monotone tone. The recording is clear, with a close sound and only minimal ambient noise."

Sunita - High-Pitched, Happy Tone:

"Sunita speaks with a high pitch in a close environment. Her voice is clear, with slight dynamic changes, and the recording is of excellent quality."

Karan - High-Pitched, Positive Tone:

"Karan's high-pitched, engaging voice is captured in a clear, close-sounding recording. His slightly slower delivery conveys a positive tone."

Amrita - High-Pitched, Flat Tone:

"Amrita speaks with a high pitch at a slow pace. Her voice is clear, with excellent recording quality and only moderate background noise."

Young Male Speaker, American Accent:

"A young male speaker with a high-pitched American accent delivers speech at a slightly fast pace in a clear, close-sounding recording."

Bikram - High-Pitched, Urgent Tone:

"Bikram speaks with a higher pitch and fast pace, conveying urgency. The recording is clear and intimate, with great emotional depth."

Anjali - High-Pitched, Neutral Tone:

"Anjali speaks with a high pitch at a normal pace in a clear, close-sounding environment. Her neutral tone is captured with excellent audio quality."

Model Information

  • Model: Indic Parler-TTS (fine-tuned from Indic Parler-TTS Pretrained)
  • Architecture: A multilingual Indic extension of Parler-TTS Mini
  • Training Data: A 1,806-hour multilingual Indic and English dataset
  • Languages: 21 officially supported languages (20 Indic languages and English)
  • Voices: 69 unique voices across languages
  • Output Sampling Rate: 44.1 kHz (native to the model)
