Indic Parler-TTS API Documentation

Overview

Indic Parler-TTS is a multilingual Indic extension of Parler-TTS Mini that supports 21 languages: English and 20 Indic languages. This API provides a simple interface for generating natural-sounding speech in any of them.

Installation

To run the API, you'll need to install the following dependencies:

pip install fastapi uvicorn transformers parler-tts soundfile numpy pydantic torch

Running the API

python run_server.py

This will start the API server on port 8000.

API Endpoints

1. Generate Speech

  • Endpoint: POST /tts

  • Description: Generate speech from text using the Indic Parler-TTS model

  • Request Body:

    • prompt (string, required): The text to convert to speech
    • description (string, optional): A detailed description of how the speech should sound (default: "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch...")
    • language (string, optional): Language code (default: "auto", will auto-detect based on prompt)
  • Response:

    • audio_base64 (string): Base64 encoded audio data in WAV format
    • sampling_rate (int): The sampling rate of the generated audio (44100 Hz - native to the model)
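
A minimal client-side sketch of this request/response contract, using only the Python standard library. The helper names (`build_tts_payload`, `decode_tts_response`, `synthesize`) and the `http://localhost:8000` base URL are illustrative assumptions; only the field names are taken from the endpoint description above.

```python
import base64
import json
import urllib.request


def build_tts_payload(prompt, description=None, language="auto"):
    """Build the JSON body for POST /tts; description is omitted when unset."""
    payload = {"prompt": prompt, "language": language}
    if description is not None:
        payload["description"] = description
    return payload


def decode_tts_response(body):
    """Return (wav_bytes, sampling_rate) from a parsed /tts JSON response."""
    wav_bytes = base64.b64decode(body["audio_base64"])
    return wav_bytes, int(body["sampling_rate"])


def synthesize(prompt, out_path, base_url="http://localhost:8000", **fields):
    """POST the prompt to /tts and write the decoded WAV data to out_path."""
    data = json.dumps(build_tts_payload(prompt, **fields)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/tts",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        wav_bytes, sampling_rate = decode_tts_response(json.load(response))
    with open(out_path, "wb") as f:
        f.write(wav_bytes)
    return sampling_rate
```

With the server running, `synthesize("Hello, how are you doing today?", "english_output.wav")` should leave a playable WAV file on disk, since the base64 payload is decoded before it is written.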

2. Get Supported Languages

  • Endpoint: GET /languages
  • Description: Get information about all supported languages and their recommended voices
  • Response: JSON object containing languages with their available and recommended speakers

3. Health Check

  • Endpoint: GET /
  • Description: Check if the API is running correctly
  • Response: Simple status message
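
A small sketch of polling the health endpoint before sending synthesis requests. The function name `is_healthy` and the default base URL are assumptions. The body of the status message is not inspected, because its exact shape is not specified here.

```python
import urllib.request


def is_healthy(base_url="http://localhost:8000", timeout=5.0):
    """Return True if GET / answers with HTTP 200, False on any network or HTTP error."""
    try:
        with urllib.request.urlopen(f"{base_url}/", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts all derive from OSError.
        return False
```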

Supported Languages and Voices

Indic Parler-TTS officially supports 21 languages, with 69 unique voices in total:

Officially Supported Languages

  1. Assamese
  2. Bengali
  3. Bodo
  4. Dogri
  5. English
  6. Gujarati
  7. Hindi
  8. Kannada
  9. Konkani
  10. Maithili
  11. Malayalam
  12. Manipuri
  13. Marathi
  14. Nepali
  15. Odia
  16. Sanskrit
  17. Santali
  18. Sindhi
  19. Tamil
  20. Telugu
  21. Urdu

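Language-Specific Voices

Speakers are listed below for 16 of the 21 languages. This document lists none for Konkani, Maithili, Santali, Sindhi, or Urdu.
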
Assamese

  • Available Speakers: Amit, Sita, Poonam, Rakesh
  • Recommended: Amit, Sita

Bengali

  • Available Speakers: Arjun, Aditi, Tapan, Rashmi, Arnav, Riya
  • Recommended: Arjun, Aditi

Bodo

  • Available Speakers: Bikram, Maya, Kalpana
  • Recommended: Bikram, Maya

Dogri

  • Available Speakers: Karan
  • Recommended: Karan

English

  • Available Speakers: Thoma, Mary, Swapna, Dinesh, Meera, Jatin, Aakash, Sneha, Kabir, Tisha, Chingkhei, Thoiba, Priya, Tarun, Gauri, Nisha, Raghav, Kavya, Ravi, Vikas, Riya
  • Recommended: Thoma, Mary

Gujarati

  • Available Speakers: Yash, Neha
  • Recommended: Yash, Neha

Hindi

  • Available Speakers: Rohit, Divya, Aman, Rani
  • Recommended: Rohit, Divya

Kannada

  • Available Speakers: Suresh, Anu, Chetan, Vidya
  • Recommended: Suresh, Anu

Malayalam

  • Available Speakers: Anjali, Anju, Harish
  • Recommended: Anjali, Harish

Manipuri

  • Available Speakers: Laishram, Ranjit
  • Recommended: Laishram, Ranjit

Marathi

  • Available Speakers: Sanjay, Sunita, Nikhil, Radha, Varun, Isha
  • Recommended: Sanjay, Sunita

Nepali

  • Available Speakers: Amrita
  • Recommended: Amrita

Odia

  • Available Speakers: Manas, Debjani
  • Recommended: Manas, Debjani

Sanskrit

  • Available Speakers: Aryan
  • Recommended: Aryan

Tamil

  • Available Speakers: Kavitha, Jaya
  • Recommended: Jaya

Telugu

  • Available Speakers: Prakash, Lalitha, Kiran
  • Recommended: Prakash, Lalitha
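
One way to use these lists from client code is to name a recommended speaker at the start of the description, as the speaker-specific Hindi usage example in this document does. The sketch below copies a subset of the recommended speakers. The dictionary, the function name, and the sentence template are illustrative assumptions, not part of the API.

```python
# Recommended speakers copied from the lists above (subset shown for brevity).
RECOMMENDED_SPEAKERS = {
    "Bengali": ["Arjun", "Aditi"],
    "English": ["Thoma", "Mary"],
    "Hindi": ["Rohit", "Divya"],
    "Tamil": ["Jaya"],
    "Telugu": ["Prakash", "Lalitha"],
}

# Style wording borrowed from the speaker-specific Hindi usage example.
DEFAULT_STYLE = (
    "monotone yet slightly fast in delivery, with a very close recording "
    "that almost has no background noise"
)


def speaker_description(language, style=DEFAULT_STYLE):
    """Build a description that names the first recommended speaker for a language."""
    speakers = RECOMMENDED_SPEAKERS.get(language)
    if not speakers:
        raise KeyError(f"No recommended speaker listed for {language!r}")
    return f"{speakers[0]}'s voice is {style}."
```

The returned string is meant to be sent as the `description` field of a `/tts` request.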

Emotion Support

The following languages officially support emotion-specific prompts:

  • Assamese
  • Bengali
  • Bodo
  • Dogri
  • Kannada
  • Malayalam
  • Marathi
  • Sanskrit
  • Nepali
  • Tamil

Available emotions include: Command, Anger, Narration, Conversation, Disgust, Fear, Happy, Neutral, Proper Noun, News, Sad, and Surprise.
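
A client can check this list before requesting an emotion, so that unsupported combinations are caught early. The sketch below only encodes the two lists above. The function name is hypothetical, and how an emotion is worded inside the description is left to the caller.

```python
EMOTION_LANGUAGES = {
    "Assamese", "Bengali", "Bodo", "Dogri", "Kannada",
    "Malayalam", "Marathi", "Sanskrit", "Nepali", "Tamil",
}

EMOTIONS = {
    "Command", "Anger", "Narration", "Conversation", "Disgust", "Fear",
    "Happy", "Neutral", "Proper Noun", "News", "Sad", "Surprise",
}


def supports_emotion(language, emotion):
    """Return True if the language officially supports the given emotion (case-insensitive)."""
    return (
        language.strip().title() in EMOTION_LANGUAGES
        and emotion.strip().title() in EMOTIONS
    )
```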

Usage Examples

1. Generate English Speech (cURL)

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello, how are you doing today?",
    "description": "A female speaker with a British accent delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker'\''s voice sounding clear and very close up."
  }' -o english_output.json

2. Generate Hindi Speech (cURL)

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "नमस्ते, आप कैसे हैं?",
    "description": "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker'\''s voice sounding clear and very close up."
  }' -o hindi_output.json

3. Generate Hindi Speech with a Specific Speaker (cURL)

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "नमस्कार, आप कैसी हैं?",
    "description": "Divya'\''s voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
  }' -o hindi_specific_speaker.json

4. Generate Tamil Speech (cURL)

curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "ஹலோ, நீங்கள் இன்று எப்படி இருக்கிறீர்கள்?",
    "description": "A female speaker with a soft and gentle tone speaks in a moderate pace. The recording is very clear with no background noise."
  }' -o tamil_output.json

5. Get Supported Languages (cURL)

curl -X GET "http://localhost:8000/languages"

Customizing Speech Output

Indic Parler-TTS offers precise control over various speech characteristics using the description field:

Background Noise

  • Use "very clear audio" for highest quality
  • Use "very noisy audio" for high background noise levels

Reverberation

  • Controls the perceived distance of the voice, from close-sounding to distant-sounding

Expressivity

  • From monotone to highly expressive
  • Use terms like "slightly expressive", "animated", or "monotone"

Pitch

  • Specify as "high-pitched", "low-pitched", or "moderate pitch"

Speaking Rate

  • From "slow" to "fast-paced"

Voice Quality

  • From "basic" to "refined" voice quality

Accent Control

  • Specify accents like "British accent", "American accent", etc.
  • Example: "A male British speaker"
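
The controls above can be combined into a single description string. The following sketch assembles one from the documented terms. The function name, parameter names, and sentence template are assumptions made for illustration, since the description is free text and no fixed grammar is specified.

```python
def _with_article(phrase):
    """Prefix a phrase with 'a' or 'an' depending on its first letter."""
    article = "an" if phrase[0].lower() in "aeiou" else "a"
    return f"{article} {phrase}"


def build_description(
    speaker="A female speaker",
    accent=None,
    expressivity="slightly expressive",
    pitch=None,
    rate="moderate",
    noise="very clear audio",
):
    """Compose a description from accent, expressivity, pitch, rate, and noise terms."""
    who = f"{speaker} with {_with_article(accent)}" if accent else speaker
    voice = ", ".join(part for part in (expressivity, pitch) if part)
    return (
        f"{who} speaks in {_with_article(voice)} voice "
        f"at {_with_article(rate)} pace. The recording has {noise}."
    )
```

For example, `build_description(speaker="A male speaker", accent="British accent", expressivity="monotone", pitch="low-pitched", rate="slow")` yields a two-sentence description that can be passed as the `description` field.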

Tips

  1. The language is automatically detected based on the prompt text
  2. For better naturalness, use recommended voices for each language
  3. You can use punctuation to control prosody (e.g., commas for small breaks)
  4. For Indian English accents, use the English voices, which support them natively
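
To illustrate the first tip, the sketch below guesses the dominant writing system of a prompt from Unicode character names. This is only one plausible heuristic and is not necessarily how the server implements `language: "auto"`. The function name is hypothetical. Script alone also cannot separate languages that share a script, such as Hindi and Marathi in Devanagari.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common Unicode script prefix among the letters in text, or None."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # Character names start with the script, e.g. "DEVANAGARI LETTER NA".
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None
```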

Model Information

  • Model: Indic Parler-TTS (fine-tuned from Indic Parler-TTS Pretrained)
  • Architecture: Based on Parler-TTS with enhancements for multilingual support
  • Training Data: 1,806 hours of multilingual Indic and English speech data
  • Languages: 21 officially supported languages
  • Voices: 69 unique voices across languages
  • Output Sampling Rate: 44.1 kHz (native to the model)
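
The sampling rate can be confirmed on the client after decoding a response. The sketch below reads the WAV header with the standard `wave` module and derives the clip duration. It assumes the decoded audio is PCM-encoded WAV, and the function name is hypothetical.

```python
import io
import wave


def wav_info(wav_bytes):
    """Return (sampling_rate, duration_seconds) for PCM-encoded WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate
```

For audio produced at the rate stated above, the first element of the result would be 44100.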