A Python tool to generate multilingual podcast-ready audio files from text using Google Cloud Text-to-Speech and Translation APIs. Supports batch audio generation, automatic translation, and natural-sounding speech via SSML.
- Batch Audio Generation: Create multiple audio files at once from different text inputs.
- Automatic Translation: Instantly translate your text to multiple languages and generate corresponding audio files.
- SSML Support: Use SSML tags to control pronunciation, pauses, pitch, and more for natural-sounding speech.
- Easy Setup: Simple Python script with minimal configuration.
- Python 3.7+
- Google Cloud account with enabled Text-to-Speech and Translation APIs
- Service account key JSON file
- See `requirements.txt` for Python dependencies
- Install dependencies: `pip install -r requirements.txt`
- Google Cloud credentials:
  - Create a Google Cloud project and enable the Text-to-Speech API and Translation API.
  - Download a service account key JSON file.
  - Set the environment variable: `export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-key.json"`
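Before running anything against Google Cloud, it can help to confirm that the credentials variable is actually set and points at a real file. The snippet below is a minimal sketch of such a check. The function name `check_credentials` is hypothetical and is not part of `tts_google.py`.

```python
import os
from pathlib import Path


def check_credentials(env=None):
    """Return the service account key path, or raise if it is unusable.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    This helper is illustrative only and not part of tts_google.py.
    """
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(key_path)
    if not path.is_file():
        raise FileNotFoundError("Service account key not found: {}".format(path))
    return path
```

Calling `check_credentials()` at the top of a script surfaces a missing or mistyped path immediately, rather than as an authentication error from the first API call.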
- Edit your input text or SSML as needed. Make sure to wrap content for translation between `<speak>` and `</speak>` tags.
- Run the script: `python tts_google.py`
- The generated `filename.mp3` (or multiple files) will appear in the current directory, containing the spoken version of your input.
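The translate-then-synthesize flow described above could look roughly like the following. This is a sketch under assumptions, not the actual contents of `tts_google.py`. The helper names (`build_ssml`, `output_name`, `translate_text`, `synthesize`) and the `<stem>_<language>.mp3` naming scheme are hypothetical. The Google client libraries are imported inside the functions that need them, so the pure helpers work without them installed.

```python
from pathlib import Path
from xml.sax.saxutils import escape


def build_ssml(text, pause_ms=500):
    """Wrap plain-text paragraphs in <speak>, with a pause between paragraphs."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    pause = '<break time="{}ms"/>'.format(pause_ms)
    # escape() protects characters such as & and < that would break the SSML.
    body = pause.join("<p>{}</p>".format(escape(p)) for p in paragraphs)
    return "<speak>{}</speak>".format(body)


def output_name(stem, language_code):
    """Hypothetical naming scheme: one MP3 per language."""
    return "{}_{}.mp3".format(stem, language_code)


def translate_text(text, target_language):
    """Translate plain text with the Cloud Translation v2 client (e.g. 'es')."""
    from google.cloud import translate_v2 as translate  # needs google-cloud-translate

    client = translate.Client()
    result = client.translate(text, target_language=target_language, format_="text")
    return result["translatedText"]


def synthesize(ssml, language_code, out_path):
    """Synthesize SSML to an MP3 file; language_code is a BCP-47 tag like 'es-ES'."""
    from google.cloud import texttospeech  # needs google-cloud-texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=texttospeech.VoiceSelectionParams(language_code=language_code),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    Path(out_path).write_bytes(response.audio_content)
    return out_path
```

A batch run would then loop over target languages, for example calling `synthesize(build_ssml(translate_text(text, "es")), "es-ES", output_name("episode", "es-ES"))`. Note that the Translation API takes a language code such as `es`, while Text-to-Speech expects a locale such as `es-ES`.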
- SSML Size: Maximum SSML input size is 5000 bytes. Larger inputs will cause an error.
- Voice Availability: Available voices are subject to Google Cloud Text-to-Speech API limitations.
- Translation Scope: Only text within `<speak>` and `</speak>` tags will be translated.
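Two of these limitations can be checked locally before any request is sent. The sketch below assumes the 5000-byte limit stated above is measured on the UTF-8 encoding of the SSML, and that translatable content is delimited by literal `<speak>` and `</speak>` tags without attributes. Both helper names are hypothetical.

```python
import re

MAX_SSML_BYTES = 5000  # limit quoted in the list above

# Non-greedy match so that several <speak> blocks in one input stay separate.
SPEAK_RE = re.compile(r"<speak>(.*?)</speak>", re.DOTALL)


def ssml_size_ok(ssml):
    """True if the SSML fits the byte limit (bytes, not characters)."""
    return len(ssml.encode("utf-8")) <= MAX_SSML_BYTES


def translatable_segments(document):
    """Return the text found inside each <speak>...</speak> pair."""
    return [segment.strip() for segment in SPEAK_RE.findall(document)]
```

Because the limit is in bytes, text with accented or non-Latin characters reaches it sooner than a character count would suggest: 2500 copies of `é` are only 2500 characters but 5000 bytes.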
Contributions are welcome! Please open an issue or submit a pull request with suggestions, bug fixes, or new features.