A comprehensive podcast generation toolkit using Google's Gemini API for text-to-speech conversion with multi-speaker support and professional audio processing.
- Single Speaker TTS: Convert text to speech with 6 different voices
  - Zephyr: Natural, conversational tone
  - Puck: Friendly, engaging voice
  - Charon: Professional, authoritative
  - Kore: Warm, approachable
  - Uranus: Distinctive, memorable
  - Fenrir: Strong, dramatic
- Multi-Speaker Interviews: Create podcast conversations with multiple speakers
- Script Generation: AI-powered podcast script creation
- Professional Audio: Automatic format conversion with proper headers
- Streaming Audio: Real-time audio generation with chunk processing
- Multiple Audio Formats: WAV, MP3 with proper formatting
- REST API Integration: Complete API testing infrastructure
- Command-Line Interface: Professional CLI tools
- Comprehensive Testing: Multi-layer testing strategy
- Production Ready: Enterprise-grade implementation
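Raw audio from the API usually needs a container before ordinary players will open it. The Gemini TTS endpoint returns 16-bit PCM samples with a MIME type such as audio/L16;codec=pcm;rate=24000, so "proper headers" amounts to wrapping those samples in a WAV header with the right sample rate. The sketch below shows that step with Python's standard wave module; it is not the project's gemini_tts.py code, the helper names are hypothetical, and mono 16-bit output is assumed.

```python
import re
import wave


def sample_rate_from_mime(mime_type: str, default: int = 24000) -> int:
    """Extract the rate from a MIME type such as 'audio/L16;codec=pcm;rate=24000'."""
    match = re.search(r"rate=(\d+)", mime_type)
    # Fall back to an assumed default when the MIME type carries no rate
    return int(match.group(1)) if match else default


def pcm_to_wav(pcm: bytes, path: str,
               mime_type: str = "audio/L16;codec=pcm;rate=24000",
               channels: int = 1, sample_width: int = 2) -> str:
    """Wrap raw 16-bit PCM bytes in a WAV (RIFF) header and write them to path."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)      # assumed mono
        wav.setsampwidth(sample_width)  # bytes per sample; L16 means 2
        wav.setframerate(sample_rate_from_mime(mime_type))
        wav.writeframes(pcm)
    return path
```

For example, pcm_to_wav(b"\x00\x00" * 24000, "silence.wav") writes one second of silence at 24 kHz.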
# Load environment variables
set -a; source .env; set +a
# Activate virtual environment
source venv/bin/activate
python3 scripts/podcast_cli.py voices
python3 scripts/podcast_cli.py single "Hello world!" -v Zephyr
SCRIPT=$'Speaker 1: Welcome!\nSpeaker 2: Thanks for having me!'
python3 scripts/podcast_cli.py multi "$SCRIPT" -s "Speaker 1:Zephyr" "Speaker 2:Puck"
python3 scripts/podcast_cli.py script "AI in Healthcare" -s interview

├── .env # Environment variables (API keys)
├── .gitignore # Git ignore rules
├── requirements.txt # Python dependencies
├── README.md # This file
├── SETUP_GUIDE.md # Detailed setup instructions
├── venv/ # Python virtual environment
├── .tmp/ # Temporary files and testing
│   ├── audio_outputs/ # Generated audio files
│   └── curl_audio_outputs/ # curl-generated audio files
├── scripts/ # Main application code
│   ├── gemini_tts.py # Core TTS wrapper class
│   └── podcast_cli.py # Command-line interface
└── tests/ # Test files and suites
- Python 3.7+
- Git
- GitHub CLI (for repository management)
- curl (for API testing)
- Clone the repository
- Create a virtual environment:
  python3 -m venv venv
- Activate the virtual environment:
  source venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
- Set up environment variables in .env
- Run the tests to verify the installation
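The .env file is plain KEY=VALUE text. If you would rather load it from Python than export it in the shell, a minimal parser could look like the sketch below. It is an illustration only (the python-dotenv package is the more complete option), and the function name load_env is hypothetical.

```python
import os


def load_env(path: str = ".env", override: bool = False) -> dict:
    """Read KEY=VALUE lines from path into os.environ and return what was parsed."""
    parsed = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            # Tolerate an optional leading "export "
            if line.startswith("export "):
                line = line[len("export "):].lstrip()
            key, sep, value = line.partition("=")
            if not sep:
                continue  # not a KEY=VALUE line
            key, value = key.strip(), value.strip()
            # Drop one pair of matching surrounding quotes
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            parsed[key] = value
            # By default, variables already set in the environment win
            if override or key not in os.environ:
                os.environ[key] = value
    return parsed
```

After calling load_env(), read the key with os.environ["GEMINI_API_KEY"]; that variable name is an assumption, since this README does not say which name the project expects.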
- SETUP_GUIDE.md - Complete installation and usage guide
- API Documentation - Official Gemini API docs
- Audio Guide - Audio-specific documentation
# Run comprehensive test suite
bash .tmp/auth_testing_master.sh
# Run CURL tests
bash .tmp/test_curl_tts.sh
# Run REST API tests
bash .tmp/test_rest_api.sh

# Test single speaker
python3 .tmp/test_gemini_tts.py
# Test multi-speaker
bash .tmp/raw_curl_2speaker_mp3.sh

The system supports multiple authentication methods:
- API Key Authentication: Primary method via environment variables
- Bearer Token: Alternative authentication method
- Comprehensive Testing: Authentication validation suite
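The two authentication methods differ mainly in how the credential travels with the request. The sketch below builds request headers for either one. It assumes the standard Google API conventions (an x-goog-api-key header, which has the same effect as the ?key= query parameter in the curl example, or an OAuth Authorization: Bearer header), and the function name is hypothetical rather than part of gemini_tts.py.

```python
from typing import Optional


def auth_headers(api_key: Optional[str] = None,
                 bearer_token: Optional[str] = None) -> dict:
    """Return HTTP headers for a Gemini REST call using exactly one credential."""
    if (api_key is None) == (bearer_token is None):
        raise ValueError("pass exactly one of api_key or bearer_token")
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Header form keeps the key out of URLs, which often end up in logs
        headers["x-goog-api-key"] = api_key
    else:
        headers["Authorization"] = f"Bearer {bearer_token}"
    return headers
```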
# Test with curl
curl -X POST \
-H "Content-Type: application/json" \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro-preview-tts:streamGenerateContent?key=YOUR_API_KEY" \
-d @request.json

from scripts.gemini_tts import GeminiTTS
tts = GeminiTTS()
audio_file = tts.generate_speech("Hello world!", voice_name="Zephyr")

- Audio Generation: Real, listenable audio files
- Multi-Speaker Support: Natural conversation flow
- Professional Quality: High-quality audio output
- Comprehensive Testing: Multi-layer validation
- Production Ready: Enterprise-grade implementation
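The curl test above reads its body from request.json, whose contents are not shown. A sketch that builds such a body follows. The field names (responseModalities, speechConfig, prebuiltVoiceConfig, multiSpeakerVoiceConfig) follow Google's published REST schema for Gemini speech generation; the helper name tts_request and the default voice are illustrative assumptions, not taken from this project's code.

```python
import json
from typing import Dict, Optional


def tts_request(text: str, voice: Optional[str] = None,
                speakers: Optional[Dict[str, str]] = None) -> dict:
    """Build a speech-generation request body for one voice or a speaker->voice map."""
    def voice_config(name: str) -> dict:
        return {"prebuiltVoiceConfig": {"voiceName": name}}

    if speakers:
        # Each speaker label should match the "Label:" prefix used in the script text
        speech = {"multiSpeakerVoiceConfig": {"speakerVoiceConfigs": [
            {"speaker": label, "voiceConfig": voice_config(name)}
            for label, name in speakers.items()
        ]}}
    else:
        # "Zephyr" as the fallback voice is an assumption for this sketch
        speech = {"voiceConfig": voice_config(voice or "Zephyr")}
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {"responseModalities": ["AUDIO"], "speechConfig": speech},
    }
```

To produce the file for curl, something like json.dump(tts_request("Host: Welcome!\nGuest: Thanks!", speakers={"Host": "Zephyr", "Guest": "Puck"}), open("request.json", "w"), indent=2) would do.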
- API Rate Limits: Check usage at https://ai.google.dev/usage
- Authentication Errors: Verify API key in .env file
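- Audio Format Issues: The API returns raw PCM audio (e.g. MIME type audio/L16;codec=pcm;rate=24000); check that the MIME type is parsed and that a WAV header with the matching sample rate is written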
- Network Connectivity: Ensure HTTPS access to Google APIs
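For rate-limit errors, retrying with exponential backoff is a common remedy. The sketch below is generic and not part of gemini_tts.py: it assumes the request function raises an exception (the hypothetical RateLimitError here) when the API answers with HTTP 429.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error a request function raises on HTTP 429."""


def with_backoff(call: Callable[[], T], retries: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on RateLimitError wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == retries:
                raise  # out of retries: surface the original error
            delay = base_delay * (2 ** attempt)
            # Random jitter keeps parallel clients from retrying in lockstep
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")  # keeps type checkers satisfied
```

The sleep parameter exists so tests can pass a fake clock instead of actually waiting.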
# Enable debug logging
export DEBUG=true
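python3 scripts/podcast_cli.py single "test" -v Zephyr

from scripts.gemini_tts import GeminiTTS
tts = GeminiTTS()
# Script lines use the "Speaker: text" format from the CLI example
script = "Host: Welcome to the show!\nGuest: Thanks for having me!"
speaker_configs = [
    {"speaker": "Host", "voice": "Zephyr"},
    {"speaker": "Guest", "voice": "Puck"},
]
tts.generate_podcast_interview(script, speaker_configs)

# Generate multiple files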
for voice in Zephyr Puck Charon Kore Uranus Fenrir; do
  python3 scripts/podcast_cli.py single "Testing voice $voice" -v "$voice" -o "voice_$voice"
done

- Streaming Processing: Real-time audio generation
- Efficient Memory Usage: Chunk-based processing
- Multi-format Support: Automatic format conversion
- Error Recovery: Robust error handling
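Chunk-based processing means audio can be written to disk as it arrives instead of being buffered whole. A sketch, assuming each streamed chunk has already been parsed into a dict shaped like a generateContent response (candidates, then content, then parts carrying inlineData with base64 data) and assuming 24 kHz, 16-bit mono audio; the function name is hypothetical:

```python
import base64
import wave
from typing import Iterable


def stream_to_wav(chunks: Iterable[dict], path: str, rate: int = 24000) -> int:
    """Append each chunk's decoded audio to a WAV file; return the number of frames written."""
    frames = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        for chunk in chunks:
            for candidate in chunk.get("candidates", []):
                for part in candidate.get("content", {}).get("parts", []):
                    inline = part.get("inlineData")
                    if not inline:
                        continue  # skip text or empty parts
                    pcm = base64.b64decode(inline["data"])
                    wav.writeframes(pcm)  # only the current chunk is held in memory
                    frames += len(pcm) // 2
    return frames
```

Because chunks is any iterable, a generator that yields responses as they stream in keeps memory use proportional to one chunk rather than the whole episode.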
- Fork the repository
- Create a feature branch
- Make your changes
- Add comprehensive tests
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google AI: For the amazing Gemini API
- GitHub: For providing the platform
- Python Community: For excellent libraries
- Open Source: For making this possible
Generated with ❤️ and 🐱 supervision in mom's basement