almazom/gemini-tts-podcast-generator

🎙️ Gemini TTS Podcast Generator

A podcast generation toolkit that uses Google's Gemini API for text-to-speech conversion, with single-speaker and multi-speaker output and conversion of the generated audio to WAV or MP3.

🌟 Features

Core TTS Functionality

  • Single Speaker TTS: Convert text to speech with 6 different voices
  • Multi-Speaker Interviews: Create podcast conversations with multiple speakers
  • Script Generation: AI-powered podcast script creation
  • Professional Audio: Automatic format conversion with proper headers
  • Streaming Audio: Real-time audio generation with chunk processing
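
The "proper headers" item can be illustrated with a minimal sketch using only the standard library. It assumes the API hands back raw 16-bit mono PCM at 24 kHz (the format the Gemini TTS models are documented to return at the time of writing; check the MIME type of the actual response). The helper name `pcm_to_wav_bytes` is hypothetical and is not taken from `scripts/gemini_tts.py`.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw little-endian PCM samples in a RIFF/WAV container.

    The defaults (24 kHz, mono, 16-bit) are assumptions about the API output.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)            # header sizes are filled in on close
    return buffer.getvalue()


# One second of silence at the assumed defaults.
silence = b"\x00\x00" * 24000
wav_bytes = pcm_to_wav_bytes(silence)
```

Writing the bytes to a `.wav` file gives a result that ordinary players can open. MP3 output needs an external encoder and is not covered here.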

Voice Selection

  • Zephyr: Natural, conversational tone
  • Puck: Friendly, engaging voice
  • Charon: Professional, authoritative
  • Kore: Warm, approachable
  • Uranus: Distinctive, memorable
  • Fenrir: Strong, dramatic

Technical Features

  • Multiple Audio Formats: WAV, MP3 with proper formatting
  • REST API Integration: Complete API testing infrastructure
  • Command-Line Interface: Professional CLI tools
  • Comprehensive Testing: Multi-layer testing strategy
  • Production Ready: Enterprise-grade implementation

🚀 Quick Start

1. Environment Setup

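# Load environment variables (set -a exports every variable that .env defines;
# unlike `export $(cat .env | xargs)` this tolerates comments and quoted values)
set -a; source .env; set +a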

# Activate virtual environment
source venv/bin/activate
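
For scripts that run outside an interactive shell, the same `.env` file could be read from Python. This is a dependency-free sketch: `parse_env` and `load_env_file` are hypothetical helpers, and the variable name `GEMINI_API_KEY` is an assumption, since this README does not name the variable.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on this line
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path=".env", environ=os.environ):
    """Copy values from the file into environ without overriding existing ones."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        environ.setdefault(key, value)
    return values


# GEMINI_API_KEY is an assumed name used only for illustration.
example = parse_env('# local secrets\nGEMINI_API_KEY="abc 123"\nexport DEBUG=true\n')
```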

2. List Available Voices

python3 scripts/podcast_cli.py voices

3. Generate Single Speaker Audio

python3 scripts/podcast_cli.py single "Hello world!" -v Zephyr

4. Create Multi-Speaker Interview

# $'...' turns \n into a real newline and keeps ! out of history expansion
SCRIPT=$'Speaker 1: Welcome!\nSpeaker 2: Thanks for having me!'
python3 scripts/podcast_cli.py multi "$SCRIPT" -s "Speaker 1:Zephyr" "Speaker 2:Puck"
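
The two inputs in this command are a script whose lines start with a speaker label and a list of `Speaker:Voice` pairs. The sketch below shows one way such inputs could be parsed. It is not the parser in `scripts/podcast_cli.py`; both function names are hypothetical.

```python
def parse_speaker_map(pairs):
    """Turn ["Speaker 1:Zephyr", ...] into {"Speaker 1": "Zephyr", ...}."""
    mapping = {}
    for pair in pairs:
        speaker, sep, voice = pair.rpartition(":")
        if not sep or not speaker.strip() or not voice.strip():
            raise ValueError(f"expected 'Speaker:Voice', got {pair!r}")
        mapping[speaker.strip()] = voice.strip()
    return mapping


def split_turns(script, speakers):
    """Split a script into (speaker, text) turns.

    A line without a known speaker label continues the previous turn.
    """
    turns = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        for speaker in speakers:
            prefix = speaker + ":"
            if line.startswith(prefix):
                turns.append((speaker, line[len(prefix):].strip()))
                break
        else:
            if not turns:
                raise ValueError(f"line has no speaker label: {line!r}")
            last_speaker, last_text = turns[-1]
            turns[-1] = (last_speaker, last_text + " " + line)
    return turns


voices = parse_speaker_map(["Speaker 1:Zephyr", "Speaker 2:Puck"])
turns = split_turns("Speaker 1: Welcome!\nSpeaker 2: Thanks for having me!", voices)
```

Validating the pairs up front means a mistyped `-s` argument fails before any API call is made.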

5. Generate Script First

python3 scripts/podcast_cli.py script "AI in Healthcare" -s interview

🏗️ Project Structure

├── .env                            # Environment variables (API keys)
├── .gitignore                      # Git ignore rules
├── requirements.txt                # Python dependencies
├── README.md                       # This file
├── SETUP_GUIDE.md                  # Detailed setup instructions
├── venv/                           # Python virtual environment
├── .tmp/                           # Temporary files and testing
│   ├── audio_outputs/              # Generated audio files
│   └── curl_audio_outputs/         # CURL-generated audio files
├── scripts/                        # Main application code
│   ├── gemini_tts.py               # Core TTS wrapper class
│   └── podcast_cli.py              # Command-line interface
└── tests/                          # Test files and suites

🔧 Installation

Prerequisites

  • Python 3.7+
  • Git
  • GitHub CLI (for repository management)
  • curl (for API testing)

Setup

  1. Clone the repository
  2. Create virtual environment: python3 -m venv venv
  3. Activate virtual environment: source venv/bin/activate
  4. Install dependencies: pip install -r requirements.txt
  5. Set up environment variables in .env
  6. Run tests to verify installation

📖 Documentation

🧪 Testing

Run All Tests

# Run comprehensive test suite
bash .tmp/auth_testing_master.sh

# Run CURL tests
bash .tmp/test_curl_tts.sh

# Run REST API tests
bash .tmp/test_rest_api.sh

Test Specific Functionality

# Test single speaker
python3 .tmp/test_gemini_tts.py

# Test multi-speaker
bash .tmp/raw_curl_2speaker_mp3.sh

🔐 Authentication

The system supports multiple authentication methods:

  • API Key Authentication: Primary method via environment variables
  • Bearer Token: Alternative authentication method
  • Comprehensive Testing: Authentication validation suite
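
The two methods differ only in which header carries the credential. The sketch below assumes the `x-goog-api-key` header for API keys and a standard OAuth `Authorization: Bearer` header for tokens. `build_auth_request` is a hypothetical helper, and a real OAuth setup may require additional headers or project configuration that are not shown.

```python
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_auth_request(model, method, api_key=None, bearer_token=None):
    """Return (url, headers) for a Gemini REST call with one credential type."""
    url = f"{BASE_URL}/models/{model}:{method}"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Sending the key in a header keeps it out of URLs and access logs.
        headers["x-goog-api-key"] = api_key
    elif bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    else:
        raise ValueError("provide api_key or bearer_token")
    return url, headers


url, headers = build_auth_request(
    "gemini-2.5-pro-preview-tts", "streamGenerateContent", api_key="YOUR_API_KEY"
)
```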

📊 API Usage

Direct REST API

# Test with curl
curl -X POST \
  -H "Content-Type: application/json" \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro-preview-tts:streamGenerateContent?key=YOUR_API_KEY" \
  -d @request.json
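
The contents of `request.json` are not shown above. As a sketch, a single-speaker body could be generated as follows, assuming the Gemini REST field names for speech output (`responseModalities`, `speechConfig`, `prebuiltVoiceConfig`). `build_tts_request` is a hypothetical helper, not part of this repository.

```python
import json


def build_tts_request(text, voice_name="Zephyr"):
    """Build a single-speaker request body (field names assumed from the REST API)."""
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {
                    "prebuiltVoiceConfig": {"voiceName": voice_name}
                }
            },
        },
    }


# Save this string as request.json to use it with the curl command.
request_json = json.dumps(build_tts_request("Hello world!"), indent=2)
print(request_json)
```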

Python Integration

from scripts.gemini_tts import GeminiTTS

tts = GeminiTTS()
audio_file = tts.generate_speech("Hello world!", voice_name="Zephyr")

🎯 Success Criteria

  ✅ Audio Generation: Real, listenable audio files
  ✅ Multi-Speaker Support: Natural conversation flow
  ✅ Professional Quality: High-quality audio output
  ✅ Comprehensive Testing: Multi-layer validation
  ✅ Production Ready: Enterprise-grade implementation

🔍 Troubleshooting

Common Issues

  1. API Rate Limits: Check usage at https://ai.google.dev/usage
  2. Authentication Errors: Verify API key in .env file
  3. Audio Format Issues: Check MIME type handling
  4. Network Connectivity: Ensure HTTPS access to Google APIs
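
For item 3, a common cause of unplayable output is treating raw PCM as if it were already a WAV file. The sketch below reads the sample rate and bit depth from a MIME type such as `audio/L16;codec=pcm;rate=24000`. That example string is an assumption about what the API returns, and `parse_audio_mime` is a hypothetical helper.

```python
def parse_audio_mime(mime_type, default_rate=24000, default_bits=16):
    """Extract bit depth and sample rate from a MIME type like 'audio/L16;rate=24000'."""
    base, *params = [part.strip() for part in mime_type.split(";")]
    options = {}
    for param in params:
        key, sep, value = param.partition("=")
        if sep:
            options[key.strip().lower()] = value.strip()
    bits = default_bits
    subtype = base.partition("/")[2]
    # "L16" means linear PCM with 16 bits per sample.
    if subtype[:1].upper() == "L" and subtype[1:].isdigit():
        bits = int(subtype[1:])
    return {"bits_per_sample": bits, "rate": int(options.get("rate", default_rate))}


info = parse_audio_mime("audio/L16;codec=pcm;rate=24000")
```

If the subtype is not of the `L<bits>` form, the data may already be in a container format and should not be wrapped again.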

Debug Mode

# Enable debug logging
export DEBUG=true
python3 scripts/podcast_cli.py single "test" -v Zephyr
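
How the CLI reads `DEBUG` is not documented here. One plausible wiring, shown purely as an illustration, maps the variable to a logging level; `debug_enabled` is a hypothetical helper.

```python
import logging
import os


def debug_enabled(environ=None):
    """Return True when DEBUG is set to a truthy value such as 'true' or '1'."""
    environ = os.environ if environ is None else environ
    return environ.get("DEBUG", "").strip().lower() in {"1", "true", "yes", "on"}


# Assumed behavior: verbose logs only when DEBUG is enabled.
logging.basicConfig(level=logging.DEBUG if debug_enabled() else logging.INFO)
logging.getLogger("podcast").debug("debug logging is active")
```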

🚀 Advanced Usage

Custom Voice Configuration

speaker_configs = [
    {"speaker": "Host", "voice": "Zephyr"},
    {"speaker": "Guest", "voice": "Puck"}
]
tts.generate_podcast_interview(script, speaker_configs)
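
A `speaker_configs` list of this shape maps naturally onto the multi-speaker section of a REST request. The sketch below assumes the Gemini field names `multiSpeakerVoiceConfig` and `speakerVoiceConfigs`. It illustrates the mapping only and is not the code inside `generate_podcast_interview`; the function name is hypothetical.

```python
def build_multi_speaker_speech_config(speaker_configs):
    """Convert [{"speaker": ..., "voice": ...}] into a speechConfig dictionary."""
    return {
        "multiSpeakerVoiceConfig": {
            "speakerVoiceConfigs": [
                {
                    "speaker": config["speaker"],
                    "voiceConfig": {
                        "prebuiltVoiceConfig": {"voiceName": config["voice"]}
                    },
                }
                for config in speaker_configs
            ]
        }
    }


speech_config = build_multi_speaker_speech_config([
    {"speaker": "Host", "voice": "Zephyr"},
    {"speaker": "Guest", "voice": "Puck"},
])
```

The speaker names must match the labels used in the script text so that each line is voiced by the intended speaker.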

Batch Processing

# Generate multiple files
for voice in Zephyr Puck Charon Kore Uranus Fenrir; do
    python3 scripts/podcast_cli.py single "Testing voice $voice" -v "$voice" -o "voice_$voice"
done
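
The same batch can be assembled from Python, for example to feed `subprocess.run` later. This sketch only builds and prints the commands, using the flags from the loop above. `build_batch_commands` is a hypothetical helper.

```python
import shlex

VOICES = ["Zephyr", "Puck", "Charon", "Kore", "Uranus", "Fenrir"]


def build_batch_commands(voices, cli="scripts/podcast_cli.py"):
    """Return one argv list per voice, mirroring the shell loop."""
    return [
        ["python3", cli, "single", f"Testing voice {voice}",
         "-v", voice, "-o", f"voice_{voice}"]
        for voice in voices
    ]


commands = build_batch_commands(VOICES)
for argv in commands:
    # shlex.quote keeps the printed command safe to paste into a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```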

📈 Performance

  • Streaming Processing: Real-time audio generation
  • Efficient Memory Usage: Chunk-based processing
  • Multi-format Support: Automatic format conversion
  • Error Recovery: Robust error handling
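
Chunk-based processing can be sketched as follows, assuming each streamed response object carries base64-encoded audio under `candidates[].content.parts[].inlineData`, as in the Gemini REST JSON format. `collect_audio` is a hypothetical helper and does not reproduce the repository's implementation.

```python
import base64


def collect_audio(chunks):
    """Concatenate audio bytes from parsed response chunks, in arrival order.

    Returns (audio_bytes, mime_type); mime_type is taken from the first
    audio part that declares one, or None if no part does.
    """
    audio = bytearray()
    mime_type = None
    for chunk in chunks:
        for candidate in chunk.get("candidates", []):
            parts = candidate.get("content", {}).get("parts", [])
            for part in parts:
                inline = part.get("inlineData")
                if inline and inline.get("data"):
                    mime_type = mime_type or inline.get("mimeType")
                    audio.extend(base64.b64decode(inline["data"]))
    return bytes(audio), mime_type


def fake_chunk(payload):
    """Build a response-shaped dictionary for demonstration purposes."""
    data = base64.b64encode(payload).decode("ascii")
    inline = {"mimeType": "audio/L16;codec=pcm;rate=24000", "data": data}
    return {"candidates": [{"content": {"parts": [{"inlineData": inline}]}}]}


audio, mime = collect_audio([fake_chunk(b"\x01\x02"), {}, fake_chunk(b"\x03\x04")])
```

Appending to a `bytearray` avoids repeated copying. For very long recordings, each decoded chunk could be written straight to a file instead.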

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add comprehensive tests
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Google AI: For the amazing Gemini API
  • GitHub: For providing the platform
  • Python Community: For excellent libraries
  • Open Source: For making this possible

Generated with ❤️ and 🐱 supervision in mom's basement
