Gen-TTS: Gemini Native Audio Generation CLI

gen-tts is a command-line interface for Google's Gemini native text-to-speech (TTS). It generates expressive speech from text: single-speaker narration, multi-speaker conversations, and AI-generated podcasts and summaries.

Powered by the Gemini 2.5 and Gemini 2.0 models.

Features

High-Quality Voices: Access Gemini's full range of expressive voices (e.g., Charon, Kore, Fenrir, Puck).
Multi-Speaker Support: Generate conversations between different speakers with distinct voices.
Podcast Mode: Automatically turn any text or file into a lively "Deep Dive" podcast conversation between two hosts.
Summary Mode: Summarize long text into a concise, information-packed audio briefing read by a warm, professional voice.
Transcript Generation: Ask Gemini to write a script for you based on a topic, then immediately synthesize it.
MP3 Support: Saves output as MP3 (converted automatically, requires ffmpeg) or as WAV.
Cross-Platform Playback: Automatically plays the generated audio on Linux, macOS, and Windows.

Installation

Prerequisites

Python 3.9+
ffmpeg (Required for MP3 support and playback on Linux)
- Linux: sudo apt install ffmpeg
- macOS: brew install ffmpeg
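
If you want to verify the prerequisites before installing, a small check like the one below can help. This is a minimal sketch and not part of gen-tts; the function name `missing_prerequisites` and its parameters are made up for illustration. It only checks the interpreter version and whether an `ffmpeg` executable is on the PATH.

```python
import shutil
import sys


def missing_prerequisites(min_python=(3, 9), which=shutil.which):
    """Return a list of problems; an empty list means both checks passed.

    `which` is injectable so the check can be exercised without ffmpeg installed.
    """
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


# Report anything that is missing; prints nothing when everything is in place.
for problem in missing_prerequisites():
    print(problem)
```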

Install via uv (Recommended)

uv tool install git+https://github.com/charles-forsyth/generate-tts.git

Install via pip

pip install git+https://github.com/charles-forsyth/generate-tts.git

Configuration

The tool requires a Google Cloud API Key. On first run, it will create a config file at ~/.config/gen-tts/.env where you can paste your key.

# ~/.config/gen-tts/.env
GOOGLE_API_KEY="your_actual_api_key_here"
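
To confirm that the key was saved where the tool expects it, you can read the file yourself. The sketch below is an assumption-laden illustration, not how gen-tts loads its configuration: `read_env_text` and `key_looks_configured` are hypothetical helpers, and the parser only handles simple `KEY="value"` lines like the one shown above.

```python
from pathlib import Path


def read_env_text(text):
    """Parse simple KEY=value lines; comments and blank lines are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def key_looks_configured(env_path=None):
    """True if the file exists and GOOGLE_API_KEY is set to a non-placeholder value."""
    if env_path is None:
        env_path = Path.home() / ".config" / "gen-tts" / ".env"
    env_path = Path(env_path)
    if not env_path.is_file():
        return False
    key = read_env_text(env_path.read_text(encoding="utf-8")).get("GOOGLE_API_KEY", "")
    return bool(key) and key != "your_actual_api_key_here"
```

Calling `key_looks_configured()` with no argument checks the default path from this section and never prints the key itself.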

Usage

1. Basic Single Speaker

Generate audio from text. Default voice is Charon (Deep, Warm Male).

gen-tts "All systems operational." --temp

--temp plays the audio immediately without saving a file.

2. Podcast Mode ("Deep Dive")

Turn an article, report, or text into an engaging podcast conversation between two hosts (Fenrir and Leda).

# From a file
gen-tts --input-file article.txt --podcast --output-file deep_dive.mp3

# Piping text
echo "Breaking news..." | gen-tts --podcast
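
The piping form can also be driven from a Python script. This is a rough sketch that assumes `gen-tts` is on your PATH and that `--podcast`, `--output-file`, and `--no-play` can be combined as the Options Reference suggests; `podcast_command` and `run_podcast` are hypothetical helper names, not part of the tool.

```python
import subprocess


def podcast_command(output_file=None, play=True):
    """Build the argument list for a podcast run, using flags from this README."""
    argv = ["gen-tts", "--podcast"]
    if output_file:
        argv += ["--output-file", output_file]
    if not play:
        argv.append("--no-play")
    return argv


def run_podcast(text, output_file=None, play=True):
    """Send `text` to gen-tts on stdin, like `echo ... | gen-tts --podcast`."""
    return subprocess.run(
        podcast_command(output_file, play),
        input=text,
        text=True,
        check=True,  # raise CalledProcessError if gen-tts exits non-zero
    )
```

For example, `run_podcast("Breaking news...", output_file="deep_dive.mp3", play=False)` would write the file without playing it, provided the assumptions above hold.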

3. Summary Mode

Summarize text into a concise, professional audio briefing. Default voice is Charon.

cat report.txt | gen-tts --summary --output-file briefing.mp3

4. Topic-Based Generation

Ask Gemini to write a script for you and then speak it.

gen-tts --generate-transcript "A funny debate about coffee vs tea" \
        --multi-speaker --speaker-voices Alice=Kore Bob=Puck \
        --output-file debate.mp3

5. Custom Multi-Speaker

Provide your own script formatted as Speaker: Text.

script.txt:

Joe: Hey Jane, did you see the update?
Jane: Yes, it looks amazing!

Command:

gen-tts --input-file script.txt --multi-speaker \
        --speaker-voices Joe=Charon Jane=Puck \
        --audio-format MP3
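
Before spending an API call, it can be useful to check that a script follows the Speaker: Text format and that every speaker has a voice. The sketch below is one possible pre-flight check, written under the assumption that each non-blank line is a single turn; `parse_script` and `unmapped_speakers` are illustrative names, and gen-tts may parse scripts differently.

```python
def parse_script(text):
    """Split a 'Speaker: Text' script into (speaker, text) pairs."""
    turns = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so colons inside the speech are kept.
        speaker, sep, speech = line.partition(":")
        if not sep or not speaker.strip() or not speech.strip():
            raise ValueError(f"line {lineno}: expected 'Speaker: Text', got {raw!r}")
        turns.append((speaker.strip(), speech.strip()))
    return turns


def unmapped_speakers(turns, voices):
    """Return speakers that appear in the script but have no voice assigned."""
    missing = []
    for speaker, _ in turns:
        if speaker not in voices and speaker not in missing:
            missing.append(speaker)
    return missing
```

With the script.txt above and `voices = {"Joe": "Charon", "Jane": "Puck"}`, `unmapped_speakers(parse_script(text), voices)` returns an empty list.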

Options Reference

| Flag | Description |
| --- | --- |
| `--podcast` | Generate a multi-speaker podcast script from input. |
| `--summary` | Generate a concise summary script from input. |
| `--generate-transcript "TOPIC"` | Generate a script based on a topic. |
| `--multi-speaker` | Enable multi-speaker mode (requires `--speaker-voices`). |
| `--speaker-voices` | Map speakers to voices (e.g., `Host=Fenrir Guest=Leda`). |
| `--voice-name` | Voice for single-speaker mode (Default: `Charon`). |
| `--audio-format` | `WAV` or `MP3`. Defaults to `MP3` for podcasts/summaries. |
| `--model` | TTS Model (Default: `gemini-2.5-flash-preview-tts`). |
| `--transcript-model` | Model for script generation (Default: `gemini-2.5-pro`). |
| `--list-voices` | List all available Gemini voices. |
| `--no-play` | Disable automatic playback. |
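
The `Speaker=Voice` pairs passed to `--speaker-voices` are easy to build or validate in a wrapper script. The following is a minimal sketch of that mapping, assuming each pair contains exactly one speaker name and one voice name separated by `=`; both function names are hypothetical and this is not the tool's own argument parser.

```python
def parse_speaker_voices(pairs):
    """Turn ['Host=Fenrir', 'Guest=Leda'] into {'Host': 'Fenrir', 'Guest': 'Leda'}."""
    voices = {}
    for pair in pairs:
        speaker, sep, voice = pair.partition("=")
        if not sep or not speaker or not voice:
            raise ValueError(f"expected Speaker=Voice, got {pair!r}")
        voices[speaker] = voice
    return voices


def speaker_voice_args(voices):
    """Build the command-line arguments for a speaker-to-voice mapping."""
    return ["--speaker-voices"] + [f"{s}={v}" for s, v in voices.items()]
```

For instance, `speaker_voice_args({"Host": "Fenrir", "Guest": "Leda"})` yields the three arguments `--speaker-voices`, `Host=Fenrir`, and `Guest=Leda`, ready to append to a `gen-tts --multi-speaker` command.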

License

MIT License
