🧠 Podvoice

Local-first, open-source CLI that turns simple Markdown scripts into multi-speaker audio using Coqui XTTS v2.

Podvoice is built for developers who want a boring, reliable, offline text-to-speech workflow — no cloud APIs, no subscriptions, no vendor lock-in.

Runs on Linux, Windows, macOS, and FreeBSD.


Why Podvoice exists

  • Most modern TTS tools depend on proprietary cloud services
  • Developers want reproducible, script-based workflows
  • Podcasts and narration should not require paid APIs

Podvoice is intentionally:

  • Small
  • Honest
  • Hackable
  • Local-first

No training pipelines. No research code. Just a clean CLI built on stable open-source components.


Features

  • Markdown-based scripts
  • Multiple logical speakers
  • Deterministic voice assignment
  • Single stitched output file
  • WAV or MP3 export
  • Local-only inference
  • CPU-first (GPU optional)
  • Cross-platform support

Supported platforms

| Platform | Status | Notes |
| --- | --- | --- |
| Linux | ✅ Fully supported | Primary dev platform |
| macOS | ✅ Fully supported | Intel + Apple Silicon |
| Windows | ✅ Fully supported | PowerShell |
| FreeBSD | ✅ Supported | Requires ffmpeg |
| WSL2 | ✅ Supported | Recommended on Windows |

Input format

Podvoice consumes Markdown files with speaker blocks:

[Host | calm]
Welcome to the show.

[Guest | warm]
If this sounds useful, try writing your own script
and see how easily Markdown becomes audio.

Rules:

  • Speaker name is required
  • Emotion tag is optional
  • Text continues until the next speaker block
  • Blank lines are allowed
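
As a rough illustration of these rules (a sketch, not the project's actual `parser.py` — the function name and tuple format here are hypothetical), speaker blocks can be split with a few lines of Python:

```python
import re

# Matches headers like "[Host | calm]" or "[Guest]"; the emotion tag is optional.
HEADER = re.compile(r"^\[(?P<name>[^\]|]+?)\s*(?:\|\s*(?P<emotion>[^\]]+?)\s*)?\]$")

def parse_script(text):
    """Split a Markdown script into (speaker, emotion, text) segments."""
    segments, current = [], None
    for line in text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            # A new speaker block closes the previous one.
            if current and current[2].strip():
                segments.append((current[0], current[1], current[2].strip()))
            current = [m["name"].strip(), (m["emotion"] or "").strip() or None, ""]
        elif current is not None:
            current[2] += line + "\n"  # text continues until the next speaker block
    if current and current[2].strip():
        segments.append((current[0], current[1], current[2].strip()))
    return segments
```

Blank lines inside a block are kept as part of the speaker's text, matching the rules above.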

▶️ Demo Video

Demo-Video.mp4

🎧 Demo Audio

Demo-Audio.mp4

Quick start (ALL operating systems)

1️⃣ System requirements (common)

Required everywhere:

  • Python 3.10.x
  • ffmpeg
  • Internet access (first run only, to download model weights)
  • ~5–8 GB free disk space (model cache)

2️⃣ Install system dependencies

🐧 Linux (Ubuntu / Debian)

sudo apt update
sudo apt install -y python3.10 python3.10-venv ffmpeg git

🍎 macOS (Homebrew)

brew install python@3.10 ffmpeg git

🪟 Windows (PowerShell)

winget install Python.Python.3.10
winget install ffmpeg
winget install Git.Git

Restart the terminal after installing Python.


🐡 FreeBSD

pkg install python310 ffmpeg git

3️⃣ Clone the repository

git clone https://github.com/aman179102/podvoice.git
cd podvoice

Setup (recommended path)

🐧 Linux / 🍎 macOS / 🐡 FreeBSD

chmod +x bootstrap.sh
./bootstrap.sh

This script will:

  • Verify Python 3.10
  • Create a local .venv
  • Install fully pinned dependencies from requirements.lock
  • Install podvoice in editable mode

🪟 Windows (PowerShell)

One-time: allow local scripts

Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned

Run bootstrap

.\bootstrap.ps1

Activate the environment

Linux / macOS / FreeBSD

source .venv/bin/activate

Windows

.venv\Scripts\Activate.ps1

Run the demo

podvoice examples/demo.md --out demo.wav

Or export MP3:

podvoice examples/demo.md --out demo.mp3

On first run, Coqui XTTS v2 model weights will be downloaded and cached locally. Subsequent runs reuse the cache.


CLI usage

podvoice SCRIPT.md --out OUTPUT

Examples:

podvoice examples/demo.md --out output.wav
podvoice examples/demo.md --out podcast.mp3 --language en --device cpu

Options

| Option | Description |
| --- | --- |
| SCRIPT | Input Markdown file |
| --out, -o | Output .wav or .mp3 |
| --language, -l | XTTS language code |
| --device, -d | cpu (default) or cuda |
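
For batch rendering, the CLI can be driven from Python via `subprocess`. This is a hypothetical wrapper (not part of Podvoice itself) that builds an argv list using the flags from the options table:

```python
import subprocess

def build_command(script, out, language="en", device="cpu"):
    """Argv for one podvoice run, using the flags from the options table."""
    return ["podvoice", script, "--out", out,
            "--language", language, "--device", device]

# Example batch render (runs only if podvoice is installed and on PATH):
# for md in ("ep1.md", "ep2.md"):
#     subprocess.run(build_command(md, md.replace(".md", ".mp3")), check=True)
```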

GPU usage (optional)

If you have a compatible NVIDIA GPU:

podvoice examples/demo.md --out demo.wav --device cuda

If CUDA is unavailable, Podvoice safely falls back to CPU.
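
The fallback can be sketched as follows (an illustrative helper, not the exact code in `tts.py`; it assumes PyTorch is installed as part of the XTTS dependencies):

```python
def resolve_device(requested="cpu"):
    """Pick the inference device, falling back to CPU when CUDA is unusable."""
    if requested == "cuda":
        try:
            import torch  # ships with the XTTS dependencies
            if torch.cuda.is_available():
                return "cuda"
        except ImportError:
            pass
        print("CUDA unavailable; falling back to CPU.")
    return "cpu"
```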


Performance notes

You may see warnings like:

Could not initialize NNPACK! Reason: Unsupported hardware.

These warnings are harmless and require no action; audio generation will still complete.


How voices are assigned

Podvoice does not train voices.

Instead:

  • Uses built-in XTTS v2 speakers
  • Hashes speaker names deterministically
  • Maps each logical speaker to a stable voice

Implications:

  • Same speaker name → same voice
  • Rename speaker → possibly different voice
  • XTTS update → mapping may change

Fallback: default XTTS voice.
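
One way such a deterministic mapping could work (a sketch, not the project's exact algorithm; the speaker list below is a hypothetical stand-in for the IDs exposed by the loaded model):

```python
import hashlib

# Hypothetical subset of built-in XTTS v2 speaker IDs; the real list
# comes from the loaded model at runtime.
XTTS_SPEAKERS = ["Claribel Dervla", "Daisy Studious", "Andrew Chipper"]

def voice_for(speaker_name):
    """Deterministically map a logical speaker name to an XTTS voice."""
    digest = hashlib.sha256(speaker_name.encode("utf-8")).hexdigest()
    return XTTS_SPEAKERS[int(digest, 16) % len(XTTS_SPEAKERS)]
```

A cryptographic hash is used instead of Python's built-in `hash()` because the latter is salted per process, which would break the "same name → same voice" guarantee across runs.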


Project structure

podvoice/
├── podvoice/
│   ├── cli.py        # CLI entrypoint
│   ├── parser.py     # Markdown parser
│   ├── tts.py        # XTTS inference
│   ├── audio.py      # Audio stitching
│   └── utils.py
│
├── examples/
│   └── demo.md
│
├── bootstrap.sh
├── bootstrap.ps1
├── requirements.lock
├── pyproject.toml
└── README.md

Responsible use

Podvoice generates natural-sounding speech.

Do not:

  • Impersonate real people without consent
  • Use generated audio for fraud or deception

Always disclose synthesized content where appropriate.

You are responsible for compliance with all applicable laws and licenses, including those of Coqui XTTS v2.


Contributing

Podvoice is intentionally simple.

Good contributions:

  • Bug reports with minimal reproduction scripts
  • CLI UX improvements
  • Documentation clarity
  • Cross-platform fixes

Non-goals:

  • Cloud dependencies
  • Training pipelines
  • Over-engineering

Goal: local, boring, reliable software.