Local-first, open-source CLI that turns simple Markdown scripts into multi-speaker audio using Coqui XTTS v2.
Podvoice is built for developers who want a boring, reliable, offline text-to-speech workflow — no cloud APIs, no subscriptions, no vendor lock-in.
Runs on Linux, Windows, macOS, and FreeBSD.
- Most modern TTS tools depend on proprietary cloud services
- Developers want reproducible, script-based workflows
- Podcasts and narration should not require paid APIs
Podvoice is intentionally:
- Small
- Honest
- Hackable
- Local-first
No training pipelines. No research code. Just a clean CLI built on stable open-source components.
- Markdown-based scripts
- Multiple logical speakers
- Deterministic voice assignment
- Single stitched output file
- WAV or MP3 export
- Local-only inference
- CPU-first (GPU optional)
- Cross-platform support
| Platform | Status | Notes |
|---|---|---|
| Linux | ✅ Fully supported | Primary dev platform |
| macOS | ✅ Fully supported | Intel + Apple Silicon |
| Windows | ✅ Fully supported | PowerShell |
| FreeBSD | ✅ Supported | Requires ffmpeg |
| WSL2 | ✅ Supported | Recommended on Windows |
Podvoice consumes Markdown files with speaker blocks:

    [Host | calm]
    Welcome to the show.
    [Guest | warm]
    If this sounds useful, try writing your own script
    and see how easily Markdown becomes audio.

Rules:
- Speaker name is required
- Emotion tag is optional
- Text continues until the next speaker block
- Blank lines are allowed
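
The rules above can be sketched as a small parser. This is only an illustration, not Podvoice's actual `parser.py`: the `Segment` class, the `HEADER` pattern, and the `parse_script` function are hypothetical names, and the handling of edge cases (for example, text before the first speaker block is dropped) is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern for a header line such as "[Host | calm]" or "[Host]".
HEADER = re.compile(r"^\[([^\]|]*)(?:\|([^\]]*))?\]$")


@dataclass
class Segment:
    speaker: str
    emotion: Optional[str]
    text: str


def parse_script(source: str) -> List[Segment]:
    """Split a script into one segment per speaker block."""
    segments: List[Segment] = []
    speaker: Optional[str] = None
    emotion: Optional[str] = None
    lines: List[str] = []

    def flush() -> None:
        # Emit the block collected so far if it has a speaker and some text.
        text = " ".join(lines)
        if speaker is not None and text:
            segments.append(Segment(speaker, emotion, text))

    for raw in source.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            flush()
            speaker = match.group(1).strip()
            if not speaker:
                raise ValueError(f"speaker name is required: {line!r}")
            # The emotion tag is optional; an empty tag counts as absent.
            emotion = (match.group(2) or "").strip() or None
            lines = []
        elif line:
            # Blank lines are allowed and simply skipped.
            lines.append(line)
    flush()
    return segments
```

Calling `parse_script` on the example script above would yield two segments, one for `Host` and one for `Guest`, with the two lines of the guest's text joined by a space.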
Required everywhere:
- Python 3.10.x
- ffmpeg
- Internet access only for first run
- ~5–8 GB free disk space (model cache)
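
If you want to check these requirements before installing, a preflight script along the following lines may help. It is a sketch under stated assumptions: `preflight` and `MIN_FREE_GB` are hypothetical names, the disk check looks at the current directory rather than the real model cache location, and the 8 GB threshold is just the upper end of the estimate above.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple

MIN_FREE_GB = 8  # assumption: upper end of the 5-8 GB estimate


def preflight(
    version: Tuple[int, int] = (sys.version_info.major, sys.version_info.minor),
    which: Callable[[str], Optional[str]] = shutil.which,
    free_bytes: Optional[int] = None,
) -> List[str]:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems: List[str] = []
    if version != (3, 10):
        problems.append(f"Python 3.10.x required, found {version[0]}.{version[1]}")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if free_bytes is None:
        # Assumption: checks the current drive, not the actual cache directory.
        free_bytes = shutil.disk_usage(".").free
    if free_bytes < MIN_FREE_GB * 1024**3:
        problems.append(f"less than {MIN_FREE_GB} GB of free disk space")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

The parameters exist only so the checks can be exercised without touching the real system; calling `preflight()` with no arguments inspects the running interpreter and machine.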
Linux (Debian/Ubuntu):

    sudo apt update
    sudo apt install -y python3.10 python3.10-venv ffmpeg git

macOS (Homebrew):

    brew install python@3.10 ffmpeg git

Windows (winget):

    winget install Python.Python.3.10
    winget install ffmpeg
    winget install Git.Git

Restart the terminal after installing Python.

FreeBSD:

    pkg install python310 ffmpeg git

Clone the repository:

    git clone https://github.com/aman179102/podvoice.git
    cd podvoice

Run the bootstrap script:

    chmod +x bootstrap.sh
    ./bootstrap.sh

This script will:
- Verify Python 3.10
- Create a local `.venv`
- Install fully pinned dependencies from `requirements.lock`
- Install `podvoice` in editable mode
On Windows (PowerShell), run the bootstrap script with:

    Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned
    .\bootstrap.ps1

Activate the virtual environment (bash/zsh):

    source .venv/bin/activate

Or, in PowerShell:

    .venv\Scripts\Activate.ps1

Generate the demo:

    podvoice examples/demo.md --out demo.wav

Or export MP3:

    podvoice examples/demo.md --out demo.mp3

On first run, Coqui XTTS v2 model weights will be downloaded and cached locally. Subsequent runs reuse the cache.
Usage:

    podvoice SCRIPT.md --out OUTPUT

Examples:

    podvoice examples/demo.md --out output.wav
    podvoice examples/demo.md --out podcast.mp3 --language en --device cpu

| Option | Description |
|---|---|
| `SCRIPT` | Input Markdown file |
| `--out`, `-o` | Output `.wav` or `.mp3` |
| `--language`, `-l` | XTTS language code |
| `--device`, `-d` | `cpu` (default) or `cuda` |
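
Because the CLI is script-friendly, rendering a whole folder of scripts can be automated. The sketch below uses only the options listed in the table above; `build_command` and `render_all` are hypothetical helper names, and it assumes `podvoice` is on your `PATH` (for example, with the virtual environment activated).

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(
    script: Path,
    out: Path,
    language: Optional[str] = None,
    device: Optional[str] = None,
) -> List[str]:
    """Build a podvoice command line from the documented options."""
    if out.suffix.lower() not in {".wav", ".mp3"}:
        raise ValueError(f"output must be .wav or .mp3, got {out.name}")
    cmd = ["podvoice", str(script), "--out", str(out)]
    if language:
        cmd += ["--language", language]
    if device:
        cmd += ["--device", device]
    return cmd


def render_all(script_dir: Path, out_dir: Path, suffix: str = ".wav") -> None:
    """Render every Markdown script in script_dir into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for script in sorted(script_dir.glob("*.md")):
        out = out_dir / script.with_suffix(suffix).name
        # check=True stops the batch at the first failing script.
        subprocess.run(build_command(script, out), check=True)
```

For example, `render_all(Path("examples"), Path("build"))` would run one `podvoice` invocation per script in `examples/`.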
If you have a compatible NVIDIA GPU:

    podvoice examples/demo.md --device cuda

If CUDA is unavailable, Podvoice safely falls back to CPU.
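
The fallback behaviour described above can be pictured roughly as follows. This is a guess at the logic, not Podvoice's actual code: `resolve_device` is a hypothetical name, and the availability check is passed in as a callable so the sketch does not depend on any particular library.

```python
from typing import Callable


def resolve_device(requested: str, cuda_available: Callable[[], bool]) -> str:
    """Return the device to use, falling back to CPU when CUDA is missing."""
    if requested not in ("cpu", "cuda"):
        raise ValueError(f"unknown device: {requested!r}")
    if requested == "cuda" and not cuda_available():
        # Requested GPU is not usable, so degrade gracefully.
        return "cpu"
    return requested
```

With PyTorch installed, `torch.cuda.is_available` could be passed as the `cuda_available` argument.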
You may see warnings like:

    Could not initialize NNPACK! Reason: Unsupported hardware.

- ✔️ These are harmless
- ✔️ Audio generation will still complete
- ❌ No action required
Podvoice does not train voices.
Instead:
- Uses built-in XTTS v2 speakers
- Hashes speaker names deterministically
- Maps each logical speaker to a stable voice
Implications:
- Same speaker name → same voice
- Rename speaker → possibly different voice
- XTTS update → mapping may change
Fallback: default XTTS voice.
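
One way to picture this kind of deterministic mapping is sketched below. It is an assumption about the general technique, not a copy of Podvoice's `tts.py`: `pick_voice` and the voice names in the usage note are hypothetical. The sketch uses `hashlib` rather than Python's built-in `hash()`, because string hashes from `hash()` are randomized per process and would not give stable results across runs.

```python
import hashlib
from typing import Sequence


def pick_voice(speaker: str, voices: Sequence[str], default: str) -> str:
    """Map a logical speaker name to a stable entry in the voice list."""
    if not voices:
        # No built-in voices available: fall back to the default voice.
        return default
    digest = hashlib.sha256(speaker.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and wrap it onto the list.
    index = int.from_bytes(digest[:8], "big") % len(voices)
    return voices[index]
```

With a hypothetical list such as `["voice_a", "voice_b", "voice_c"]`, the same speaker name always lands on the same entry. Because the index depends on the list's length and order, a different set of built-in speakers can shift the mapping, which is consistent with the caveat above about XTTS updates.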
Project layout:

    podvoice/
    ├── podvoice/
    │   ├── cli.py        # CLI entrypoint
    │   ├── parser.py     # Markdown parser
    │   ├── tts.py        # XTTS inference
    │   ├── audio.py      # Audio stitching
    │   └── utils.py
    │
    ├── examples/
    │   └── demo.md
    │
    ├── bootstrap.sh
    ├── bootstrap.ps1
    ├── requirements.lock
    ├── pyproject.toml
    └── README.md

Podvoice generates natural-sounding speech.
Do not:
- Impersonate real people without consent
- Use generated audio for fraud or deception
Always disclose synthesized content where appropriate.
You are responsible for compliance with all applicable laws and licenses, including those of Coqui XTTS v2.
Podvoice is intentionally simple.
Good contributions:
- Bug reports with minimal reproduction scripts
- CLI UX improvements
- Documentation clarity
- Cross-platform fixes
Non-goals:
- Cloud dependencies
- Training pipelines
- Over-engineering
Goal: local, boring, reliable software.