Turn any URL / YouTube video / PDF into a visual podcast video — AI dialogue, stock imagery, background music, and a HyperFrames-rendered MP4. All for roughly $0.01 of LLM tokens (gen-podcast's Gemini usage).
```
source URL/PDF
      │
      ▼
gen-podcast       (Gemini → multi-role dialogue → Edge TTS → mp3 + vtt)
      │
      ▼
Pexels / Pixabay  (one image per dialogue segment, from English keywords)
      │
      ▼
Pixabay Music     (royalty-free BGM; ducked under narration)
      │
      ▼
HyperFrames       (HTML/CSS/GSAP composition → deterministic headless-Chrome render)
      │
      ▼
final.mp4
```
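The "ducked under narration" step means the background music is attenuated whenever a voice is speaking. The real `audio_mixer` tool's implementation isn't shown here; the following is a minimal sketch of the idea over raw sample arrays, with illustrative names (`duck`, `threshold`, `duck_gain`) that are not part of the codebase:

```python
def duck(music, narration, threshold=0.05, duck_gain=0.3):
    """Attenuate music samples wherever narration is active (naive sidechain duck)."""
    out = []
    for m, n in zip(music, narration):
        # full volume while narration is silent, reduced gain while it speaks
        gain = duck_gain if abs(n) > threshold else 1.0
        out.append(m * gain)
    return out
```

A production mixer would additionally smooth the gain changes (attack/release ramps) to avoid audible pumping.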
podcastai is instruction-driven: the AI agent reads a pipeline manifest + stage director skills and drives the production state machine stage by stage. Python exists for tools and persistence only — no orchestration logic lives in code. This matches the OpenMontage "agent-first" contract.
Read AGENT_GUIDE.md to see exactly what the agent does.
```
git clone <this-repo>
cd podcastai
make setup   # creates venv, installs Python deps, warms HyperFrames cache
```

Also install the Node/ffmpeg prerequisites:

- Node.js ≥ 22 (https://nodejs.org)
- ffmpeg on PATH (`brew install ffmpeg` / `apt install ffmpeg`)
Copy `.env.example` → `.env` and set:

```
GOOGLE_API_KEY=...   # Gemini — gen-podcast's LLM
PEXELS_API_KEY=...   # (or) PIXABAY_API_KEY — at least one
```

Note: pixabay_music needs no API key. GOOGLE_API_KEY is required because gen-podcast uses Gemini for dialogue generation.
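The key requirements above can be summarized as: GOOGLE_API_KEY is mandatory, and at least one of the two image keys must be present. A small sketch of such a check (the function name `preflight_keys` is illustrative — the real `make preflight` may check more, e.g. Node and ffmpeg):

```python
import os

def preflight_keys():
    """Check the API-key requirements described above (sketch only)."""
    gemini = bool(os.getenv("GOOGLE_API_KEY"))  # required: dialogue generation
    # image sourcing needs at least one of the two providers
    images = bool(os.getenv("PEXELS_API_KEY") or os.getenv("PIXABAY_API_KEY"))
    return {"gemini": gemini, "images": images, "ok": gemini and images}
```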
```
make preflight            # list configured tools
make hyperframes-doctor   # verify Node/ffmpeg/npx + hyperframes npm package
make demo
```

Or with custom input:

```
.venv/bin/python render_demo.py \
  --url "https://en.wikipedia.org/wiki/Podcast" \
  --language zh \
  --playbook flat-motion-graphics \
  --project-name podcast-intro-zh
```

Output lands at `projects/<project-name>/renders/final.mp4`.
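If you drive `render_demo.py` from your own script, the invocation above reduces to assembling an argument list. A sketch using only the flags shown in this README (`build_demo_cmd` is a hypothetical helper, not part of the repo):

```python
import sys

def build_demo_cmd(url, language="en", playbook="flat-motion-graphics",
                   project_name="demo"):
    """Assemble the render_demo.py invocation shown above (flags from the README)."""
    return [
        sys.executable, "render_demo.py",
        "--url", url,
        "--language", language,
        "--playbook", playbook,
        "--project-name", project_name,
    ]
```

Pass the result to `subprocess.run(cmd, check=True)` to execute it.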
Open the project in Claude Code / Cursor / Codex. Say something like:
"Turn https://arxiv.org/abs/2401.02669 into a Chinese podcast video with professional visuals, 8 minutes or less."
The agent will:
- Read AGENT_GUIDE.md.
- Run preflight and report available tools.
- Propose a plan (voices, playbook, duration target) and wait for approval.
- Execute the 7 stages, checkpointing at creative stages.
- Deliver projects/<name>/renders/final.mp4.
```
podcastai/
├── AGENT_GUIDE.md                      # Read-this-first agent contract
├── PROJECT_CONTEXT.md                  # Architecture deep-dive
├── pipeline_defs/
│   └── podcast-visualizer.yaml         # The one pipeline
├── skills/
│   ├── core/                           # Layer 2 — hyperframes, podcast-audio, stock-media
│   ├── meta/                           # reviewer, checkpoint-protocol, onboarding
│   └── pipelines/podcast-visualizer/   # 8 stage director skills
├── tools/
│   ├── podcast/podcast_gen.py          # Wraps gen-podcast CLI
│   ├── graphics/                       # pexels / pixabay / image_selector
│   ├── audio/                          # pixabay_music / audio_mixer
│   ├── video/hyperframes_compose.py    # HyperFrames scaffold + lint + validate + render
│   └── subtitle/subtitle_gen.py        # VTT aggregation
├── styles/                             # clean-professional / flat-motion-graphics playbooks
├── lib/                                # checkpoint, pipeline_loader, style bridge
├── schemas/                            # JSON schemas for all artifacts
├── projects/                           # (gitignored) run workspaces
├── render_demo.py                      # End-to-end URL → MP4 driver
└── Makefile                            # setup, preflight, demo, hyperframes-doctor
```
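`tools/subtitle/subtitle_gen.py` aggregates per-segment cues into a single VTT file. The core of any such tool is WebVTT timestamp formatting; a minimal sketch (the function name is illustrative, not the repo's actual API):

```python
def vtt_timestamp(seconds):
    """Format a duration in seconds as a WebVTT cue timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)   # 3,600,000 ms per hour
    m, rem = divmod(rem, 60_000)     # 60,000 ms per minute
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
```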
| Role | Tool | Required? |
|---|---|---|
| Dialogue generation | Gemini (via gen-podcast) | yes — GOOGLE_API_KEY |
| TTS | Edge TTS (via gen-podcast, no key) | implicit |
| Images | Pexels or Pixabay | one of PEXELS_API_KEY / PIXABAY_API_KEY |
| Music | Pixabay Music scraper | no key |
| Render | HyperFrames npm + Node.js ≥ 22 + ffmpeg | yes (local) |
podcastai is a focused subset of the OpenMontage architecture — same instruction-driven contract, same tool-contract base class, same checkpoint/reviewer meta skills, same HyperFrames integration. Scoped down to one pipeline, one render runtime, stock media only.
If you want:
- avatar/lip-sync presenters → use OpenMontage's avatar-spokesperson
- Remotion React scenes → use OpenMontage's animated-explainer
- word-level burned captions → use OpenMontage's remotion_caption_burn
podcastai handles the "podcast → visual companion video" case: opinionated and cheap.
TBD (likely Apache-2.0 to match gen-podcast + OpenMontage).