
podcastai

Turn any URL / YouTube video / PDF into a visual podcast video — AI dialogue, stock imagery, background music, and a HyperFrames-rendered MP4. All for roughly $0.01 in LLM tokens (gen-podcast's Gemini usage).

How it works

source URL/PDF
     │
     ▼
gen-podcast (Gemini → multi-role dialogue → Edge TTS → mp3 + vtt)
     │
     ▼
Pexels / Pixabay (one image per dialogue segment, from English keywords)
     │
     ▼
Pixabay Music (royalty-free BGM; ducked under narration)
     │
     ▼
HyperFrames (HTML/CSS/GSAP composition → deterministic headless-Chrome render)
     │
     ▼
final.mp4
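
For the ducking step specifically, here is a minimal sketch of what the mixer could do with ffmpeg's sidechaincompress filter: the narration drives the compressor so the BGM dips whenever someone speaks. Filenames are placeholders, and the real tools/audio/audio_mixer.py may implement this differently.

import subprocess

# Illustrative ducking pass (not the repo's actual mixer). Input 0 is the
# narration, input 1 the BGM: sidechaincompress uses the narration as the
# sidechain signal to compress the BGM, then amix layers the narration on
# top of the ducked BGM.
subprocess.run([
    "ffmpeg", "-y",
    "-i", "narration.mp3",   # placeholder: gen-podcast's mp3 output
    "-i", "bgm.mp3",         # placeholder: Pixabay Music track
    "-filter_complex",
    "[1:a][0:a]sidechaincompress=threshold=0.05:ratio=8:attack=50:release=300[ducked];"
    "[0:a][ducked]amix=inputs=2:duration=first[mix]",
    "-map", "[mix]", "mixed.mp3",
], check=True)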

Architecture

podcastai is instruction-driven: the AI agent reads a pipeline manifest + stage director skills and drives the production state machine stage by stage. Python exists for tools and persistence only — no orchestration logic lives in code. This matches the OpenMontage "agent-first" contract.

Read AGENT_GUIDE.md to see exactly what the agent does.
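
To make the split concrete, below is a minimal sketch of the persistence side, assuming per-project progress is checkpointed as JSON. The stage names and checkpoint format are invented for illustration; the real contract lives in AGENT_GUIDE.md and lib/.

import json
from pathlib import Path

# Illustrative only: podcastai's real stage list is defined in
# pipeline_defs/podcast-visualizer.yaml, not hard-coded like this.
STAGES = ["source", "dialogue+tts", "images", "music", "compose", "render"]

def next_stage(project_dir: Path) -> str | None:
    """Return the first stage not yet marked done in the project checkpoint."""
    ckpt = project_dir / "checkpoint.json"
    done = set(json.loads(ckpt.read_text())["done"]) if ckpt.exists() else set()
    return next((s for s in STAGES if s not in done), None)

def mark_done(project_dir: Path, stage: str) -> None:
    """Persist stage completion; the agent, not Python, decides what runs next."""
    ckpt = project_dir / "checkpoint.json"
    done = set(json.loads(ckpt.read_text())["done"]) if ckpt.exists() else set()
    done.add(stage)
    ckpt.write_text(json.dumps({"done": sorted(done)}))

print(next_stage(Path("projects/demo")))  # -> "source" on a fresh project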

Quick start

1. Install

git clone <this-repo>
cd podcastai
make setup          # creates venv, installs Python deps, warms HyperFrames cache

Also install the Node/ffmpeg prerequisites:

  • Node.js ≥ 22 (https://nodejs.org)
  • ffmpeg on PATH (brew install ffmpeg / apt install ffmpeg)

2. Configure

Copy .env.example to .env and set:

GOOGLE_API_KEY=...       # Gemini — gen-podcast's LLM
PEXELS_API_KEY=...       # (or) PIXABAY_API_KEY — at least one

Note: pixabay_music needs no API key. GOOGLE_API_KEY is required because gen-podcast uses Gemini for dialogue generation.
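
For a quick sanity check before running anything, a few lines of Python can confirm the keys are present. This helper is illustrative (not part of the repo) and assumes a flat KEY=value .env in the repo root:

from pathlib import Path

# Parse the .env written above (naive KEY=value parsing, comment lines skipped).
env = {}
for line in Path(".env").read_text().splitlines():
    if line.strip() and not line.lstrip().startswith("#") and "=" in line:
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()

assert env.get("GOOGLE_API_KEY"), "GOOGLE_API_KEY is required (Gemini dialogue)"
assert env.get("PEXELS_API_KEY") or env.get("PIXABAY_API_KEY"), \
    "set at least one stock-image key"
print("env looks OK")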

3. Verify

make preflight           # list configured tools
make hyperframes-doctor  # verify Node/ffmpeg/npx + hyperframes npm package

4. Run end-to-end demo

make demo
# or with custom input:
.venv/bin/python render_demo.py \
    --url "https://en.wikipedia.org/wiki/Podcast" \
    --language zh \
    --playbook flat-motion-graphics \
    --project-name podcast-intro-zh

Output lands at projects/<project-name>/renders/final.mp4.
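
The same driver also scripts cleanly for batch runs. The sketch below assumes only the render_demo.py flags shown above; everything else about its interface is a guess:

import subprocess

# One project per source URL; output lands in projects/<name>/renders/final.mp4.
SOURCES = {
    "podcast-intro-zh": "https://en.wikipedia.org/wiki/Podcast",
    # add more name -> URL pairs here
}

for name, url in SOURCES.items():
    subprocess.run([
        ".venv/bin/python", "render_demo.py",
        "--url", url,
        "--language", "zh",
        "--playbook", "flat-motion-graphics",
        "--project-name", name,
    ], check=True)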

Using as an AI-agent-driven project

Open the project in Claude Code / Cursor / Codex. Say something like:

"Turn https://arxiv.org/abs/2401.02669 into a Chinese podcast video with professional visuals, 8 minutes or less."

The agent will:

  1. Read AGENT_GUIDE.md.
  2. Run preflight, report available tools.
  3. Propose a plan (voices, playbook, duration target) and wait for approval.
  4. Execute the 7 stages, checkpointing at creative stages.
  5. Deliver projects/<name>/renders/final.mp4.

Project layout

podcastai/
├── AGENT_GUIDE.md                    # Read-this-first agent contract
├── PROJECT_CONTEXT.md                # Architecture deep-dive
├── pipeline_defs/
│   └── podcast-visualizer.yaml       # The one pipeline
├── skills/
│   ├── core/                         # Layer 2 — hyperframes, podcast-audio, stock-media
│   ├── meta/                         # reviewer, checkpoint-protocol, onboarding
│   └── pipelines/podcast-visualizer/ # 8 stage director skills
├── tools/
│   ├── podcast/podcast_gen.py        # Wraps gen-podcast CLI
│   ├── graphics/                     # pexels / pixabay / image_selector
│   ├── audio/                        # pixabay_music / audio_mixer
│   ├── video/hyperframes_compose.py  # HyperFrames scaffold + lint + validate + render
│   └── subtitle/subtitle_gen.py      # VTT aggregation
├── styles/                           # clean-professional / flat-motion-graphics playbooks
├── lib/                              # checkpoint, pipeline_loader, style bridge
├── schemas/                          # JSON schemas for all artifacts
├── projects/                         # (gitignored) run workspaces
├── render_demo.py                    # End-to-end URL → MP4 driver
└── Makefile                          # setup, preflight, demo, hyperframes-doctor

Dependencies

Role                 Tool                                       Required?
Dialogue generation  Gemini (via gen-podcast)                   yes (GOOGLE_API_KEY)
TTS                  Edge TTS (via gen-podcast, no key)         implicit
Images               Pexels or Pixabay                          one of PEXELS_API_KEY / PIXABAY_API_KEY
Music                Pixabay Music scraper                      no key
Render               HyperFrames npm + Node.js ≥ 22 + ffmpeg    yes (local)
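
If you want to check the local render prerequisites by hand rather than via make hyperframes-doctor, a rough (illustrative-only) equivalent:

import shutil
import subprocess

# Binaries the render path needs on PATH, per the table above.
for binary in ("node", "npx", "ffmpeg"):
    print(f"{binary}: {shutil.which(binary) or 'MISSING'}")

# HyperFrames requires Node.js >= 22.
if shutil.which("node"):
    out = subprocess.run(["node", "--version"], capture_output=True, text=True).stdout
    major = int(out.strip().lstrip("v").split(".")[0])
    print("node version OK" if major >= 22 else f"node too old: {out.strip()}")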

Relationship to OpenMontage

podcastai is a focused subset of the OpenMontage architecture — same instruction-driven contract, same tool-contract base class, same checkpoint/reviewer meta skills, same HyperFrames integration. Scoped down to one pipeline, one render runtime, stock media only.

If you want:

  • avatar/lip-sync presenters → use OpenMontage's avatar-spokesperson
  • Remotion React scenes → use OpenMontage's animated-explainer
  • word-level burned captions → use OpenMontage's remotion_caption_burn

podcastai handles the "podcast → visual companion video" case in an opinionated, cheap way.

License

TBD (likely Apache-2.0 to match gen-podcast + OpenMontage).

About

Use an agent (Claude Code / Codex / Cursor) to control the gen-podcast + HyperFrames pipeline workflow.
