This project provides an end-to-end workflow for transforming raw audio and video into highly readable, speaker-aware PDF and Google Doc transcripts.
The pipeline automatically handles:
- Media Acquisition: Downloads content from podcast RSS feeds or video URLs (using `yt-dlp`).
- Speech Processing: Transcribes and diarizes the audio (identifying who spoke when) using the powerful Whisper model.
- Output Formatting: Produces a cleanly formatted, speaker-aware PDF and simultaneously writes the document to a Google Doc for easy review, collaboration, or archiving.
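To illustrate the speaker-aware formatting stage, here is a toy sketch that merges consecutive diarized segments from the same speaker into readable paragraphs. The segment schema (`speaker`/`text` keys) is an assumption for illustration, not the project's actual data model:

```python
def format_transcript(segments: list[dict]) -> str:
    """Group consecutive segments by speaker into 'Speaker: text' paragraphs."""
    paragraphs: list[tuple[str, list[str]]] = []
    for seg in segments:
        if paragraphs and paragraphs[-1][0] == seg["speaker"]:
            # Same speaker as the previous segment: extend their paragraph.
            paragraphs[-1][1].append(seg["text"].strip())
        else:
            paragraphs.append((seg["speaker"], [seg["text"].strip()]))
    return "\n\n".join(f"{speaker}: {' '.join(texts)}" for speaker, texts in paragraphs)

segments = [
    {"speaker": "SPEAKER_00", "text": "Welcome to the show."},
    {"speaker": "SPEAKER_00", "text": "Today we have a guest."},
    {"speaker": "SPEAKER_01", "text": "Thanks for having me."},
]
print(format_transcript(segments))
```

Grouping by consecutive speaker (rather than emitting one line per segment) is what makes the final PDF read like a dialogue instead of a raw timestamp dump.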
To reliably extract high-quality audio from video sources like YouTube, this project uses yt-dlp, the current, actively maintained successor to youtube-dl.
```sh
# Download yt-dlp
sudo curl -L https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp \
  -o /usr/local/bin/yt-dlp \
  && sudo chmod 755 /usr/local/bin/yt-dlp

# Check installation success
yt-dlp --version
```

This project leverages the `guanqun-yang/seedwriter` library to push the diarized transcript directly to a Google Doc. This requires proper API authentication:
You must have your Google API credential and token files stored in the following locations:

```
~/.local/share/podcast2pdf/credentials.json
~/.local/share/podcast2pdf/token_docs.json
```
You need an OpenAI API key to use the Whisper model for transcription. Set it as an environment variable in `~/.bashrc` or `~/.zshrc`:

```sh
export OPENAI_API_KEY=<OPENAI_API_KEY>
```

For the fastest results, run the entire pipeline with a single command, passing the URL directly:
```sh
podcast2pdf <VIDEO_OR_RSS_URL>
```

If you need more control or debugging visibility, follow these steps:
- For Podcasts (via RSS):
  - Find the podcast show link (e.g., in Apple Podcasts).
  - Use a tool like getrssfeed.com or a library like Python's `feedparser` to extract the underlying RSS feed URL.
  - Download the audio files:

    ```sh
    npx podcast-dl --limit 5 --url <RSS_FEED_URL>
    ```
- For YouTube/Video URLs (via `yt-dlp`): Download the highest quality audio and convert it to MP3:

  ```sh
  yt-dlp -x --audio-format mp3 --audio-quality 0 <VIDEO_URL>
  ```

  This will save a high-quality `.mp3` file.
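The `feedparser` step mentioned above can also be done with just the standard library; a minimal sketch that pulls the enclosure (audio) URLs out of an RSS feed's XML:

```python
import xml.etree.ElementTree as ET

def extract_audio_urls(rss_xml: str, limit: int = 5) -> list[str]:
    """Return the enclosure (audio) URLs of up to `limit` items in an RSS feed."""
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        # Podcast audio lives in each item's <enclosure url="..."> attribute.
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            urls.append(enclosure.get("url"))
        if len(urls) >= limit:
            break
    return urls

sample = """<rss><channel>
  <item><title>Ep 1</title><enclosure url="https://example.com/ep1.mp3" type="audio/mpeg"/></item>
  <item><title>Ep 2</title><enclosure url="https://example.com/ep2.mp3" type="audio/mpeg"/></item>
</channel></rss>"""
print(extract_audio_urls(sample))  # ['https://example.com/ep1.mp3', 'https://example.com/ep2.mp3']
```

For real-world feeds, `feedparser` is more forgiving of malformed XML, but this shows what it is extracting under the hood.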
The final stage uses OpenAI's models for state-of-the-art transcription and speaker diarization.
- Environment Setup: Create and activate a dedicated environment for the project:
  ```sh
  conda create --name Podcast2PDF python==3.12
  conda activate Podcast2PDF
  uv pip install -e .
  ```

- Run Transcription:

  ```sh
  # Example for an MP3 file
  podcast2pdf mp3_audio.mp3 transcript.pdf --verbose

  # Example for an M4A file
  podcast2pdf m4a_audio.m4a transcript.pdf --verbose
  ```
Note on Pricing: Transcription costs are based on the token count of the resulting text, not the audio file size or duration. Consult the OpenAI documentation for the latest pricing.
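To put the token-based pricing in perspective, a back-of-the-envelope estimate (the per-token rate below is a made-up placeholder; check OpenAI's pricing page for real numbers):

```python
def estimate_cost(transcript_tokens: int, usd_per_million_tokens: float) -> float:
    # Cost scales with the token count of the transcript text, not audio duration.
    return transcript_tokens / 1_000_000 * usd_per_million_tokens

# A one-hour episode often yields roughly 10-15k tokens of text; at a
# hypothetical $10 per million output tokens:
print(round(estimate_cost(12_000, 10.0), 2))  # 0.12
```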
Alternative: Consider fully managed services like AssemblyAI, which offer a comparable cost (approx. $0.27 per hour) for building a potentially no-code pipeline.
Unlike music streaming, where platforms like Spotify and Apple Music host and serve all media from their own servers, podcast distribution is built on a decentralized, open ecosystem powered by RSS. Although RSS is no longer widely used for blogs and news delivery, it remains the backbone of podcasting—allowing creators to publish once and automatically reach listeners across many different podcast apps.
- Creators upload episodes to independent hosting services (e.g., Libsyn, Megaphone, Podbean), which generate and manage each show’s RSS feed.
- Podcast apps subscribe to the RSS feed, detect new episodes, and display them to listeners.
Among major AI labs, OpenAI maintains a notable advantage in speech-to-text technology. While most competitors focus primarily on text-based LLMs, OpenAI has invested deeply in audio understanding:
- Whisper has become the de facto industry standard for open-source transcription: robust across accents, noisy environments, and rapid speech.
- Whisper v3 API pushes accuracy even further, supporting 100+ languages with strong real-world performance.
For other companies:
- AssemblyAI and Deepgram are speech-focused vendors that offer their own managed transcription models and APIs.
- Google, Amazon, Meta, and Microsoft each offer speech-to-text models, but with less developer adoption.
- Anthropic, Cohere, and Mistral do not have transcription models at all.
