┌─────────────────────────────────────────────────────────────────────────────────┐
│ CONTENT SOURCES │
├─────────────────────────┬───────────────────────┬───────────────────────────────┤
│ YouTube Videos │ Podcast Index API │ Direct Podcast URLs │
│ (youtube.com/...) │ (podcastindex.org) │ (RSS feeds/MP3 URLs) │
└─────────┬───────────────┼───────────────────────┼─────────────────┬─────────────┘
│ │ │ │
┌─────┴─────┐ ┌─────┴─────┐ ┌─────┴─────┐ ┌─────┴─────┐
│ STREAMING │ │ STREAMING │ │ STREAMING │ │ STREAMING │
│ TO USER │ │ TO USER │ │ TO USER │ │ TO USER │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
│ │ │ │
│ ┌─────┴─────┐ ┌─────┴─────┐ │
│ │BACKGROUND │ │BACKGROUND │ │
│ │PROCESSING │ │PROCESSING │ │
│ │(COPYING) │ │(COPYING) │ │
│ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│ USER INTERFACE (React Frontend) │
│ │
│ USER SEES: BACKGROUND PROCESSING: │
│ • YouTube embedded player • Audio downloaded via yt-dlp │
│ • Direct podcast stream links • Audio processed for transcription │
│ • Interactive transcript overlay • AI analysis of downloaded audio │
│ • AI chat powered by transcript • Generated content stored locally │
└─────────────────────────┬───────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│ DUAL-TRACK ARCHITECTURE: STREAM + PROCESS │
├─────────────────────────────────────────────────────────────────────────────────┤
│ │
│ TRACK 1: CONTENT DELIVERY TO USER (NO COPYING) │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ • YouTube: Native embed player (YouTube serves content) │ │
│ │ • Podcasts: Direct stream from RSS feed/MP3 URL │ │
│ │ • No local storage of original audio/video for user consumption │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
│ TRACK 2: BACKGROUND PROCESSING (COPYING FOR AI FEATURES) │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ ⚠️ COPYING OCCURS HERE: │ │
│ │ • yt-dlp downloads audio for transcript generation │ │
│ │ • ffmpeg processes audio (mono 16kHz, chunking) │ │
│ │ • Gemini API transcribes downloaded audio │ │
│ │ • AI generates summaries, bios, chat context │ │
│ │ • Original downloaded audio deleted after processing │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────┬───────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│ WHAT WE STORE LOCALLY │
├─────────────────────────────────────────────────────────────────────────────────┤
│ • Transcripts (text derived from audio) │
│ • AI-generated summaries (transformative content) │
│ • Speaker biographies (AI-generated) │
│ • Chat system prompts (AI-generated) │
│ • Metadata (episode info, timestamps) │
│ │
│ ❌ WE DO NOT STORE: │
│ • Original audio files (deleted after transcription) │
│ • Video files │
│ • Podcast episodes for user playback │
└─────────────────────────────────────────────────────────────────────────────────┘
What Content We Copy:
- Audio from YouTube videos and podcasts is temporarily downloaded using the yt-dlp tool
- Audio is processed through ffmpeg for format conversion and segmentation
- Purpose: Generate AI transcripts, summaries, and interactive features
- Duration: Original audio files are deleted after AI processing completes
What Content We Stream Only:
- YouTube Videos: Users view content through YouTube's native embedded player - we never store video
- Podcast Episodes: Users listen through direct streaming from RSS feeds or original MP3 URLs
- User Experience: All audio/video consumption happens via original source streaming
What We Store Permanently:
- Text transcripts (derived/transformative content from audio)
- AI-generated summaries and speaker biographies
- Metadata (episode titles, timestamps, speaker names)
- Chat system prompts for AI interaction
- High-quality 30-second speaker audio clips (standalone extraction tool output)
What We Do NOT Store:
- Original audio files (deleted post-processing)
- Video files of any kind
- Podcast episodes for user playback
YouTube Video Experience:
- User Interface: YouTube's native embedded player within our interface
- Content Delivery: YouTube serves all video/audio content directly to user
- Our Role: Provide interactive transcript overlay and AI chat features
- No Copying for User: User watches/listens via YouTube's streaming infrastructure
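Embedded playback depends on pointing the iframe at YouTube's standard embed URL scheme so that YouTube, not our servers, delivers the stream. A minimal sketch of deriving that URL from a watch link (the helper name is ours, not part of the codebase):

```python
from urllib.parse import urlparse, parse_qs

def embed_url(watch_url: str) -> str:
    """Derive the iframe embed URL so YouTube itself serves the stream.

    Handles both youtube.com/watch?v=... and youtu.be/... short links.
    """
    parsed = urlparse(watch_url)
    # Prefer the ?v= query parameter; fall back to the last path segment
    # for youtu.be short links.
    video_id = parse_qs(parsed.query).get("v", [""])[0] \
        or parsed.path.rstrip("/").split("/")[-1]
    return f"https://www.youtube.com/embed/{video_id}"

print(embed_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
```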
Podcast Episode Experience:
- User Interface: HTML5 audio player with direct RSS feed or MP3 URL
- Content Delivery: Original podcast host serves audio content to user
- Our Role: Provide interactive transcript, summaries, and AI chat features
- No Copying for User: User listens via streaming from original podcast source
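Resolving the direct stream URL for the HTML5 player amounts to reading the `<enclosure>` element from each RSS item, which is where podcast feeds publish the hosted MP3 URL. A sketch, assuming a standard RSS 2.0 feed (the sample feed and helper name are illustrative):

```python
import xml.etree.ElementTree as ET

def extract_enclosure_urls(rss_xml: str) -> list[dict]:
    """Pull episode titles and direct MP3 enclosure URLs from a podcast RSS feed.

    The HTML5 <audio> element is pointed at these URLs directly, so playback
    streams from the original podcast host and nothing is stored locally.
    """
    root = ET.fromstring(rss_xml)
    episodes = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        enclosure = item.find("enclosure")  # <enclosure url="..." type="audio/mpeg"/>
        if enclosure is not None and enclosure.get("url"):
            episodes.append({"title": title, "url": enclosure.get("url")})
    return episodes

# Minimal hypothetical feed for illustration:
sample_feed = """<rss version="2.0"><channel>
  <item><title>Episode 1</title>
    <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="1234"/>
  </item>
</channel></rss>"""

print(extract_enclosure_urls(sample_feed))
```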
Interactive Features:
- Real-time transcript highlighting synchronized with streaming audio
- Click-to-seek navigation within transcript
- AI voice chat based on transcript content
- Speaker biography information displayed alongside content
Audio Download Process:
Content URL → yt-dlp download → Temporary local audio file → AI processing → File deletion
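The download-then-delete lifecycle above can be sketched as follows. The yt-dlp flags shown (`-x` for audio-only extraction, `-o` for the output template) are standard, but the exact options and helper names here are illustrative, not the pipeline's literal code:

```python
import os
import tempfile

def build_ytdlp_command(url: str, out_template: str) -> list[str]:
    """Assemble a yt-dlp invocation for the temporary audio download.

    -x extracts audio only (no video is ever saved); -o sets the output
    template for the temporary file.
    """
    return ["yt-dlp", "-x", "--audio-format", "mp3", "-o", out_template, url]

def process_then_delete(audio_path: str, process) -> None:
    """Run AI processing on the temporary file, then delete it no matter what."""
    try:
        process(audio_path)
    finally:
        if os.path.exists(audio_path):
            os.remove(audio_path)

# Demonstrate the lifecycle with a placeholder file and a no-op "pipeline":
fd, tmp = tempfile.mkstemp(suffix=".mp3")
os.close(fd)
process_then_delete(tmp, lambda p: None)  # stand-in for transcription/analysis
print(os.path.exists(tmp))  # False: the temporary copy is gone after processing
```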
Step-by-Step Copying Process:
1. Download Phase (⚠️ Copyright-sensitive):
   - The yt-dlp tool downloads audio from YouTube or podcast URLs
   - Audio saved temporarily to local filesystem for processing
   - Alternative HTTP download method for direct MP3 URLs
2. Processing Phase:
   - ffmpeg converts audio to mono 16kHz format (optimal for AI transcription)
   - Audio split into 20-minute chunks with 30-second overlap
   - Processed audio files remain on local system during AI analysis
3. AI Analysis Phase:
   - Google Gemini 2.5 Pro API receives processed audio for transcription
   - Speaker identification and diarization performed
   - Content summarization generated
   - OpenAI and Claude APIs generate speaker biographies
4. Speaker Audio Extraction (⚠️ Additional copying for standalone tool):
   - extract_speaker_audio.py creates high-quality 30-second speaker clips
   - Uses Gemini 2.5 Pro for optimal speaking region identification
   - Applies professional audio processing (44.1kHz mono WAV with fade processing)
   - Generated for voice cloning applications and speaker analysis
5. Cleanup Phase:
   - Original downloaded audio files deleted after transcript generation
   - Only derived text content (transcripts, summaries) retained
   - Speaker audio clips (30-second processed segments) may be retained
   - No permanent storage of full source audio material
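The processing phase's chunking scheme (20-minute chunks, 30-second overlap so no speech is lost at chunk edges) can be sketched as below. The ffmpeg flags (`-ac 1` for mono, `-ar 16000` for 16 kHz, `-ss`/`-t` for the chunk window) are standard ffmpeg options, but the helper names are ours:

```python
def chunk_boundaries(duration_s: float, chunk_s: float = 20 * 60,
                     overlap_s: float = 30) -> list[tuple[float, float]]:
    """Compute (start, end) times for transcription chunks.

    Each chunk is chunk_s long; the next chunk starts overlap_s before the
    previous one ends, so chunk-edge speech appears in both transcripts.
    """
    step = chunk_s - overlap_s
    bounds, start = [], 0.0
    while start < duration_s:
        bounds.append((start, min(start + chunk_s, duration_s)))
        start += step
    return bounds

def ffmpeg_chunk_command(src: str, dst: str, start: float, end: float) -> list[str]:
    """ffmpeg invocation for one chunk: mono, 16 kHz, windowed by -ss/-t."""
    return ["ffmpeg", "-i", src, "-ac", "1", "-ar", "16000",
            "-ss", str(start), "-t", str(end - start), dst]

# A 45-minute episode yields three overlapping chunks:
print(chunk_boundaries(45 * 60))
```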
Generated Files Stored Long-term:
generated_podcasts/
├── {Episode_Title}_transcript.jsonl # Text transcript with speaker/timestamp data
├── {Episode_Title}_metadata.json # Episode metadata, speaker names, duration
├── {Episode_Title}_summary.txt # AI-generated episode summary
├── {Episode_Title}_system_prompt.txt # AI chat system context
└── {Episode_Title}_transcript_summary_and_toc.txt # Table of contents
speaker_clips/ # Standalone speaker extraction output
├── {Episode_Title}_{Speaker_Name}_30s.wav # High-quality 30-second speaker clips
└── {Episode_Title}_extraction_metadata.json # Extraction processing metadata
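The `_transcript.jsonl` format stores one JSON object per line, which keeps large transcripts streamable. A reading sketch (the field names in the sample are assumptions; the actual schema may differ):

```python
import json

# Hypothetical transcript lines; real field names in the .jsonl may differ.
sample_jsonl = "\n".join([
    json.dumps({"speaker": "Host", "start": 0.0, "end": 4.2, "text": "Welcome."}),
    json.dumps({"speaker": "Guest", "start": 4.2, "end": 9.8,
                "text": "Thanks for having me."}),
])

def load_transcript(jsonl_text: str) -> list[dict]:
    """Parse one JSON object per line into a list of segment dicts."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

segments = load_transcript(sample_jsonl)
print(segments[1]["speaker"])  # Guest
```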
File Content Analysis:
- Transcripts: Text representations of spoken content with speaker identification
- Metadata: Factual information (titles, timestamps, speaker names)
- Summaries: AI-generated transformative content describing episode themes
- System Prompts: AI-generated context for chat functionality
- Biographies: AI-researched and generated speaker background information
- Speaker Audio Clips: 30-second processed audio segments optimized for voice applications
Retention Policy:
- Generated text files stored indefinitely for user access
- Original audio files deleted immediately after AI processing
- No backup or archival of source audio material
Content Discovery APIs:
- Podcast Index API: Search publicly available podcast metadata and RSS feeds
- No Content Copying: Only metadata and RSS feed URLs obtained
AI Processing APIs:
- Google Gemini 2.5 Pro: Receives processed audio for transcription services
- OpenAI o3: Processes transcript text for speaker biography generation
- Anthropic Claude: Processes and formats AI-generated content
- ElevenLabs: Provides AI voice agents using transcript context
Data Transmission:
- Processed audio sent to Gemini API for transcription (temporary processing)
- Text transcripts sent to OpenAI/Claude for biography generation
- No raw audio transmitted to biography generation APIs
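The transmission rule above (audio only to the transcription API; text-only payloads to the biography APIs) can be made explicit as a dispatch policy. A sketch under those assumptions; the function and payload shapes are illustrative, not actual API request formats:

```python
def payload_for(api: str, audio_path: str, transcript_text: str) -> dict:
    """Enforce the data-transmission rule: audio goes only to the
    transcription API; biography/formatting APIs receive text only."""
    if api == "gemini":
        return {"audio": audio_path, "task": "transcription"}
    if api in ("openai", "claude"):
        return {"text": transcript_text, "task": "biography"}
    raise ValueError(f"no payload defined for API: {api}")
```

This centralizes the guarantee that no raw audio can reach the biography-generation APIs, rather than relying on each call site to remember it.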
Separation of User Experience and Processing:
- Users consume content via original streaming sources (YouTube, podcast hosts)
- Background AI processing occurs independently from user content consumption
- Generated features (transcripts, chat) enhance but don't replace original content
Temporary vs. Permanent Storage:
- Audio copying is temporary and solely for AI feature generation
- Permanent storage limited to derived/transformative text content
- No mechanism for users to access downloaded audio files
Access Control:
- Generated transcript files served only to users of our platform
- No public distribution of derived content outside our application
- User access tied to original content source availability
This architecture ensures that users access copyrighted content only through its original streaming sources, while AI-powered interactive features are enabled by temporary audio processing and permanent storage of derived text content only.