**Warning:** This is vibe coded and should only be run locally. It writes `.md` files to your actual Obsidian vault, so it's a good idea to test out summarization and metadata generation outside your vault first. Use with caution.
A Flask web application that automatically generates structured Markdown notes from YouTube videos and podcast episodes, saving them directly to your Obsidian vault. It uses a local Ollama instance for AI-powered summarization and faster-whisper for podcast transcription.
Human Words: I built this app to combine my love of Obsidian note-taking with my consumption of video and audio media. At first, I just wanted a script that would set up a note for me with relevant metadata. Then I wanted the script to add a summary of the media as well. From there, I wanted a web app where I could drop in a URL and tweak settings to my heart's content.
- Docker & Docker Compose installed on your system
- Ollama running on your host machine with a model pulled (e.g., `ollama pull llama3.1:8b`)
- An Obsidian vault on your filesystem
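
A quick way to sanity-check these prerequisites from a terminal (standard Docker and Ollama CLI commands, nothing specific to this app):

```bash
docker --version        # Docker installed?
docker compose version  # Compose available?
ollama list             # Ollama running, with at least one model pulled
```
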
1. Clone the repository.

2. Copy `.env.example` to `.env`. Open the `.env` file, then update these critical values:

   ```bash
   # Change this to a random string
   FLASK_SECRET_KEY=your-secret-key-here

   # Update if your Ollama uses a different port or isn't on localhost
   OLLAMA_BASE_URL=http://host.docker.internal:11434

   # Change to your preferred model
   OLLAMA_MODEL=llama3.1:8b

   # Set the default folder to save notes to. Your actual vault path is set in the docker-compose file
   OBSIDIAN_FOLDER=youtube
   ```

3. Configure `docker-compose.yml`:

   **REQUIRED:** Update the Obsidian vault path in both the `web` and `worker` services:

   ```yaml
   services:
     web:
       volumes:
         - type: bind
           source: "/absolute/path/to/your/vault"  # <- CHANGE THIS
           target: /vault
     worker:
       volumes:
         - type: bind
           source: "/absolute/path/to/your/vault"  # <- CHANGE THIS
           target: /vault
   ```

   Replace the `source` path with your actual Obsidian vault path.

   **Optional:** Adjust worker resource limits if processing long podcasts (4+ hours):

   ```yaml
   worker:
     deploy:
       resources:
         limits:
           memory: 32G  # Reduce to 16G for shorter content
           cpus: '8'    # Reduce to 4 for shorter content
   ```

4. Start the application:

   ```bash
   docker-compose up -d
   ```
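
   If something doesn't start cleanly, the quickest check is the service status and logs (standard Compose commands, nothing app-specific):

   ```bash
   docker-compose ps                   # web, worker, and redis should all be up
   docker-compose logs -f web worker   # Follow application logs
   ```
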
5. Verify Ollama is accessible:

   ```bash
   # From your host machine
   curl http://localhost:11434/api/tags
   ```

   You should see a list of your pulled models.

6. Access the web interface at `http://localhost:5050`.

7. Test with a YouTube video:

   - Paste a YouTube URL
   - Keep default settings
   - Click "Submit"
   - Follow the job status link to watch progress
   - Find your note in `{OBSIDIAN_VAULT}/{OBSIDIAN_FOLDER}/`
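
For a sense of what to expect, a generated note might look roughly like the sketch below. The exact frontmatter keys and section layout depend on the app's templates and your prompts, so treat this as illustrative only:

```markdown
---
title: "Example Video Title"
url: https://www.youtube.com/watch?v=PLACEHOLDER
---

## Summary

- Key point with a timestamp ...
- Notable quote ...
```
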
- **Can't connect to Ollama:** Ensure `OLLAMA_BASE_URL` uses `http://host.docker.internal:11434` on macOS/Windows, or check the `extra_hosts` configuration on Linux (see the in-container check after this list)
- **Notes not appearing:** Verify the volume mount paths in `docker-compose.yml` match your Obsidian vault location
- **Podcast transcription fails:** Set `PODCAST_ASR_ENABLE=1` in `.env` and ensure adequate RAM (16GB+) is available
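
To rule out container-to-host networking issues, try the same Ollama check from inside a container (this assumes `curl` is available in the image):

```bash
docker-compose exec web curl -s http://host.docker.internal:11434/api/tags
```
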
- Context length (tokens) depends on the model you feed your transcript to. If your model supports large context lengths, set the context window higher here to take advantage of that. Larger context lengths allow you to process longer transcripts without segmenting. I've found a context window of 32,000 tokens to be more than adequate for an hour of transcription.
- Segment length comes into play when you have to (or want to) summarize individual sections of a transcript because it's too long for the context you can support. The sections are summarized individually and written into your note. The individual sections are NOT merged together into a larger summary.
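
As a rough rule of thumb when sizing the context window: English text runs around 1.3 tokens per word, so you can estimate a saved transcript's size like this (a sketch only; the real count varies by model tokenizer):

```bash
# Rough token estimate for a transcript file (~1.3 tokens per English word)
words=$(wc -w < transcript.txt)
echo "approx. tokens: $(( words * 13 / 10 ))"
```
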
Disclaimer: This was mostly vibe coded, including the README.
This application makes a few assumptions:
- You are already self-hosting an LLM and are somewhat familiar with interacting with it.
- You are familiar with how Docker works and are used to modifying `.env` files and using `docker compose`.
Vibe Coded README
This application provides a web interface to:
- YouTube Videos: Extract transcripts (via YouTube API or yt-dlp subtitles) and generate structured summaries with key points, timestamps, and quotes
- Podcast Episodes: Download audio from podcast URLs (Apple Podcasts, Spotify, RSS feeds, etc.), transcribe using faster-whisper ASR, and create detailed summaries
- Ollama Integration: Use local LLM models for intelligent summarization with customizable prompts
- Obsidian Export: Save notes with YAML frontmatter directly to your Obsidian vault
- Background Processing: Redis Queue (RQ) handles long-running transcription and summarization tasks
- Customizable Prompts: Configure system, summary, and segment prompts via web interface or environment variables
- Advanced Options: Cookie support for age-gated content, proxy support, rate limiting, and per-segment summaries for long content
- Transcript Sources: YouTube Transcript API, yt-dlp subtitles, or faster-whisper ASR for podcasts
- Flexible Summarization: Full transcript summaries or per-segment summaries for long content (1+ hour)
- Rate Limiting: Download rate limiting (1MB/s default) to prevent bandwidth issues
- Progress Tracking: Real-time job status updates with detailed logging
- Cookie & Proxy Support: Access age-gated or geo-restricted content
- Docker Deployment: Containerized with docker-compose for easy setup
The application is configured entirely through environment variables. Copy the `.env.example` file to `.env` and customize as needed. Most of these can be overridden in the web interface as well.
```bash
# Flask configuration
FLASK_SECRET_KEY=a-very-secret-key  # Change this to a random string
PORT=5050                           # Web interface port (default: 5050)
```

```bash
REDIS_URL=redis://redis:6379/0  # Redis connection URL
RQ_QUEUE=yt                     # Queue name for background jobs
RQ_JOB_TIMEOUT=3600             # Job timeout in seconds (1 hour)
RQ_RESULT_TTL=86400             # Keep successful results for 24 hours
RQ_FAILURE_TTL=604800           # Keep failed results for 7 days
```

```bash
OBSIDIAN_VAULT=/vault    # Mount point inside container (don't change)
OBSIDIAN_FOLDER=youtube  # Subfolder within vault for notes
YT_LANGS=en,en-US,en-GB  # Preferred transcript languages
```

**Important:** The container writes to `/vault` internally. Use docker-compose volume mapping to point this to your actual Obsidian vault path on the host.
```bash
OLLAMA_BASE_URL=http://host.docker.internal:11434  # Ollama server URL
OLLAMA_MODEL=gpt-oss:120b                          # Model to use for summarization
OLLAMA_CONTEXT_LENGTH=32000                        # Context window size (default: 32000)
```

Note:

- On macOS/Windows: Use `http://host.docker.internal:11434` to access Ollama running on the host
- On Linux: The docker-compose file includes an `extra_hosts` mapping to support `host.docker.internal` (see the snippet below)
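
On Linux, that mapping typically looks like the snippet below (shown for orientation; the repo's `docker-compose.yml` already includes it, so this is not something you need to add):

```yaml
services:
  web:
    extra_hosts:
      - "host.docker.internal:host-gateway"
  worker:
    extra_hosts:
      - "host.docker.internal:host-gateway"
```
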
```bash
YTDLP_COOKIES=/vault/cookies.txt     # Netscape-format cookies file for age-gated content
YTDLP_PROXY_FILE=/vault/proxies.txt  # Text file with one proxy per line (format: http://host:port)
```

**Cookies:** Export browser cookies in Netscape format and place the file in your vault. Useful for age-restricted or member-only content. If you've used yt-dlp previously, this will be familiar.

**Proxies:** Create a text file with one proxy per line. The app will randomly select a proxy for each request. Format: `http://proxy.example.com:8080`
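
If you already use yt-dlp on your host, one common way to produce a Netscape-format cookies file is to export straight from a browser profile (paths and the URL here are placeholders):

```bash
# Writes a Netscape-format cookie jar without downloading anything
yt-dlp --cookies-from-browser firefox \
       --cookies /path/to/vault/cookies.txt \
       --skip-download "https://www.youtube.com/watch?v=PLACEHOLDER"
```
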
```bash
PODCAST_ASR_ENABLE=1       # Enable ASR transcription (required for podcasts)
PODCAST_ASR_MODEL=tiny.en  # Whisper model: tiny, base, small, medium, large-v3
PODCAST_ASR_DEVICE=cpu     # Device: cpu or cuda (requires GPU)
PODCAST_ASR_COMPUTE=int8   # Compute type: int8, float16, float32
PODCAST_ASR_BATCH_SIZE=1   # Batch size (1 for CPU, 8-24 for GPU)
PODCAST_ASR_BEAM_SIZE=3    # Beam search size (higher = more accurate but slower)
ASR_CHUNK_DURATION=3600    # Chunk duration in seconds (1 hour default)
```

Model Selection:

- `tiny`/`tiny.en`: Fastest, lowest accuracy, ~1GB RAM
- `base`/`base.en`: Good balance, ~1GB RAM
- `small`/`small.en`: Better accuracy, ~2GB RAM
- `medium`/`medium.en`: High accuracy, ~5GB RAM
- `large-v3`: Best accuracy, ~10GB RAM
GPU Configuration Examples:
```bash
# GPU with 8GB VRAM
PODCAST_ASR_DEVICE=cuda
PODCAST_ASR_COMPUTE=float16
PODCAST_ASR_BATCH_SIZE=16

# GPU with 16GB+ VRAM
PODCAST_ASR_DEVICE=cuda
PODCAST_ASR_COMPUTE=float16
PODCAST_ASR_BATCH_SIZE=24
```

All prompts support these placeholders: `{title}`, `{url}`, `{transcript}`, `{show}`, `{start_hms}`, `{end_hms}`
```bash
# System prompt prepended to all summarization requests
OLLAMA_SYSTEM_PROMPT="You are a precise note-taker creating concise, accurate summaries for Obsidian..."

# YouTube full transcript summary
YOUTUBE_SUMMARY_PROMPT="You will summarize a YouTube transcript.\nTitle: {title}..."

# YouTube per-hour segment summary
YOUTUBE_SEGMENT_PROMPT="Summarize this portion of a YouTube stream/video..."

# Podcast full transcript summary
PODCAST_SUMMARY_PROMPT="You will summarize a podcast transcript.\nShow: {show}..."

# Podcast per-segment summary
PODCAST_SEGMENT_PROMPT="Summarize this portion of a podcast episode..."
```

**Customization:** These defaults can be overridden in the web interface on a per-request basis. The web form provides three text areas for system, summary, and segment prompts.
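
As an example of overriding one of these in `.env`, a trimmed-down custom YouTube prompt using the placeholders might look like this (the wording is illustrative, not the shipped default):

```bash
YOUTUBE_SUMMARY_PROMPT="Summarize the YouTube video {title} ({url}).\nList 5 key points with timestamps, then 2 notable quotes.\n\nTranscript:\n{transcript}"
```
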
The `docker-compose.yml` file uses volume mappings to connect the containerized application with your host system.
**Obsidian Vault Mount (`/vault`):**

- Purpose: Maps your host Obsidian vault directory to `/vault` inside the containers
- Used by: Both `web` and `worker` containers
- Configuration: Update `source` to match your Obsidian vault path
- Example:

  ```yaml
  source: "/absolute/path/to/obsidian/vault"    # macOS
  source: "/absolute/path/to/obsidian/vault"    # Linux
  source: "C:/absolute/path/to/obsidian/vault"  # Windows
  ```

- Why: Allows the app to write Markdown notes directly to your vault and read cookie/proxy files
**Model Cache (`models-cache`):**

- Purpose: Persists downloaded faster-whisper and Hugging Face model files
- Mounted at: `/root/.cache` in the worker container
- Why:
  - Prevents re-downloading multi-GB models on container restart
  - Speeds up subsequent podcast transcriptions
  - Survives container removal
- Size: Can grow to 10GB+ depending on Whisper model size
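
To see how much space the cache is actually using (this assumes standard coreutils are present in the worker image):

```bash
docker-compose exec worker du -sh /root/.cache
```
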
**Redis Data (`redis-data`):**

- Purpose: Persists the Redis database (job queue state, results)
- Mounted at: `/data` in the redis container
- Why:
  - Preserves job history across container restarts
  - Maintains the result cache (24 hours default)
  - Prevents data loss on container recreation
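
If jobs seem stuck, you can peek at the queue directly (this assumes RQ's default key naming of `rq:queue:<name>`):

```bash
docker-compose exec redis redis-cli llen rq:queue:yt  # Pending jobs in the 'yt' queue
```
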
```yaml
deploy:
  resources:
    limits:
      memory: 32G  # Maximum memory (for very long transcriptions, 4+ hours)
      cpus: '8'    # Maximum CPU cores
    reservations:
      memory: 16G  # Guaranteed memory
      cpus: '4'    # Guaranteed cores
```

Adjustment Guidelines:
- Podcast Length < 1 hour: 8GB memory, 2 CPU cores sufficient
- Podcast Length 1-2 hours: 16GB memory, 4 CPU cores recommended
- Podcast Length 2-4 hours: 24GB memory, 6 CPU cores recommended
- Podcast Length 4+ hours: 32GB memory, 8 CPU cores for best performance
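
Rather than editing `docker-compose.yml` in place, you could keep sizing tweaks in a `docker-compose.override.yml` (a standard Compose convention; such a file is not shipped with the repo). For example, if most of your episodes are under an hour:

```yaml
services:
  worker:
    deploy:
      resources:
        limits:
          memory: 8G  # Plenty for episodes under an hour
          cpus: '2'
```
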
**GPU Usage:** CUDA is not currently functioning for faster-whisper.
- Docker & Docker Compose: Latest version
- Ollama: Running on host machine with desired model pulled
- Disk Space: 10GB+ for model cache (if using podcast transcription)
- RAM: 16GB+ recommended for podcast transcription (see resource limits above)
MIT

