Repository: https://github.com/prateekjain24/TubeScribe
TubeScribe (CLI command: ytx) downloads YouTube audio, normalizes it, transcribes with your chosen engine, and writes clean transcript JSON + SRT captions. It includes smart caching, chapter processing, and optional summarization.
Resources
- Quickstart: README.md
- Engines: ytx/src/ytx/engines/ (whisper_engine.py, gemini_engine.py, openai_engine.py, deepgram_engine.py)
- CLI entry: ytx/src/ytx/cli.py (commands, flags, progress)
- Config guide: docs/CONFIG.md (env vars, timeouts, cache, engine options)
- API overview: docs/API.md (modules, models, extension points)
- Release checklist: docs/RELEASE.md
- Health check: `ytx health`
- End‑to‑end script: scripts/integration_e2e.sh
Quickstart (≈ 2 minutes)
- Prereqs
  - Python 3.10+
  - FFmpeg on PATH (check: `ffmpeg -version`)
- Install
  - Recommended (CLI): `pipx install tubescribe` (or `pip install tubescribe`)
  - Verify: `ytx --version` or `tubescribe --version`
- Dev install (from source):
  `cd ytx && python3 -m venv .venv && source .venv/bin/activate`
  `python -m pip install -U pip setuptools wheel`
  `python -m pip install -e .`
- Or without installing, from the repo root:
  `export PYTHONPATH="$(pwd)/ytx/src" && cd ytx && python3 -m ytx.cli --help`
- Health check: `ytx health` (checks ffmpeg, API key presence, and basic network)
- Transcribe (local Whisper): `ytx transcribe "https://youtu.be/<VIDEOID>" --engine whisper --model small`
Engines at a glance
- Whisper (local, faster‑whisper): fast on CPU/GPU; default.
- Whisper.cpp (Metal): on Apple Silicon; pass `--engine whispercpp --model /path/to/model.gguf`.
- Gemini (cloud): best‑effort timestamps; recommended `--timestamps chunked --fallback`.
- OpenAI (cloud): `--engine openai` (SDK‑first optional; HTTP fallback).
- Deepgram (cloud): `--engine deepgram` (SDK‑first optional; HTTP fallback).
- ElevenLabs (cloud): stub in place; STT support pending.
Common flags
- `--engine whisper|whispercpp|gemini|openai|deepgram` — choose an engine
- `--model <name>` — engine model (e.g., `small`, `large-v3-turbo`, `gemini-2.5-flash`, `whisper-1`)
- `--timestamps native|chunked|none` — timestamp policy (chunked recommended for LLM engines)
- `--engine-opts '{"k":v}'` — provider options (e.g., Deepgram: `{"utterances":true,"smart_format":true}`)
- `--max-download-abr-kbps <N>` — cap YouTube audio bitrate during download (default 96; set 0 to disable)
- `--by-chapter --parallel-chapters --chapter-overlap 2.0` — process chapters in parallel
- `--summarize --summarize-chapters` — overall TL;DR + bullets; per‑chapter summaries
- `--output-dir ./artifacts` — write outputs outside the cache dir
- `--overwrite` — ignore cache and reprocess
- `--fallback` — on Gemini errors, fall back to Whisper
- `--debug` — verbose logs
Examples
- Whisper (CPU): `ytx transcribe "https://youtu.be/<VIDEOID>" --engine whisper --model small`
- Whisper (Metal via whisper.cpp): `ytx transcribe "https://youtu.be/<VIDEOID>" --engine whispercpp --model /path/to/gguf-large-v3-turbo.bin`
- Gemini (chunked timestamps + fallback): `ytx transcribe "https://youtu.be/<VIDEOID>" --engine gemini --timestamps chunked --fallback`
- OpenAI (verbose segments when available): `ytx transcribe "https://youtu.be/<VIDEOID>" --engine openai --timestamps native`
- Deepgram (utterances): `ytx transcribe "https://youtu.be/<VIDEOID>" --engine deepgram --engine-opts '{"utterances":true,"smart_format":true}' --timestamps native`
- Chapters + summaries: `ytx transcribe "https://youtu.be/<VIDEOID>" --by-chapter --parallel-chapters --chapter-overlap 2.0 --summarize-chapters --summarize`
- Summarize an existing transcript JSON: `ytx summarize-file /path/to/<video_id>.json --write`
Configuration (copy .env.example → .env)
- Cloud keys: `OPENAI_API_KEY`, `DEEPGRAM_API_KEY`, `GEMINI_API_KEY` (or `GOOGLE_API_KEY`)
- Engine defaults: `YTX_ENGINE`, `WHISPER_MODEL`
- Engine options: `YTX_ENGINE_OPTS` (JSON), `YTX_PREFER_SDK=true` (prefer SDK for OpenAI/Deepgram)
- Timeouts: `YTX_NETWORK_TIMEOUT`, `YTX_DOWNLOAD_TIMEOUT`, `YTX_TRANSCRIBE_TIMEOUT`, `YTX_SUMMARIZE_TIMEOUT`
- Cache: `YTX_CACHE_DIR`, `YTX_CACHE_TTL_SECONDS|DAYS`
- whisper.cpp: `YTX_WHISPERCPP_BIN`, `YTX_WHISPERCPP_NGL`, `YTX_WHISPERCPP_THREADS`
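A minimal `.env` sketch using the variables above; the values are illustrative, not defaults:

```bash
# Engine defaults (illustrative values)
YTX_ENGINE=whisper
WHISPER_MODEL=small
# Cloud keys: set only the ones you use
GEMINI_API_KEY=...
# Prefer official SDKs for OpenAI/Deepgram when installed
YTX_PREFER_SDK=true
# Cache location (XDG default shown)
YTX_CACHE_DIR=~/.cache/ytx
```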
Outputs & cache
- JSON: `<video_id>.json` (TranscriptDoc) — includes segments; optionally `chapters` and `summary`.
- SRT: `<video_id>.srt` — wrapped captions.
- Cache layout (XDG): `~/.cache/ytx/<video_id>/<engine>/<model>/<config_hash>/` containing `transcript.json`, `captions.srt`, `meta.json` (provenance), `summary.json` (if generated)
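Given the layout above, cached transcripts can be located without knowing `config_hash` by globbing one level below the model directory. A small sketch (the helper name is ours, not part of ytx):

```python
from pathlib import Path

def find_cached_transcripts(video_id: str, engine: str, model: str,
                            cache_root: Path = Path.home() / ".cache" / "ytx") -> list[Path]:
    """Return all cached transcript.json paths for a video/engine/model,
    one per config_hash directory."""
    base = cache_root / video_id / engine / model
    if not base.is_dir():
        return []
    return sorted(base.glob("*/transcript.json"))
```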
Apple Silicon (whisper.cpp)
- Build: `make -j METAL=1` in whisper.cpp
- Run: `ytx transcribe ... --engine whispercpp --model /path/to/model.gguf`
- Tuning: `YTX_WHISPERCPP_NGL` (30–40 typical), `YTX_WHISPERCPP_THREADS`
Troubleshooting
- “ffmpeg not found”: install FFmpeg and ensure it’s on PATH (see Prereqs).
- “Restricted / Private / Age‑restricted”: use cookies with yt‑dlp outside the tool to download audio locally, then run `ytx summarize-file` on the transcript.
- “ytx: command not found”: ensure your Python scripts path is on PATH (e.g., `~/.local/bin` or the pipx bin dir).
- “No module named ytx”: avoid running `ytx` from inside the `ytx/` folder; or use module form: `PYTHONPATH=ytx/src python -m ytx.cli …`
- Gemini timestamps: best‑effort; prefer `--timestamps chunked` for reliable coarse timings.
Health Reference
- ffmpeg: must be on PATH. macOS: `brew install ffmpeg`; Ubuntu: `sudo apt-get install -y ffmpeg`.
- whisper_engine: “available” if `faster-whisper` imports; if not, reinstall: `pip install -U tubescribe`.
- whispercpp_bin: “configured” if `YTX_WHISPERCPP_BIN` points to a valid binary; “present” if a `main` whisper.cpp binary exists on PATH; otherwise “absent”. Build whisper.cpp with Metal for Apple Silicon and set the env var.
- yt_dlp: checks for the `yt-dlp` executable; install via `pip install yt-dlp` or `brew install yt-dlp`.
- gemini_api_key: set `GEMINI_API_KEY` or `GOOGLE_API_KEY` for summaries.
- openai_api_key: set `OPENAI_API_KEY` (should start with `sk-`) for `--engine openai`.
- deepgram_api_key: set `DEEPGRAM_API_KEY` for `--engine deepgram`.
- network: basic internet reachability.
Useful commands
- Health: `ytx health`
- Update check: `ytx update-check`
- Cache: `ytx cache ls` | `ytx cache stats` | `ytx cache clear --yes`
Export Markdown notes
- From cached transcript by video id: `ytx export --video-id <VIDEOID> --to md --output-dir ./notes --md-frontmatter`
- From a TranscriptDoc JSON file: `ytx export --from-file /path/to/<video_id>.json --to md --output-dir ./notes --md-frontmatter`
- Options:
  - `--md-link-style short|long` — youtu.be vs full URL
  - `--md-include-transcript` — append full transcript section (off by default)
  - `--md-include-chapters` / `--no-md-include-chapters` — include chapter outline (on by default)
  - `--md-auto-chapters-min N` — if no chapters are present, synthesize an outline entry every N minutes
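The `--md-auto-chapters-min` behaviour can be pictured as slicing the video duration into fixed windows. A rough sketch of that idea (our own helper and labels, not ytx's implementation):

```python
def synthesize_chapters(duration_s: int, every_min: int) -> list[tuple[int, str]]:
    """Return (start_seconds, label) outline entries every `every_min` minutes."""
    step = every_min * 60
    return [(t, f"Part {i + 1}") for i, t in enumerate(range(0, duration_s, step))]
```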
Notes-ready Markdown contents
- Title linked to YouTube (short or full link style)
- Optional YAML frontmatter (Obsidian-friendly): title, url, date, duration, engine, model, tags
- Summary (TL;DR) and Key Points bullets (when present)
- Chapter outline with clickable timestamps (native or synthesized)
- Optional full transcript section with timestamped bullets
Example output (Markdown)
---
title: Sample Title
url: https://youtu.be/ABCDEFGHIJK
date: 2025-09-08
duration: 12:34
engine: gemini
model: gemini-2.5-flash
tags: [youtube, transcript]
---
# [Sample Title](https://youtu.be/ABCDEFGHIJK)
## Summary
One‑paragraph TL;DR here
## Key Points
- Point A
- Point B
## Chapters
### [0:00](https://youtu.be/ABCDEFGHIJK?t=0) Intro
### [5:23](https://youtu.be/ABCDEFGHIJK?t=323) Main Topic
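The chapter links above encode the start time in seconds (`?t=323` for 5:23). A small sketch of that conversion (the helper name is ours):

```python
def timestamp_url(video_id: str, stamp: str) -> str:
    """Build a youtu.be chapter link from an h:mm:ss or m:ss timestamp."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return f"https://youtu.be/{video_id}?t={seconds}"
```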
Troubleshooting export
- “No cache found” when using `--video-id`: upgrade to `tubescribe>=0.3.3` and ensure both the transcript and SRT exist in the cache entry. Legacy `<video_id>.json`/`.srt` are supported.
- Always works: use `--from-file /path/to/<video_id>.json` to export from a specific cached TranscriptDoc.
- For long commands in zsh, end each continued line with a trailing `\` to avoid “command not found”.
Contributing
- Code lives under `ytx/src/ytx/` (CLI: `cli.py`); tests under `ytx/tests/`.
- Run tests: `cd ytx && PYTHONPATH=src python -m pytest -q`
- Lint (if configured): `ruff check .`