A powerful FastAPI application that streams audio from YouTube videos as MP3 over HTTP, with automatic transcription, AI-powered summaries, and intelligent knowledge management.
- YouTube Audio Streaming: Stream any YouTube video as MP3 audio; supports full YouTube URLs, short `youtu.be` links, and raw video IDs
- Smart Queue System: Build playlists with a persistent queue (survives page refreshes) that supports both YouTube videos and weekly summary audio as mixed items
- Auto-Play & Prefetching: Automatically plays the next track when the current one finishes; pre-downloads the upcoming track in the background for seamless transitions
- Play History: Tracks all played videos with play counts, relative timestamps, and rich metadata (title, channel, thumbnail); click any entry to replay
- Stream Resilience: Automatic stalling detection with up to 50 reconnection retries and adaptive back-off delay
- Audio Player: HTML5 player with Play/Pause, Rewind (−15 s), Fast-Forward (+15 s), and Stop
- Speed Controls: Switch between 1×, 1.15×, 1.3×, and 1.5× playback speed
- Keyboard Shortcuts: Space to play/pause; ↑/↓ to skip forward/back 15 s
- MediaSession API: Rich media controls in car entertainment systems (Tesla tested), lock screens, and notification panels — displays title, channel, and album art
- Mobile-First Design: Responsive layout optimised for phones, tablets, and desktops
- Dark / Light Theme: Automatic system-preference detection with manual toggle; preference saved in local storage
- Drag-and-Drop Queue Reordering: Reorder queue items by dragging (mouse and touch); order saved instantly to the database
- Multi-Provider: OpenAI Whisper, Mistral Voxtral, or Google Gemini — switch via config
- Audio Optimisation: Automatic compression (1.5× speed, mono, 32 kbps, 16 kHz) reduces Whisper API costs ~33 %
- Background Processing: Runs asynchronously; transcription status visible in UI with polling every 5 s
- Smart Caching: Transcripts cached locally; Trilium deduplication prevents re-processing the same video
- Video Summaries: AI-generated summaries posted to Trilium Notes with video title, channel, thumbnail, and YouTube link
- Multi-Provider: OpenAI GPT or Google Gemini (Gemini recommended for free tier)
- Automated Scheduling: Configurable schedule (default: Fridays at 11 PM) to summarise the week's viewing
- Comprehensive Analysis: Synthesises all videos watched during the week; extracts key learnings and common themes
- Text-to-Speech: Optional TTS (OpenAI or ElevenLabs) converts the written summary to audio you can queue and play
- AI Content Discovery: Analyses viewing history and Trilium summaries to find thematically related YouTube videos
- Configurable: Control how many recent videos to analyse and how many suggestions to generate
- LLM Usage Dashboard: HTML dashboard at `/admin/stats` plus a JSON API with filtering by date range, provider, model, and feature
- Cost Monitoring: Token counts and audio-duration tracking for accurate per-minute pricing (Whisper, Voxtral)
- Client-Side Logging: Browser logs batched and forwarded to the server — essential for debugging on car displays and mobile devices without a console
- Modern Stack: FastAPI, Python 3.12, SQLite with type-safe models and dataclasses
- Comprehensive Testing: 76 %+ test coverage with pytest and pre-commit hooks
- CI/CD: Automated testing with GitHub Actions on every push and pull request
- Type Safety: Full return-type annotations; mypy-compatible throughout
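The three input formats listed above (full URL, short `youtu.be` link, raw video ID) can be normalised with a small helper. The `extract_video_id` function below is a hypothetical illustration of that normalisation, not the project's actual parser:

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(value: str) -> str:
    """Return the video ID from a full URL, a youtu.be link, or a raw ID.

    Hypothetical helper illustrating the accepted input formats; the
    server's real parsing logic may differ.
    """
    value = value.strip()
    parsed = urlparse(value)
    if parsed.hostname in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        # Full URL: https://www.youtube.com/watch?v=<id>
        return parse_qs(parsed.query)["v"][0]
    if parsed.hostname == "youtu.be":
        # Short URL: https://youtu.be/<id>
        return parsed.path.lstrip("/")
    # Otherwise assume a raw video ID was passed
    return value

print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))                 # dQw4w9WgXcQ
print(extract_video_id("dQw4w9WgXcQ"))                                  # dQw4w9WgXcQ
```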
- Operating System: Linux (Ubuntu/Debian recommended)
- Python: 3.12 or higher
- uv: Fast Python package manager (installed automatically by setup script)
```bash
git clone https://github.com/MrDesjardins/audio-stream-server.git
cd audio-stream-server
```

The automated setup script installs all dependencies and initializes the database:

```bash
chmod +x setup.sh
./setup.sh
```

This script will:
- Install system dependencies (yt-dlp, ffmpeg)
- Install uv package manager if not present
- Install Python dependencies
- Initialize the SQLite database
- Create a `.env` file from `.env.example`

Edit the `.env` file to customize your settings:

```bash
nano .env
```

Minimum configuration (no AI features):

```bash
# Server configuration
ENV=production
FASTAPI_HOST=0.0.0.0   # Listen on all interfaces (accessible from network)
FASTAPI_API_PORT=8000

# Disable AI features for basic streaming
TRANSCRIPTION_ENABLED=false
```

Full configuration (with AI features):
```bash
# Server configuration
ENV=production
FASTAPI_HOST=0.0.0.0
FASTAPI_API_PORT=8000

# Enable AI features
TRANSCRIPTION_ENABLED=true

# AI API Keys
OPENAI_API_KEY=sk-...   # Get from https://platform.openai.com/api-keys
GEMINI_API_KEY=...      # Get from https://makersuite.google.com/app/apikey

# Provider selection (recommended: Whisper + Gemini for best cost/quality)
TRANSCRIPTION_PROVIDER=openai   # "openai" (Whisper), "mistral" (Voxtral), or "gemini"
SUMMARY_PROVIDER=gemini         # "gemini" (free tier) or "openai"

# Trilium Notes integration (for saving summaries)
TRILIUM_URL=http://localhost:8080
TRILIUM_ETAPI_TOKEN=...         # From Trilium: Options → ETAPI
TRILIUM_PARENT_NOTE_ID=...      # Right-click note → "Copy Note ID"

# Optional features
WEEKLY_SUMMARY_ENABLED=false
BOOK_SUGGESTIONS_ENABLED=false
TTS_ENABLED=false
```

Important settings explained:
- `FASTAPI_HOST`:
  - `0.0.0.0` = Accessible from all network devices (recommended)
  - `127.0.0.1` = Localhost only (not accessible from other devices)
  - Or use your specific local IP (e.g., `10.0.0.181`)
- `TRANSCRIPTION_PROVIDER`:
  - `openai` = Whisper API ($0.006/minute, very accurate, fast, 25 MB limit)
  - `mistral` = Voxtral Mini ($0.003/minute, cost-effective, good quality, 15 min limit)
  - `gemini` = Gemini 1.5 Flash (free tier available, good quality, no limits)
- `SUMMARY_PROVIDER`:
  - `gemini` = Gemini 2.5 Flash (recommended, free tier, fast)
  - `openai` = GPT-4o-mini (high quality, paid)
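The boolean flags above (`TRANSCRIPTION_ENABLED`, `TTS_ENABLED`, and so on) arrive as plain strings from the environment. As a rough sketch of how such strings are commonly coerced (the project's `config.py` may do this differently, and `env_bool` is a hypothetical helper name):

```python
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Coerce an environment string to a bool ("true"/"1"/"yes" → True)."""
    return os.environ.get(name, str(default)).strip().lower() in ("true", "1", "yes")

# Hypothetical usage mirroring the settings above:
os.environ["TRANSCRIPTION_ENABLED"] = "false"
port = int(os.environ.get("FASTAPI_API_PORT", "8000"))  # numeric settings need int()
print(env_bool("TRANSCRIPTION_ENABLED"))  # False
```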
If you enabled transcription with Trilium integration:
```bash
uv run test_trilium.py
```

This verifies:
- Trilium is reachable at your configured URL
- ETAPI token is valid
- Parent note ID exists and is accessible
If running on a server, allow access to port 8000:
```bash
sudo ufw allow 8000/tcp
sudo ufw reload
sudo ufw status
```

Development mode (with auto-reload):

```bash
uv run main.py
```

Production mode (as systemd service): see the Running as a Service section below.
Open your browser and navigate to:
```text
http://localhost:8000        # If running locally
http://YOUR_SERVER_IP:8000   # If running on a server
```
You should see the web interface with:
- Search bar to enter YouTube URLs or video IDs
- Play history
- Queue management
- Transcription status (if enabled)
Required for Whisper transcription or GPT summarization.
- Visit https://platform.openai.com/api-keys
- Sign in or create an account
- Click "Create new secret key"
- Copy the key (starts with `sk-`)
- Add to the `.env` file: `OPENAI_API_KEY=sk-...`
Cost: Whisper is $0.006 per minute of audio. For typical use (~30 hours/month), expect ~$10-15/month.
Limitation: Maximum 25MB file size. Audio is automatically compressed (1.5x speed, mono, 32kbps) to save costs and meet this limit. For very long videos (>2 hours), use Gemini instead.
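The arithmetic behind the 25 MB limit is simple: the compression shortens the audio by the 1.5× speed-up and encodes it at 32 kbps, so size is just duration × bitrate. This illustrative estimate (not project code) shows why a 2-hour video still fits:

```python
def compressed_size_mb(video_minutes: float, speedup: float = 1.5, kbps: int = 32) -> float:
    """Estimate upload size after the compression described above:
    the speed-up shortens the audio, then size = duration × bitrate.
    Illustrative arithmetic only."""
    effective_minutes = video_minutes / speedup
    bits = effective_minutes * 60 * kbps * 1000
    return bits / 8 / 1_000_000  # size in megabytes

# A 2-hour video comfortably fits Whisper's 25 MB cap after compression:
print(compressed_size_mb(120))  # 19.2
```

Beyond roughly 2.5 hours the estimate crosses 25 MB, which matches the advice to switch to Gemini for very long videos.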
Required for Gemini transcription or summarization. Has a generous free tier.
- Visit https://makersuite.google.com/app/apikey
- Sign in with your Google account
- Click "Create API Key"
- Copy the key
- Add to the `.env` file: `GEMINI_API_KEY=...`
Free Tier:
- 15 requests per minute
- 1 million tokens per day
- 1,500 requests per day
For typical use, summarization and weekly summaries are essentially free.
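Staying under the 15 requests/minute cap is straightforward with a sliding-window limiter. This is an illustrative sketch of the idea, not how the project necessarily throttles its Gemini calls:

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter sized for the free tier's 15 requests/minute.
    Illustrative sketch only."""

    def __init__(self, max_calls: int = 15, window_s: float = 60.0) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls: deque = deque()

    def wait(self) -> None:
        while True:
            now = time.monotonic()
            # Drop timestamps that fell out of the window
            while self.calls and now - self.calls[0] >= self.window_s:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # At the cap: sleep until the oldest call leaves the window
            time.sleep(self.window_s - (now - self.calls[0]))

limiter = RateLimiter()
limiter.wait()  # call before each Gemini request; blocks only at the cap
```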
Required for Mistral Voxtral transcription. Cost-effective option at $0.003/minute.
- Visit https://console.mistral.ai/api-keys
- Sign in or create an account
- Click "Create new key"
- Copy the key
- Add to the `.env` file: `MISTRAL_API_KEY=...`
Cost: Voxtral Mini is $0.003 per minute of audio. For typical use (~30 hours/month), expect ~$5-8/month (50% cheaper than Whisper).
Limitation: Maximum 15 minutes per audio file. For longer videos, use Gemini (no limit) or split the audio.
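Splitting comes down to computing chunk boundaries under the per-file limit and handing them to a tool like `ffmpeg -ss start -to end`. The `segment_bounds` helper below is a hypothetical illustration of that boundary calculation, not project code:

```python
def segment_bounds(duration_s: float, limit_s: float = 900.0) -> list:
    """Compute (start, end) offsets keeping each chunk under the provider's
    per-file limit (default 15 min). Pair with `ffmpeg -ss start -to end`
    to perform the actual split. Illustrative helper only."""
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + limit_s, float(duration_s))
        bounds.append((start, end))
        start = end
    return bounds

# A 40-minute file splits into three chunks under a 15-minute limit:
print(segment_bounds(40 * 60))  # [(0.0, 900.0), (900.0, 1800.0), (1800.0, 2400.0)]
```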
Required for saving transcripts and summaries to Trilium Notes.
- Open Trilium Notes in your browser
- Go to Options → ETAPI
- Click "Create new token" or copy an existing one
- Copy the token
- Add to the `.env` file: `TRILIUM_ETAPI_TOKEN=...`
Get Parent Note ID:
- In Trilium, navigate to or create a note where you want summaries stored
- Right-click the note → "Copy Note ID"
- Add to the `.env` file: `TRILIUM_PARENT_NOTE_ID=...`
Required only if you want text-to-speech for weekly summaries. Choose one provider:
Most affordable for long-form content
- Pricing: $15 per 1M characters (~$0.15 for a 10K character summary)
- Quality: 6 natural voices (alloy, echo, fable, onyx, nova, shimmer)
- Models: `tts-1` (standard) or `tts-1-hd` (higher quality)
- You already have the API key from transcription setup
Set in `.env`:

```bash
TTS_PROVIDER=openai
OPENAI_TTS_VOICE=alloy
OPENAI_TTS_MODEL=tts-1
```

Higher quality voices, more expensive
- Visit https://elevenlabs.io/
- Sign up or sign in
- Go to your profile → API Keys
- Copy your API key
- Add to the `.env` file: `ELEVENLABS_API_KEY=...`
Free Tier: 10,000 characters per month (~7-10 summaries)
Set in `.env`:

```bash
TTS_PROVIDER=elevenlabs
ELEVENLABS_VOICE_ID=pNInz6obpgDQGcFmaJgB
```

All configuration is done via the `.env` file. See `.env.example` for a complete reference with descriptions.
Core Settings:
| Variable | Default | Description |
|---|---|---|
| `ENV` | `production` | Environment mode (`development` or `production`) |
| `FASTAPI_HOST` | `0.0.0.0` | IP address to bind to |
| `FASTAPI_API_PORT` | `8000` | Port to listen on |
| `DATABASE_PATH` | `./audio_history.db` | SQLite database location |
Audio Settings:
| Variable | Default | Description |
|---|---|---|
| `AUDIO_QUALITY` | `4` | MP3 quality (0 = best, 9 = smallest, 4 ≈ 128 kbps) |
| `AUDIO_CACHE_MAX_FILES` | `10` | Number of audio files to keep cached |
| `PREFETCH_THRESHOLD_SECONDS` | `30` | When to start downloading the next track |
| `TEMP_AUDIO_DIR` | `/tmp/audio-transcriptions` | Where to store audio files |
Transcription Settings:
| Variable | Default | Description |
|---|---|---|
| `TRANSCRIPTION_ENABLED` | `false` | Enable AI transcription features |
| `TRANSCRIPTION_PROVIDER` | `openai` | Provider: `openai` (Whisper), `mistral` (Voxtral), or `gemini` |
| `TRANSCRIPTION_MODEL` | (auto) | Model override (optional) |
| `SUMMARY_PROVIDER` | `gemini` | Provider for video summaries |
| `SUMMARY_MODEL` | (auto) | Model override (optional) |
Trilium Integration:
| Variable | Required When | Description |
|---|---|---|
| `TRILIUM_URL` | Transcription enabled | Trilium instance URL |
| `TRILIUM_ETAPI_TOKEN` | Transcription enabled | ETAPI authentication token |
| `TRILIUM_PARENT_NOTE_ID` | Transcription enabled | Note ID where summaries are stored |
Weekly Summaries:
| Variable | Default | Description |
|---|---|---|
| `WEEKLY_SUMMARY_ENABLED` | `false` | Enable automated weekly summaries |
| `WEEKLY_SUMMARY_PROVIDER` | `gemini` | AI provider for weekly summaries |
| `WEEKLY_SUMMARY_MODEL` | (auto) | Model override (optional) |
Smart Suggestions:
| Variable | Default | Description |
|---|---|---|
| `BOOK_SUGGESTIONS_ENABLED` | `false` | Enable AI video suggestions |
| `BOOKS_TO_ANALYZE` | `10` | How many recent videos to analyze |
| `SUGGESTIONS_COUNT` | `4` | Number of suggestions to generate |
| `SUGGESTIONS_AI_PROVIDER` | `gemini` | AI provider for suggestions |
Text-to-Speech:
| Variable | Default | Description |
|---|---|---|
| `TTS_ENABLED` | `false` | Enable TTS for summaries |
| `TTS_PROVIDER` | `openai` | Provider: `openai` or `elevenlabs` |
| `OPENAI_TTS_VOICE` | `alloy` | OpenAI voice (alloy, echo, fable, onyx, nova, shimmer) |
| `OPENAI_TTS_MODEL` | `tts-1` | OpenAI model (`tts-1` or `tts-1-hd`) |
| `ELEVENLABS_API_KEY` | - | ElevenLabs API key (if using ElevenLabs) |
| `ELEVENLABS_VOICE_ID` | `pNInz6obpgDQGcFmaJgB` | ElevenLabs voice ID (Adam by default) |
| `ELEVENLABS_MODEL_ID` | `eleven_flash_v2_5` | ElevenLabs model |
| `WEEKLY_SUMMARY_AUDIO_DIR` | `/var/audio-summaries` | Where to store TTS audio files |
Start streaming a YouTube video:

```bash
curl -X POST http://localhost:8000/stream \
  -H "Content-Type: application/json" \
  -d '{"youtube_video_id": "dQw4w9WgXcQ"}'

# Or with a full YouTube URL:
curl -X POST http://localhost:8000/stream \
  -H "Content-Type: application/json" \
  -d '{"youtube_video_id": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'

# Skip transcription for this video:
curl -X POST http://localhost:8000/stream \
  -H "Content-Type: application/json" \
  -d '{"youtube_video_id": "dQw4w9WgXcQ", "skip_transcription": true}'
```

Get the audio stream:

```bash
# Open in browser or media player:
http://localhost:8000/mystream
```

Check stream status:

```bash
curl http://localhost:8000/status
```

Stop the current stream:

```bash
curl -X POST http://localhost:8000/stop
```

Add a video to the queue:

```bash
curl -X POST http://localhost:8000/queue/add \
  -H "Content-Type: application/json" \
  -d '{"youtube_video_id": "dQw4w9WgXcQ"}'
```

Get the current queue:

```bash
curl http://localhost:8000/queue
```

Skip to the next track:

```bash
curl -X POST http://localhost:8000/queue/next
```

Remove a specific item from the queue:

```bash
curl -X DELETE http://localhost:8000/queue/123
```

Clear the entire queue:

```bash
curl -X POST http://localhost:8000/queue/clear
```

Get play history:

```bash
curl http://localhost:8000/history
```

Clear play history:

```bash
curl -X POST http://localhost:8000/history/clear
```

Get transcription status:

```bash
curl http://localhost:8000/transcription/status/dQw4w9WgXcQ
```

Manually trigger transcription:

```bash
curl -X POST http://localhost:8000/transcription/start/dQw4w9WgXcQ
```

Get the summary and Trilium link:

```bash
curl http://localhost:8000/transcription/summary/dQw4w9WgXcQ
```

Get detailed usage statistics:
```bash
# Recent usage (last 100 records)
curl "http://localhost:8000/admin/llm-usage/stats?limit=100"

# Filter by provider
curl "http://localhost:8000/admin/llm-usage/stats?provider=openai"
curl "http://localhost:8000/admin/llm-usage/stats?provider=gemini"

# Filter by model
curl "http://localhost:8000/admin/llm-usage/stats?model=whisper-1"
curl "http://localhost:8000/admin/llm-usage/stats?model=gpt-4o-mini"

# Filter by feature
curl "http://localhost:8000/admin/llm-usage/stats?feature=transcription"
curl "http://localhost:8000/admin/llm-usage/stats?feature=summarization"
curl "http://localhost:8000/admin/llm-usage/stats?feature=weekly_summary"

# Date range filter (ISO 8601 format)
curl "http://localhost:8000/admin/llm-usage/stats?start_date=2026-02-01T00:00:00&end_date=2026-02-03T23:59:59"

# Combine filters
curl "http://localhost:8000/admin/llm-usage/stats?provider=openai&feature=transcription&limit=50"
```

Get an aggregated summary:

```bash
# Overall summary (all time)
curl "http://localhost:8000/admin/llm-usage/summary"

# Summary for a specific date range
curl "http://localhost:8000/admin/llm-usage/summary?start_date=2026-02-01T00:00:00&end_date=2026-02-28T23:59:59"
```

Response format:
```json
{
  "status": "success",
  "summary": {
    "totals": {
      "call_count": 150,
      "total_prompt_tokens": 125000,
      "total_response_tokens": 45000,
      "total_tokens": 170000
    },
    "by_provider_model_feature": [
      {
        "provider": "openai",
        "model": "whisper-1",
        "feature": "transcription",
        "call_count": 50,
        "total_tokens": 50000
      }
    ]
  }
}
```

Trigger a weekly summary manually:

```bash
curl -X POST "http://localhost:8000/admin/weekly-summary/trigger"
```

Or for a specific week:

```bash
curl -X POST "http://localhost:8000/admin/weekly-summary/trigger" \
  -H "Content-Type: application/json" \
  -d '{"date": "2026-02-06"}'
```

Get the next scheduled run time:

```bash
curl "http://localhost:8000/admin/weekly-summary/next-run"
```

Prices are in USD; token-based models are per 1M tokens unless noted otherwise.

| Model | Input | Output | Notes |
|---|---|---|---|
| OpenAI | |||
| gpt-4o-mini | $0.15 | $0.60 | Reliable workhorse (recommended) |
| gpt-4o | $2.50 | $10.00 | Higher quality |
| whisper-1 | $0.006/min | - | Audio transcription, 25MB limit |
| tts-1 | $15.00 | - | Text-to-speech (per 1M chars) |
| tts-1-hd | $30.00 | - | TTS HD quality (per 1M chars) |
| Mistral AI | |||
| voxtral-mini-latest | $0.003/min | - | Audio transcription, 15 min limit |
| Google Gemini | |||
| gemini-2.5-flash | $0.15 | $0.60 | Fast, comparable to gpt-4o-mini (recommended) |
| gemini-2.5-flash-preview-tts | $0.40 | $0.40 | Audio transcription (per 1M tokens) |
| gemini-1.5-flash | $0.10 | $0.40 | Slightly older, still excellent |
| gemini-1.5-pro | $1.25 | $5.00 | Higher quality |
| ElevenLabs | |||
| eleven_flash_v2_5 | $100.00 | - | TTS (per 1M chars), ~$0.10 per 1K |
| eleven_turbo_v2_5 | $300.00 | - | TTS higher quality (per 1M chars) |
Using the recommended configuration (Whisper + Gemini 2.5 Flash):

- Video transcription (Whisper): $0.006 per minute of audio
  - 10 min video = $0.06
  - 1 hour video = $0.36

Alternative, cost-optimized configuration (Voxtral + Gemini 2.5 Flash):

- Video transcription (Voxtral Mini): $0.003 per minute of audio (50% cheaper)
  - 10 min video = $0.03
  - 1 hour video = $0.18
- Video summarization (Gemini 2.5 Flash): ~$0.0003-0.001 per summary
  - Typical: 2,000 input tokens + 500 output tokens
  - Cost: (2,000 × $0.15 + 500 × $0.60) / 1,000,000 = $0.0006
- Weekly summary (Gemini 2.5 Flash): ~$0.003-0.01 per summary
  - Typical: 10,000 input tokens + 2,000 output tokens
  - Cost: (10,000 × $0.15 + 2,000 × $0.60) / 1,000,000 = $0.0027
- Book suggestions (Gemini 2.5 Flash): ~$0.0002-0.0005 per request
  - Typical: 1,000 input tokens + 100 output tokens
  - Cost: (1,000 × $0.15 + 100 × $0.60) / 1,000,000 = $0.0002
Light usage (10 hours/month, 15 videos):
- Transcription: 10 hours × 60 min × $0.006 = $3.60
- Summarization: 15 videos × $0.0006 = $0.01
- Weekly summaries: 4 weeks × $0.0027 = $0.01
- Total: ~$3.62/month
Moderate usage (30 hours/month, 45 videos):
- Transcription: 30 hours × 60 min × $0.006 = $10.80
- Summarization: 45 videos × $0.0006 = $0.03
- Weekly summaries: 4 weeks × $0.0027 = $0.01
- Total: ~$10.84/month
Heavy usage (100 hours/month, 150 videos):
- Transcription: 100 hours × 60 min × $0.006 = $36.00
- Summarization: 150 videos × $0.0006 = $0.09
- Weekly summaries: 4 weeks × $0.0027 = $0.01
- Total: ~$36.10/month
Gemini has a generous free tier that covers most summarization needs:
- 15 requests per minute
- 1 million tokens per day
- 1,500 requests per day
What's free:
- Video summarization (essentially unlimited for personal use)
- Weekly summaries (4 per month)
- Smart suggestions (as much as you need)
What costs money:
- Transcription with Whisper (no free option for high quality)
Use the admin endpoints to monitor your actual usage:
```bash
# Get total tokens used this month
curl "http://localhost:8000/admin/llm-usage/summary?start_date=2026-02-01T00:00:00"

# Check Whisper usage
curl "http://localhost:8000/admin/llm-usage/stats?model=whisper-1&limit=1000"
```

Calculate costs based on current provider pricing:
- OpenAI Whisper: audio duration minutes × $0.006
- Mistral Voxtral: audio duration minutes × $0.003
- GPT-4o-mini: (prompt_tokens × $0.15 + response_tokens × $0.60) / 1,000,000
- Gemini: Usually free up to limits
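The per-minute and per-token formulas above combine into a simple estimator. The function below applies the document's quoted rates (check current provider pricing before relying on it):

```python
def monthly_cost(audio_minutes: float, prompt_tokens: int, response_tokens: int,
                 per_min: float = 0.006,
                 in_per_m: float = 0.15, out_per_m: float = 0.60) -> float:
    """Apply the cost formulas above: audio minutes × per-minute rate,
    plus (prompt × input-rate + response × output-rate) / 1M tokens.
    Rates default to the document's quoted Whisper and Gemini/GPT prices."""
    transcription = audio_minutes * per_min
    summaries = (prompt_tokens * in_per_m + response_tokens * out_per_m) / 1_000_000
    return transcription + summaries

# Moderate usage from the estimates above: 30 h of audio, ~45 summaries
cost = monthly_cost(30 * 60, 45 * 2000, 45 * 500)
print(round(cost, 2))  # 10.83
```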
The application can run as a systemd service for automatic startup and management.
1. Edit the service file:

```bash
nano audio-stream.service
```

Update these lines with your actual username and paths:

```ini
User=YOUR_USERNAME
WorkingDirectory=/home/YOUR_USERNAME/audio-stream-server
```

2. Install the service:
```bash
sudo cp audio-stream.service /etc/systemd/system/audio-stream.service
sudo systemctl daemon-reload
```

3. Enable and start:

```bash
# Enable (start on boot)
sudo systemctl enable audio-stream

# Start now
sudo systemctl start audio-stream

# Check status
sudo systemctl status audio-stream
```

4. Manage the service:
```bash
# Restart
sudo systemctl restart audio-stream

# Stop
sudo systemctl stop audio-stream

# View logs
journalctl -u audio-stream -n 1000 -f
```

Note: The service automatically loads your `.env` file from the `WorkingDirectory`.

Use the provided update script to safely update:

```bash
./update.sh
```

This script:
- Checks if service is running
- Pulls latest changes from git
- Updates Python dependencies
- Checks and installs missing system dependencies
- Runs database migrations
- Restarts the service if it was running
```bash
# Clone repository
git clone https://github.com/MrDesjardins/audio-stream-server.git
cd audio-stream-server

# Run setup
./setup.sh

# Install development dependencies
uv sync --extra dev --extra test

# Install pre-commit hooks
uv run pre-commit install
```

```bash
# Run all tests with coverage
uv run pytest

# Run without coverage (faster)
uv run pytest --no-cov

# Run specific test file
uv run pytest tests/services/test_database.py

# Run specific test
uv run pytest tests/services/test_database.py::TestDatabase::test_add_history

# Use the test runner script
./run_tests.sh all       # All tests with coverage
./run_tests.sh fast      # Fast mode (no coverage)
./run_tests.sh services  # Only service tests
./run_tests.sh routes    # Only route tests
```

See TESTING.md for comprehensive testing documentation.
Pre-commit hooks (automatic on commit):

```bash
# One-time setup
uv run pre-commit install

# Hooks run automatically on git commit.
# They auto-fix issues and add the fixes to your commit.
```

Manual linting:

```bash
# Lint and auto-fix with Ruff
uv run ruff check --fix

# Format code
uv run ruff format .

# Type checking
uv run mypy .

# Run all pre-commit hooks manually
uv run pre-commit run --all-files
```

See LINTING.md for detailed linting documentation.
```text
audio-stream-server/
├── main.py                  # FastAPI app initialization
├── config.py                # Configuration management
├── routes/                  # API route handlers
│   ├── stream.py            # Streaming and playback
│   ├── queue.py             # Queue management
│   ├── transcription.py     # Transcription endpoints
│   └── admin.py             # Admin endpoints
├── services/                # Core business logic
│   ├── streaming.py         # yt-dlp and ffmpeg pipeline
│   ├── broadcast.py         # Multi-client streaming
│   ├── database.py          # SQLite operations
│   ├── youtube.py           # YouTube metadata
│   ├── transcription.py     # OpenAI/Gemini transcription
│   ├── summarization.py     # AI summarization
│   ├── trilium.py           # Trilium Notes integration
│   ├── background_tasks.py  # Async processing
│   ├── llm_clients.py       # AI client wrappers
│   └── cache.py             # Audio and transcript caching
├── templates/
│   └── index.html           # Jinja2 web interface
├── static/
│   ├── style.css            # Responsive dark theme
│   └── fonts/               # Self-hosted fonts
└── tests/                   # Comprehensive test suite
    ├── services/
    └── routes/
```
Database schema updates are handled by migration scripts:
```bash
# Run all migrations (done automatically by update.sh)
uv run python migrate_database.py             # Base schema
uv run python migrate_add_metadata.py         # Channel and thumbnail fields
uv run python migrate_add_queue_columns.py    # Queue type and week_year
uv run python migrate_add_llm_stats.py        # LLM usage tracking table
uv run python migrate_add_audio_duration.py   # Audio duration for cost tracking
uv run python migrate_add_weekly_summary.py   # Weekly summaries table
```

Each migration:
- Creates a backup before making changes
- Is idempotent (safe to run multiple times)
- Preserves all existing data
- Runs automatically during `./update.sh`
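The idempotency property above usually comes down to checking the schema before altering it. This sketch shows the common SQLite pattern (add a column only if it is missing); it illustrates the idea and is not a verbatim copy of the project's migration scripts:

```python
import os
import sqlite3
import tempfile

def add_column_if_missing(db_path: str, table: str, column: str, col_type: str) -> bool:
    """Idempotent migration step: ALTER TABLE only when the column is
    absent, so re-running is harmless. Illustrative pattern only."""
    conn = sqlite3.connect(db_path)
    try:
        existing = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        if column in existing:
            return False  # already migrated
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")
        conn.commit()
        return True
    finally:
        conn.close()

# Demo against a throwaway database:
path = os.path.join(tempfile.mkdtemp(), "demo.db")
with sqlite3.connect(path) as conn:
    conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, title TEXT)")
print(add_column_if_missing(path, "history", "channel", "TEXT"))  # True
print(add_column_if_missing(path, "history", "channel", "TEXT"))  # False
```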
See CONTRIBUTING.md for development guidelines, including:
- Code style requirements
- Type annotation standards
- Path handling with `expand_path()`
- Testing requirements (76% minimum coverage)
- Pre-commit hook usage
GitHub Actions automatically:
- Runs tests on every push and PR
- Generates coverage reports
- Runs code quality checks (Ruff, mypy)
- Posts coverage to PR comments
See CI_SETUP.md for CI configuration details.
- Input: Client sends a YouTube video ID via the `/stream` endpoint
- Extract: `yt-dlp` extracts the best audio from YouTube → stdout
- Convert: `ffmpeg` converts the audio to MP3 → stdout
- Broadcast: `StreamBroadcaster` reads from ffmpeg and broadcasts to all connected clients
- Multi-client: Multiple clients can stream simultaneously via the `/mystream` endpoint
- Playback: HTML5 audio player consumes the stream
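The extract → convert steps amount to two processes joined by a pipe. The helper below sketches plausible argument lists for that pipe (the server's exact flags may differ; `-q:a 4` mirrors the default `AUDIO_QUALITY`):

```python
def build_pipeline(video_id: str, quality: int = 4) -> tuple:
    """Build argument lists for the yt-dlp → ffmpeg pipe described above.
    Flags are a plausible sketch, not the server's exact invocation."""
    ytdlp = ["yt-dlp", "-f", "bestaudio", "-o", "-",  # "-o -" streams to stdout
             f"https://www.youtube.com/watch?v={video_id}"]
    ffmpeg = ["ffmpeg", "-i", "pipe:0",               # read audio from stdin
              "-f", "mp3", "-q:a", str(quality),      # VBR quality 4 ≈ 128 kbps
              "pipe:1"]                               # write MP3 to stdout
    return ytdlp, ffmpeg

ytdlp_cmd, ffmpeg_cmd = build_pipeline("dQw4w9WgXcQ")
# Wire them together with subprocess:
#   p1 = subprocess.Popen(ytdlp_cmd, stdout=subprocess.PIPE)
#   p2 = subprocess.Popen(ffmpeg_cmd, stdin=p1.stdout, stdout=subprocess.PIPE)
```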
- StreamBroadcaster: Manages concurrent client connections with replay buffers
- Replay Buffer: Last 100 chunks (~800KB) for reconnecting clients
- Client Queues: Each client gets their own queue of audio chunks
- Instant Resume: Reconnecting clients receive buffered content immediately
- Thread Safety: Process lock ensures thread-safe access to global state
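The replay-buffer design above can be modelled in a few lines: a bounded deque of recent chunks plus one queue per client, all guarded by a lock. This is a simplified sketch of the design, not the project's actual `StreamBroadcaster` class:

```python
import threading
from collections import deque
from queue import Queue

class StreamBroadcasterSketch:
    """Minimal model of the broadcaster: a bounded replay buffer plus one
    queue per client. Simplified sketch only."""

    def __init__(self, replay_chunks: int = 100) -> None:
        self.lock = threading.Lock()
        self.replay = deque(maxlen=replay_chunks)  # last N chunks for late joiners
        self.clients = []

    def subscribe(self) -> Queue:
        with self.lock:
            q = Queue()
            for chunk in self.replay:  # new client catches up instantly
                q.put(chunk)
            self.clients.append(q)
            return q

    def publish(self, chunk: bytes) -> None:
        with self.lock:
            self.replay.append(chunk)
            for q in self.clients:
                q.put(chunk)

b = StreamBroadcasterSketch()
b.publish(b"chunk-1")
late = b.subscribe()           # joins after chunk-1 was broadcast
b.publish(b"chunk-2")
print(late.get(), late.get())  # b'chunk-1' b'chunk-2'
```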
- Capture: Audio saved to a file while streaming (using the ffmpeg `tee` muxer)
- Queue: Background worker picks up the transcription job
- Deduplicate: Check Trilium for existing note
- Transcribe: Call OpenAI Whisper or Gemini with audio file
- Summarize: Generate AI summary of transcript
- Post: Create Trilium note with formatted content
- Cleanup: Delete temporary audio file
- Cache: Store transcript and summary for future use
- Single background worker thread processes jobs sequentially
- Main thread handles HTTP requests and streaming
- Thread-safe queue for job processing
- Jobs tracked with status: PENDING → TRANSCRIBING → SUMMARIZING → POSTING → COMPLETED
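The threading model above reduces to a single worker draining a thread-safe queue while updating job statuses. A minimal sketch (the project's `background_tasks.py` may be structured differently):

```python
import queue
import threading

def worker(jobs, statuses) -> None:
    """Single worker thread: process jobs sequentially, walking each one
    through the status progression. Real steps would do the actual work."""
    while True:
        video_id = jobs.get()
        if video_id is None:  # sentinel: shut the worker down
            jobs.task_done()
            break
        for status in ("TRANSCRIBING", "SUMMARIZING", "POSTING", "COMPLETED"):
            statuses[video_id] = status
        jobs.task_done()

jobs = queue.Queue()   # thread-safe job queue
statuses = {}          # shared status map, written only by the worker
threading.Thread(target=worker, args=(jobs, statuses), daemon=True).start()

statuses["dQw4w9WgXcQ"] = "PENDING"
jobs.put("dQw4w9WgXcQ")
jobs.join()  # block until the worker marks the job done
print(statuses["dQw4w9WgXcQ"])  # COMPLETED
jobs.put(None)  # stop the worker
```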
Port already in use:

```bash
# Find the process using port 8000
sudo lsof -i :8000

# Kill the process
sudo kill -9 <PID>
```

yt-dlp not found:

```bash
# Reinstall yt-dlp
sudo curl -L https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -o /usr/local/bin/yt-dlp
sudo chmod a+rx /usr/local/bin/yt-dlp
```

ffmpeg not installed:

```bash
sudo apt update
sudo apt install -y ffmpeg
```

Database locked:

```bash
# Stop the service
sudo systemctl stop audio-stream

# Check for locks
lsof audio_history.db

# Restart the service
sudo systemctl start audio-stream
```

Trilium connection fails:

```bash
# Test the connection
uv run test_trilium.py

# Check that Trilium is running
curl http://localhost:8080

# Verify the ETAPI token in Trilium settings
```

MIT License - see the LICENSE file for details.
- Built with FastAPI
- Audio processing with yt-dlp and ffmpeg
- AI powered by OpenAI and Google Gemini
- Knowledge management with Trilium Notes