Audio Stream Server

A powerful FastAPI application that streams audio from YouTube videos as MP3 over HTTP, with automatic transcription, AI-powered summaries, and intelligent knowledge management.

Features

Core Streaming

YouTube Audio Streaming: Stream any YouTube video as MP3 audio; supports full YouTube URLs, short youtu.be links, and raw video IDs
Smart Queue System: Build playlists with a persistent queue (survives page refreshes) that supports both YouTube videos and weekly summary audio as mixed items
Auto-Play & Prefetching: Automatically plays the next track when the current one finishes; pre-downloads the upcoming track in the background for seamless transitions
Play History: Tracks all played videos with play counts, relative timestamps, and rich metadata (title, channel, thumbnail); click any entry to replay
Stream Resilience: Automatic stall detection with up to 50 reconnection retries and an adaptive back-off delay
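As an illustration of the three accepted input forms (full URL, youtu.be link, raw ID), here is a minimal sketch of how they could be normalised to a video ID. The function name `extract_video_id` is hypothetical and not taken from the project, and the 11-character ID pattern is an assumption based on how YouTube IDs currently look.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from letters, digits, "-" and "_".
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a full URL, a youtu.be link, or a raw ID."""
    value = value.strip()
    if _ID_RE.match(value):
        return value  # already a raw ID

    # Tolerate links pasted without a scheme, e.g. "youtu.be/<id>".
    parsed = urlparse(value if "://" in value else f"https://{value}")
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None  # not a recognised YouTube host

    return candidate if _ID_RE.match(candidate) else None
```

Returning None for unrecognised input lets the caller reject the request before any download is attempted.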

Playback Controls

Audio Player: HTML5 player with Play/Pause, Rewind (−15 s), Fast-Forward (+15 s), and Stop
Speed Controls: Switch between 1×, 1.15×, 1.3×, and 1.5× playback speed
Keyboard Shortcuts: Space to play/pause; ↑/↓ to skip forward/back 15 s
MediaSession API: Rich media controls in car entertainment systems (Tesla tested), lock screens, and notification panels — displays title, channel, and album art

Web Interface

Mobile-First Design: Responsive layout optimised for phones, tablets, and desktops
Dark / Light Theme: Automatic system-preference detection with manual toggle; preference saved in local storage
Drag-and-Drop Queue Reordering: Reorder queue items by dragging (mouse and touch); order saved instantly to the database

AI-Powered Features

Automatic Transcription (`TRANSCRIPTION_ENABLED=true`)

Multi-Provider: OpenAI Whisper, Mistral Voxtral, or Google Gemini — switch via config
Audio Optimisation: Automatic compression (1.5× speed, mono, 32 kbps, 16 kHz) reduces Whisper API costs ~33 %
Background Processing: Runs asynchronously; transcription status visible in UI with polling every 5 s
Smart Caching: Transcripts cached locally; Trilium deduplication prevents re-processing the same video
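The compression settings above map onto standard ffmpeg options. The sketch below builds such a command and shows where the ~33 % saving comes from: at 1.5× speed the audio lasts 1/1.5 of its original duration. The function names are hypothetical, and the project's actual ffmpeg invocation may differ.

```python
def build_compress_command(src: str, dst: str, speed: float = 1.5) -> list:
    """Build an ffmpeg argv for speech-oriented compression (illustrative only)."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-filter:a", f"atempo={speed}",  # speed up without changing pitch
        "-ac", "1",                      # mono
        "-ar", "16000",                  # 16 kHz sample rate
        "-b:a", "32k",                   # 32 kbps audio bitrate
        dst,
    ]


def billed_minutes(original_minutes: float, speed: float = 1.5) -> float:
    """Duration that a per-minute API would see after the speed-up."""
    return original_minutes / speed


# A 60-minute video becomes 40 minutes of audio, i.e. about one third less.
saving = 1 - billed_minutes(60) / 60
```

Pass the list to `subprocess.run(cmd, check=True)` to execute it; that requires ffmpeg on the PATH.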

Intelligent Summarisation

Video Summaries: AI-generated summaries posted to Trilium Notes with video title, channel, thumbnail, and YouTube link
Multi-Provider: OpenAI GPT or Google Gemini (Gemini recommended for free tier)

Weekly Summaries (`WEEKLY_SUMMARY_ENABLED=true`)

Automated Scheduling: Configurable schedule (default: Fridays at 11 PM) to summarise the week's viewing
Comprehensive Analysis: Synthesises all videos watched during the week; extracts key learnings and common themes
Text-to-Speech: Optional TTS (OpenAI or ElevenLabs) converts the written summary to audio you can queue and play
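To make the default schedule concrete, the following sketch computes the next run time for a "Fridays at 11 PM" schedule. It is an assumption about how such a schedule could be evaluated, not the project's scheduler; `next_weekly_run` is a hypothetical name and time zones are ignored.

```python
from datetime import datetime, timedelta

FRIDAY = 4  # datetime.weekday(): Monday is 0, Friday is 4


def next_weekly_run(now: datetime, weekday: int = FRIDAY, hour: int = 23) -> datetime:
    """Return the first occurrence of weekday at hour:00 strictly after now."""
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # Already at or past this week's slot: move to next week.
        candidate += timedelta(days=7)
    return candidate
```

For example, called on Tuesday 2026-02-03 at noon it returns Friday 2026-02-06 at 23:00.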

Smart Video Suggestions (`BOOK_SUGGESTIONS_ENABLED=true`)

AI Content Discovery: Analyses viewing history and Trilium summaries to find thematically related YouTube videos
Configurable: Control how many recent videos to analyse and how many suggestions to generate

Data & Analytics

LLM Usage Dashboard: HTML dashboard at /admin/stats plus JSON API with filtering by date range, provider, model, and feature
Cost Monitoring: Token counts and audio-duration tracking for accurate per-minute pricing (Whisper, Voxtral)
Client-Side Logging: Browser logs batched and forwarded to the server — essential for debugging on car displays and mobile devices without a console

Developer Experience

Modern Stack: FastAPI, Python 3.12, SQLite with type-safe models and dataclasses
Comprehensive Testing: 76 %+ test coverage with pytest and pre-commit hooks
CI/CD: Automated testing with GitHub Actions on every push and pull request
Type Safety: Full return-type annotations; mypy-compatible throughout

Quick Start

Prerequisites

Operating System: Linux (Ubuntu/Debian recommended)
Python: 3.12 or higher
uv: Fast Python package manager (installed automatically by setup script)

Step 1: Clone the Repository

git clone https://github.com/MrDesjardins/audio-stream-server.git
cd audio-stream-server

Step 2: Run the Setup Script

The automated setup script installs all dependencies and initializes the database:

chmod +x setup.sh
./setup.sh

This script will:

Install system dependencies (yt-dlp, ffmpeg)
Install uv package manager if not present
Install Python dependencies
Initialize the SQLite database
Create a .env file from .env.example

Step 3: Configure Your Environment

Edit the .env file to customize your settings:

nano .env

Minimum configuration (no AI features):

# Server configuration
ENV=production
FASTAPI_HOST=0.0.0.0  # Listen on all interfaces (accessible from network)
FASTAPI_API_PORT=8000

# Disable AI features for basic streaming
TRANSCRIPTION_ENABLED=false

Full configuration (with AI features):

# Server configuration
ENV=production
FASTAPI_HOST=0.0.0.0
FASTAPI_API_PORT=8000

# Enable AI features
TRANSCRIPTION_ENABLED=true

# AI API Keys
OPENAI_API_KEY=sk-...  # Get from https://platform.openai.com/api-keys
GEMINI_API_KEY=...     # Get from https://makersuite.google.com/app/apikey

# Provider selection (recommended: Whisper + Gemini for best cost/quality)
TRANSCRIPTION_PROVIDER=openai  # "openai" (Whisper), "mistral" (Voxtral), or "gemini"
SUMMARY_PROVIDER=gemini        # "gemini" (free tier) or "openai"

# Trilium Notes integration (for saving summaries)
TRILIUM_URL=http://localhost:8080
TRILIUM_ETAPI_TOKEN=...        # From Trilium: Options → ETAPI
TRILIUM_PARENT_NOTE_ID=...     # Right-click note → "Copy Note ID"

# Optional features
WEEKLY_SUMMARY_ENABLED=false
BOOK_SUGGESTIONS_ENABLED=false
TTS_ENABLED=false

Important settings explained:

FASTAPI_HOST:
- 0.0.0.0 = Accessible from all network devices (recommended)
- 127.0.0.1 = Localhost only (not accessible from other devices)
- Or use your specific local IP (e.g., 10.0.0.181)
TRANSCRIPTION_PROVIDER:
- openai = Whisper API ($0.006/minute, very accurate, fast, 25MB limit)
- mistral = Voxtral Mini ($0.003/minute, cost-effective, good quality, 15 min limit)
- gemini = Gemini 1.5 Flash (free tier available, good quality, no limits)
SUMMARY_PROVIDER:
- gemini = Gemini 2.5 Flash (recommended, free tier, fast)
- openai = GPT-4o-mini (high quality, paid)

Step 4: Test Trilium Connection (Optional)

If you enabled transcription with Trilium integration:

uv run test_trilium.py

This verifies:

Trilium is reachable at your configured URL
ETAPI token is valid
Parent note ID exists and is accessible

Step 5: Configure Firewall

If running on a server, allow access to port 8000:

sudo ufw allow 8000/tcp
sudo ufw reload
sudo ufw status

Step 6: Run the Application

Development mode (with auto-reload):

uv run main.py

Production mode (as systemd service):

See the Running as a Service section below.

Step 7: Access the Web Interface

Open your browser and navigate to:

http://localhost:8000          # If running locally
http://YOUR_SERVER_IP:8000     # If running on a server

You should see the web interface with:

Search bar to enter YouTube URLs or video IDs
Play history
Queue management
Transcription status (if enabled)

Getting API Keys

OpenAI API Key

Required for Whisper transcription or GPT summarization.

Visit https://platform.openai.com/api-keys
Sign in or create an account
Click "Create new secret key"
Copy the key (starts with sk-)
Add to .env file: OPENAI_API_KEY=sk-...

Cost: Whisper is $0.006 per minute of audio. For typical use (~30 hours/month), expect ~$10-15/month.

Limitation: Maximum 25MB file size. Audio is automatically compressed (1.5x speed, mono, 32kbps) to save costs and meet this limit. For very long videos (>2 hours), use Gemini instead.

Google Gemini API Key

Required for Gemini transcription or summarization. Has a generous free tier.

Visit https://makersuite.google.com/app/apikey
Sign in with your Google account
Click "Create API Key"
Copy the key
Add to .env file: GEMINI_API_KEY=...

Free Tier:

15 requests per minute
1 million tokens per day
1,500 requests per day

For typical use, summarization and weekly summaries are essentially free.

Mistral AI API Key

Required for Mistral Voxtral transcription. Cost-effective option at $0.003/minute.

Visit https://console.mistral.ai/api-keys
Sign in or create an account
Click "Create new key"
Copy the key
Add to .env file: MISTRAL_API_KEY=...

Cost: Voxtral Mini is $0.003 per minute of audio. For typical use (~30 hours/month), expect ~$5-8/month (50% cheaper than Whisper).

Limitation: Maximum 15 minutes per audio file. For longer videos, use Gemini (no limit) or split the audio.

Trilium ETAPI Token

Required for saving transcripts and summaries to Trilium Notes.

Open Trilium Notes in your browser
Go to Options → ETAPI
Click "Create new token" or copy an existing one
Copy the token
Add to .env file: TRILIUM_ETAPI_TOKEN=...

Get Parent Note ID:

In Trilium, navigate to or create a note where you want summaries stored
Right-click the note → "Copy Note ID"
Add to .env file: TRILIUM_PARENT_NOTE_ID=...

Text-to-Speech API Keys (Optional)

Required only if you want text-to-speech for weekly summaries. Choose one provider:

OpenAI TTS (Recommended)

Most affordable for long-form content

Pricing: $15 per 1M characters (~$0.15 for a 10K character summary)
Quality: 6 natural voices (alloy, echo, fable, onyx, nova, shimmer)
Models: tts-1 (standard) or tts-1-hd (higher quality)
You already have the API key from transcription setup

Set in .env:

TTS_PROVIDER=openai
OPENAI_TTS_VOICE=alloy
OPENAI_TTS_MODEL=tts-1

ElevenLabs (Alternative)

Higher quality voices, more expensive

Visit https://elevenlabs.io/
Sign up or sign in
Go to your profile → API Keys
Copy your API key
Add to .env file: ELEVENLABS_API_KEY=...

Free Tier: 10,000 characters per month (~7-10 summaries)

Set in .env:

TTS_PROVIDER=elevenlabs
ELEVENLABS_VOICE_ID=pNInz6obpgDQGcFmaJgB

Configuration Reference

Environment Variables

All configuration is done via the .env file. See .env.example for a complete reference with descriptions.

Core Settings:

Variable	Default	Description
`ENV`	`production`	Environment mode (`development` or `production`)
`FASTAPI_HOST`	`0.0.0.0`	IP address to bind to
`FASTAPI_API_PORT`	`8000`	Port to listen on
`DATABASE_PATH`	`./audio_history.db`	SQLite database location
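The core settings and their defaults can be read into a typed object. This is a minimal sketch assuming the variable names and defaults from the table above; the `CoreSettings` class and `load_core_settings` function are hypothetical and may not match the project's config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CoreSettings:
    env: str
    host: str
    port: int
    database_path: str


def load_core_settings(environ: Mapping[str, str] = os.environ) -> CoreSettings:
    """Read the core variables, falling back to the documented defaults."""
    return CoreSettings(
        env=environ.get("ENV", "production"),
        host=environ.get("FASTAPI_HOST", "0.0.0.0"),
        port=int(environ.get("FASTAPI_API_PORT", "8000")),  # raises ValueError if not numeric
        database_path=environ.get("DATABASE_PATH", "./audio_history.db"),
    )
```

Accepting any mapping rather than reading `os.environ` directly keeps the loader easy to test with a plain dict.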

Audio Settings:

Variable	Default	Description
`AUDIO_QUALITY`	`4`	MP3 quality (0=best, 9=smallest, 4=~128kbps)
`AUDIO_CACHE_MAX_FILES`	`10`	Number of audio files to keep cached
`PREFETCH_THRESHOLD_SECONDS`	`30`	When to start downloading next track
`TEMP_AUDIO_DIR`	`/tmp/audio-transcriptions`	Where to store audio files

Transcription Settings:

Variable	Default	Description
`TRANSCRIPTION_ENABLED`	`false`	Enable AI transcription features
`TRANSCRIPTION_PROVIDER`	`openai`	Provider: `openai` (Whisper), `mistral` (Voxtral), or `gemini`
`TRANSCRIPTION_MODEL`	(auto)	Model override (optional)
`SUMMARY_PROVIDER`	`gemini`	Provider for video summaries
`SUMMARY_MODEL`	(auto)	Model override (optional)

Trilium Integration:

Variable	Required When	Description
`TRILIUM_URL`	Transcription enabled	Trilium instance URL
`TRILIUM_ETAPI_TOKEN`	Transcription enabled	ETAPI authentication token
`TRILIUM_PARENT_NOTE_ID`	Transcription enabled	Note ID where summaries are stored

Weekly Summaries:

Variable	Default	Description
`WEEKLY_SUMMARY_ENABLED`	`false`	Enable automated weekly summaries
`WEEKLY_SUMMARY_PROVIDER`	`gemini`	AI provider for weekly summaries
`WEEKLY_SUMMARY_MODEL`	(auto)	Model override (optional)

Smart Suggestions:

Variable	Default	Description
`BOOK_SUGGESTIONS_ENABLED`	`false`	Enable AI video suggestions
`BOOKS_TO_ANALYZE`	`10`	How many recent videos to analyze
`SUGGESTIONS_COUNT`	`4`	Number of suggestions to generate
`SUGGESTIONS_AI_PROVIDER`	`gemini`	AI provider for suggestions

Text-to-Speech:

Variable	Default	Description
`TTS_ENABLED`	`false`	Enable TTS for summaries
`TTS_PROVIDER`	`openai`	Provider: `openai` or `elevenlabs`
`OPENAI_TTS_VOICE`	`alloy`	OpenAI voice (alloy, echo, fable, onyx, nova, shimmer)
`OPENAI_TTS_MODEL`	`tts-1`	OpenAI model (`tts-1` or `tts-1-hd`)
`ELEVENLABS_API_KEY`	-	ElevenLabs API key (if using ElevenLabs)
`ELEVENLABS_VOICE_ID`	`pNInz6obpgDQGcFmaJgB`	ElevenLabs voice ID (Adam by default)
`ELEVENLABS_MODEL_ID`	`eleven_flash_v2_5`	ElevenLabs model
`WEEKLY_SUMMARY_AUDIO_DIR`	`/var/audio-summaries`	Where to store TTS audio files

API Endpoints

Streaming Endpoints

Start streaming a YouTube video:

curl -X POST http://localhost:8000/stream \
  -H "Content-Type: application/json" \
  -d '{"youtube_video_id": "dQw4w9WgXcQ"}'

# Or with a full YouTube URL:
curl -X POST http://localhost:8000/stream \
  -H "Content-Type: application/json" \
  -d '{"youtube_video_id": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'

# Skip transcription for this video:
curl -X POST http://localhost:8000/stream \
  -H "Content-Type: application/json" \
  -d '{"youtube_video_id": "dQw4w9WgXcQ", "skip_transcription": true}'

Get the audio stream:

# Open in browser or media player:
http://localhost:8000/mystream

Check stream status:

curl http://localhost:8000/status

Stop current stream:

curl -X POST http://localhost:8000/stop
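The same /stream call can be made from Python using only the standard library. The sketch below builds the request in the shape shown in the curl examples; the helper name `build_stream_request` is hypothetical, and sending the request requires a running server.

```python
import json
import urllib.request


def build_stream_request(
    base_url: str, video: str, skip_transcription: bool = False
) -> urllib.request.Request:
    """Build a POST /stream request with the JSON body used in the curl examples."""
    payload = {"youtube_video_id": video}
    if skip_transcription:
        payload["skip_transcription"] = True
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it against a running server:
#   request = build_stream_request("http://localhost:8000", "dQw4w9WgXcQ")
#   with urllib.request.urlopen(request) as response:
#       print(response.status, response.read().decode())
```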

Queue Management

Add video to queue:

curl -X POST http://localhost:8000/queue/add \
  -H "Content-Type: application/json" \
  -d '{"youtube_video_id": "dQw4w9WgXcQ"}'

Get current queue:

curl http://localhost:8000/queue

Skip to next track:

curl -X POST http://localhost:8000/queue/next

Remove specific item from queue:

curl -X DELETE http://localhost:8000/queue/123

Clear entire queue:

curl -X POST http://localhost:8000/queue/clear

Play History

Get play history:

curl http://localhost:8000/history

Clear play history:

curl -X POST http://localhost:8000/history/clear

Transcription & Summaries

Get transcription status:

curl http://localhost:8000/transcription/status/dQw4w9WgXcQ

Manually trigger transcription:

curl -X POST http://localhost:8000/transcription/start/dQw4w9WgXcQ

Get summary and Trilium link:

curl http://localhost:8000/transcription/summary/dQw4w9WgXcQ

LLM Usage Analytics

Get detailed usage statistics:

# Recent usage (last 100 records)
curl "http://localhost:8000/admin/llm-usage/stats?limit=100"

# Filter by provider
curl "http://localhost:8000/admin/llm-usage/stats?provider=openai"
curl "http://localhost:8000/admin/llm-usage/stats?provider=gemini"

# Filter by model
curl "http://localhost:8000/admin/llm-usage/stats?model=whisper-1"
curl "http://localhost:8000/admin/llm-usage/stats?model=gpt-4o-mini"

# Filter by feature
curl "http://localhost:8000/admin/llm-usage/stats?feature=transcription"
curl "http://localhost:8000/admin/llm-usage/stats?feature=summarization"
curl "http://localhost:8000/admin/llm-usage/stats?feature=weekly_summary"

# Date range filter (ISO 8601 format)
curl "http://localhost:8000/admin/llm-usage/stats?start_date=2026-02-01T00:00:00&end_date=2026-02-03T23:59:59"

# Combine filters
curl "http://localhost:8000/admin/llm-usage/stats?provider=openai&feature=transcription&limit=50"
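When calling the stats endpoint from a script, the filters can be assembled programmatically. This sketch assumes only the query parameters shown in the curl examples above; `usage_stats_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def usage_stats_url(base_url: str, **filters) -> str:
    """Build a /admin/llm-usage/stats URL, skipping filters that are None."""
    params = {key: value for key, value in filters.items() if value is not None}
    url = f"{base_url.rstrip('/')}/admin/llm-usage/stats"
    query = urlencode(params)
    return f"{url}?{query}" if query else url
```

Note that `urlencode` percent-encodes the colons in ISO 8601 timestamps (as %3A), which servers normally decode back to the original value.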

Get aggregated summary:

# Overall summary (all time)
curl "http://localhost:8000/admin/llm-usage/summary"

# Summary for specific date range
curl "http://localhost:8000/admin/llm-usage/summary?start_date=2026-02-01T00:00:00&end_date=2026-02-28T23:59:59"

Response format:

{
  "status": "success",
  "summary": {
    "totals": {
      "call_count": 150,
      "total_prompt_tokens": 125000,
      "total_response_tokens": 45000,
      "total_tokens": 170000
    },
    "by_provider_model_feature": [
      {
        "provider": "openai",
        "model": "whisper-1",
        "feature": "transcription",
        "call_count": 50,
        "total_tokens": 50000
      }
    ]
  }
}
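A response in this shape is straightforward to aggregate further on the client side. The sketch below sums tokens per provider; it assumes only the fields visible in the sample response, and `tokens_by_provider` is a hypothetical helper.

```python
import json

# The sample response from above, trimmed to the fields used here.
SAMPLE_RESPONSE = json.loads("""
{
  "status": "success",
  "summary": {
    "totals": {"call_count": 150, "total_tokens": 170000},
    "by_provider_model_feature": [
      {"provider": "openai", "model": "whisper-1",
       "feature": "transcription", "call_count": 50, "total_tokens": 50000}
    ]
  }
}
""")


def tokens_by_provider(response: dict) -> dict:
    """Sum total_tokens across all model/feature rows of each provider."""
    totals = {}
    for row in response["summary"]["by_provider_model_feature"]:
        totals[row["provider"]] = totals.get(row["provider"], 0) + row["total_tokens"]
    return totals
```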

Weekly Summary Management

Trigger weekly summary manually:

curl -X POST "http://localhost:8000/admin/weekly-summary/trigger"

Or for a specific week:

curl -X POST "http://localhost:8000/admin/weekly-summary/trigger" \
  -H "Content-Type: application/json" \
  -d '{"date": "2026-02-06"}'

Get next scheduled run time:

curl "http://localhost:8000/admin/weekly-summary/next-run"

Cost Estimates

Current Model Pricing (Per 1M Tokens)

Model	Input	Output	Notes
OpenAI
gpt-4o-mini	$0.15	$0.60	Reliable workhorse (recommended)
gpt-4o	$2.50	$10.00	Higher quality
whisper-1	$0.006/min	-	Audio transcription, 25MB limit
tts-1	$15.00	-	Text-to-speech (per 1M chars)
tts-1-hd	$30.00	-	TTS HD quality (per 1M chars)
Mistral AI
voxtral-mini-latest	$0.003/min	-	Audio transcription, 15 min limit
Google Gemini
gemini-2.5-flash	$0.15	$0.60	Fast, comparable to gpt-4o-mini (recommended)
gemini-2.5-flash-preview-tts	$0.40	$0.40	Audio transcription (per 1M tokens)
gemini-1.5-flash	$0.10	$0.40	Slightly older, still excellent
gemini-1.5-pro	$1.25	$5.00	Higher quality
ElevenLabs
eleven_flash_v2_5	$100.00	-	TTS (per 1M chars), ~$0.10 per 1K
eleven_turbo_v2_5	$300.00	-	TTS higher quality (per 1M chars)

Estimated Costs Per Operation

Using recommended configuration (Whisper + Gemini 2.5 Flash):

Video transcription (Whisper): $0.006 per minute of audio
- 10 min video = $0.06
- 1 hour video = $0.36

Alternative: Cost-optimized (Voxtral + Gemini 2.5 Flash):

Video transcription (Voxtral Mini): $0.003 per minute of audio (50% cheaper)
- 10 min video = $0.03
- 1 hour video = $0.18
Video summarization (Gemini 2.5 Flash): ~$0.0003-0.001 per summary
- Typical: 2,000 input tokens + 500 output tokens
- Cost: (2,000 × $0.15 + 500 × $0.60) / 1,000,000 = $0.0006
Weekly summary (Gemini 2.5 Flash): ~$0.003-0.01 per summary
- Typical: 10,000 input tokens + 2,000 output tokens
- Cost: (10,000 × $0.15 + 2,000 × $0.60) / 1,000,000 = $0.0027
Book suggestions (Gemini 2.5 Flash): ~$0.0002-0.0005 per request
- Typical: 1,000 input tokens + 100 output tokens
- Cost: (1,000 × $0.15 + 100 × $0.60) / 1,000,000 = $0.0002
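The per-operation figures above all follow one formula: tokens multiplied by the price per million tokens, summed over input and output. A small sketch, using the Gemini 2.5 Flash prices from the table; the function name is hypothetical.

```python
def token_cost(
    prompt_tokens: int,
    response_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost in dollars for one call, given prices per 1M tokens."""
    return (
        prompt_tokens * input_price_per_million
        + response_tokens * output_price_per_million
    ) / 1_000_000


# Prices from the table above (Gemini 2.5 Flash): $0.15 input, $0.60 output.
video_summary = token_cost(2_000, 500, 0.15, 0.60)      # about $0.0006
weekly_summary = token_cost(10_000, 2_000, 0.15, 0.60)  # about $0.0027
suggestions = token_cost(1_000, 100, 0.15, 0.60)        # about $0.0002
```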

Example Monthly Costs

Light usage (10 hours/month, 15 videos):

Transcription: 10 hours × 60 min × $0.006 = $3.60
Summarization: 15 videos × $0.0006 = $0.01
Weekly summaries: 4 weeks × $0.0027 = $0.01
Total: ~$3.62/month

Moderate usage (30 hours/month, 45 videos):

Transcription: 30 hours × 60 min × $0.006 = $10.80
Summarization: 45 videos × $0.0006 = $0.03
Weekly summaries: 4 weeks × $0.0027 = $0.01
Total: ~$10.84/month

Heavy usage (100 hours/month, 150 videos):

Transcription: 100 hours × 60 min × $0.006 = $36.00
Summarization: 150 videos × $0.0006 = $0.09
Weekly summaries: 4 weeks × $0.0027 = $0.01
Total: ~$36.10/month
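The three scenarios can be reproduced with a short calculation. This sketch uses the per-operation costs quoted above as constants and assumes transcription is billed on the original, uncompressed duration, as the examples do; `monthly_estimate` is a hypothetical name.

```python
WHISPER_PER_MINUTE = 0.006     # dollars per minute of audio
SUMMARY_COST = 0.0006          # dollars per video summary
WEEKLY_SUMMARY_COST = 0.0027   # dollars per weekly summary


def monthly_estimate(
    hours: float,
    videos: int,
    weeks: int = 4,
    per_minute: float = WHISPER_PER_MINUTE,
) -> float:
    """Estimated monthly cost in dollars, rounded to cents."""
    transcription = hours * 60 * per_minute
    summaries = videos * SUMMARY_COST
    weekly = weeks * WEEKLY_SUMMARY_COST
    return round(transcription + summaries + weekly, 2)
```

Passing `per_minute=0.003` gives the equivalent estimate for Voxtral Mini.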

Gemini Free Tier

Gemini has a generous free tier that covers most summarization needs:

15 requests per minute
1 million tokens per day
1,500 requests per day

What's free:

Video summarization (essentially unlimited for personal use)
Weekly summaries (4 per month)
Smart suggestions (as much as you need)

What costs money:

Transcription with Whisper (no free option for high quality)

Cost Tracking

Use the admin endpoints to monitor your actual usage:

# Get total tokens used this month
curl "http://localhost:8000/admin/llm-usage/summary?start_date=2026-02-01T00:00:00"

# Check Whisper usage
curl "http://localhost:8000/admin/llm-usage/stats?model=whisper-1&limit=1000"

Calculate costs based on current provider pricing:

OpenAI Whisper: audio duration minutes × $0.006
Mistral Voxtral: audio duration minutes × $0.003
GPT-4o-mini: (prompt_tokens × $0.15 + response_tokens × $0.60) / 1,000,000
Gemini: Usually free up to limits

Running as a Service

Systemd Service Setup

The application can run as a systemd service for automatic startup and management.

1. Edit the service file:

nano audio-stream.service

Update these lines with your actual username and paths:

User=YOUR_USERNAME
WorkingDirectory=/home/YOUR_USERNAME/audio-stream-server

2. Install the service:

sudo cp audio-stream.service /etc/systemd/system/audio-stream.service
sudo systemctl daemon-reload

3. Enable and start:

# Enable (start on boot)
sudo systemctl enable audio-stream

# Start now
sudo systemctl start audio-stream

# Check status
sudo systemctl status audio-stream

4. Manage the service:

# Restart
sudo systemctl restart audio-stream

# Stop
sudo systemctl stop audio-stream

# View logs
journalctl -u audio-stream -n 1000 -f

Note: The service automatically loads your .env file from the WorkingDirectory.

Updating the Application

Use the provided update script to safely update:

./update.sh

This script:

Checks if service is running
Pulls latest changes from git
Updates Python dependencies
Checks and installs missing system dependencies
Runs database migrations
Restarts the service if it was running

Development

Setting Up Development Environment

# Clone repository
git clone https://github.com/MrDesjardins/audio-stream-server.git
cd audio-stream-server

# Run setup
./setup.sh

# Install development dependencies
uv sync --extra dev --extra test

# Install pre-commit hooks
uv run pre-commit install

Running Tests

# Run all tests with coverage
uv run pytest

# Run without coverage (faster)
uv run pytest --no-cov

# Run specific test file
uv run pytest tests/services/test_database.py

# Run specific test
uv run pytest tests/services/test_database.py::TestDatabase::test_add_history

# Use the test runner script
./run_tests.sh all        # All tests with coverage
./run_tests.sh fast       # Fast mode (no coverage)
./run_tests.sh services   # Only service tests
./run_tests.sh routes     # Only route tests

See TESTING.md for comprehensive testing documentation.

Code Quality

Pre-commit hooks (automatic on commit):

# One-time setup
uv run pre-commit install

# Hooks run automatically on git commit
# They auto-fix issues and add fixes to your commit

Manual linting:

# Lint and auto-fix with Ruff
uv run ruff check --fix

# Format code
uv run ruff format .

# Type checking
uv run mypy .

# Run all pre-commit hooks manually
uv run pre-commit run --all-files

See LINTING.md for detailed linting documentation.

Project Structure

audio-stream-server/
├── main.py                     # FastAPI app initialization
├── config.py                   # Configuration management
├── routes/                     # API route handlers
│   ├── stream.py               # Streaming and playback
│   ├── queue.py                # Queue management
│   ├── transcription.py        # Transcription endpoints
│   └── admin.py                # Admin endpoints
├── services/                   # Core business logic
│   ├── streaming.py            # yt-dlp and ffmpeg pipeline
│   ├── broadcast.py            # Multi-client streaming
│   ├── database.py             # SQLite operations
│   ├── youtube.py              # YouTube metadata
│   ├── transcription.py        # OpenAI/Gemini transcription
│   ├── summarization.py        # AI summarization
│   ├── trilium.py              # Trilium Notes integration
│   ├── background_tasks.py     # Async processing
│   ├── llm_clients.py          # AI client wrappers
│   └── cache.py                # Audio and transcript caching
├── templates/
│   └── index.html              # Jinja2 web interface
├── static/
│   ├── style.css               # Responsive dark theme
│   └── fonts/                  # Self-hosted fonts
└── tests/                      # Comprehensive test suite
    ├── services/
    └── routes/

Database Migrations

Database schema updates are handled by migration scripts:

# Run all migrations (done automatically by update.sh)
uv run python migrate_database.py              # Base schema
uv run python migrate_add_metadata.py          # Channel and thumbnail fields
uv run python migrate_add_queue_columns.py     # Queue type and week_year
uv run python migrate_add_llm_stats.py         # LLM usage tracking table
uv run python migrate_add_audio_duration.py    # Audio duration for cost tracking
uv run python migrate_add_weekly_summary.py    # Weekly summaries table

Each migration:

Creates a backup before making changes
Is idempotent (safe to run multiple times)
Preserves all existing data
Runs automatically during ./update.sh
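Idempotency in a SQLite migration usually comes from checking the schema before altering it. The sketch below shows that pattern under assumed names: the `play_history` table and `channel` column in the usage note are hypothetical, and the backup step is omitted for brevity.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Check the table schema; PRAGMA table_info returns the column name at index 1."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def add_column_if_missing(
    conn: sqlite3.Connection, table: str, column: str, ddl_type: str
) -> bool:
    """Add the column only when absent, so repeated runs are harmless.

    Identifiers are interpolated into SQL, so pass trusted constants only.
    Returns True if the schema was changed.
    """
    if column_exists(conn, table, column):
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    conn.commit()
    return True
```

For instance, `add_column_if_missing(conn, "play_history", "channel", "TEXT")` changes the schema on the first call and does nothing on later calls.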

Contributing

See CONTRIBUTING.md for development guidelines, including:

Code style requirements
Type annotation standards
Path handling with expand_path()
Testing requirements (76% minimum coverage)
Pre-commit hook usage

CI/CD

GitHub Actions automatically:

Runs tests on every push and PR
Generates coverage reports
Runs code quality checks (Ruff, mypy)
Posts coverage to PR comments

See CI_SETUP.md for CI configuration details.

Architecture

Streaming Pipeline

Input: Client sends YouTube video ID via /stream endpoint
Extract: yt-dlp extracts best audio from YouTube → stdout
Convert: ffmpeg converts audio to MP3 → stdout
Broadcast: StreamBroadcaster reads from ffmpeg and broadcasts to all connected clients
Multi-client: Multiple clients can stream simultaneously via /mystream endpoint
Playback: HTML5 audio player consumes the stream
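The extract and convert steps amount to piping yt-dlp into ffmpeg. The following sketch wires the two processes together with `subprocess`; the exact flags the project uses are not documented here, so the ones below are an assumption, chosen from standard yt-dlp and ffmpeg options, and the function names are hypothetical.

```python
import subprocess


def build_pipeline_commands(video_id: str, quality: int = 4):
    """Return (yt-dlp argv, ffmpeg argv) for a stdout-to-stdin MP3 pipeline."""
    url = f"https://www.youtube.com/watch?v={video_id}"
    ytdlp = ["yt-dlp", "--quiet", "-f", "bestaudio", "-o", "-", url]
    ffmpeg = [
        "ffmpeg", "-loglevel", "error",
        "-i", "pipe:0",            # read the download from stdin
        "-vn",                     # ignore any video stream
        "-codec:a", "libmp3lame",
        "-q:a", str(quality),      # 0 = best, 9 = smallest
        "-f", "mp3", "pipe:1",     # write MP3 to stdout
    ]
    return ytdlp, ffmpeg


def start_pipeline(video_id: str, quality: int = 4) -> subprocess.Popen:
    """Start both processes; MP3 bytes can be read from the returned stdout."""
    ytdlp_cmd, ffmpeg_cmd = build_pipeline_commands(video_id, quality)
    downloader = subprocess.Popen(ytdlp_cmd, stdout=subprocess.PIPE)
    encoder = subprocess.Popen(
        ffmpeg_cmd, stdin=downloader.stdout, stdout=subprocess.PIPE
    )
    downloader.stdout.close()  # lets yt-dlp see a broken pipe if ffmpeg exits
    return encoder
```

A broadcaster would then read fixed-size chunks from `encoder.stdout` and fan them out to connected clients.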

Multi-Client Streaming

StreamBroadcaster: Manages concurrent client connections with replay buffers
Replay Buffer: Last 100 chunks (~800KB) for reconnecting clients
Client Queues: Each client gets their own queue of audio chunks
Instant Resume: Reconnecting clients receive buffered content immediately
Thread Safety: Process lock ensures thread-safe access to global state
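The replay-buffer idea can be sketched with a bounded deque and one queue per client. This is an illustrative reduction of the design described above, not the project's StreamBroadcaster; the class and method names are hypothetical.

```python
import queue
import threading
from collections import deque


class Broadcaster:
    """Fan chunks out to every client and keep the most recent ones for replay."""

    def __init__(self, replay_chunks: int = 100) -> None:
        self._lock = threading.Lock()
        self._replay = deque(maxlen=replay_chunks)  # oldest chunks drop off automatically
        self._clients = set()

    def subscribe(self) -> queue.Queue:
        """Register a client; its queue starts pre-filled with the replay buffer."""
        client = queue.Queue()
        with self._lock:
            for chunk in self._replay:
                client.put(chunk)
            self._clients.add(client)
        return client

    def unsubscribe(self, client: queue.Queue) -> None:
        with self._lock:
            self._clients.discard(client)

    def publish(self, chunk: bytes) -> None:
        """Record the chunk for replay and deliver it to every connected client."""
        with self._lock:
            self._replay.append(chunk)
            for client in self._clients:
                client.put(chunk)
```

Filling the new client's queue and registering it under the same lock means a reconnecting client neither misses nor duplicates a chunk published at that moment.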

Transcription Pipeline

Capture: Audio saved to file while streaming (using ffmpeg tee muxer)
Queue: Background worker picks up transcription job
Deduplicate: Check Trilium for existing note
Transcribe: Call the configured provider (OpenAI Whisper, Mistral Voxtral, or Gemini) with the audio file
Summarize: Generate AI summary of transcript
Post: Create Trilium note with formatted content
Cleanup: Delete temporary audio file
Cache: Store transcript and summary for future use

Background Processing

Single background worker thread processes jobs sequentially
Main thread handles HTTP requests and streaming
Thread-safe queue for job processing
Jobs tracked with status: PENDING → TRANSCRIBING → SUMMARIZING → POSTING → COMPLETED
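A single worker thread draining a thread-safe queue can be sketched as follows. The status names match the list above; the `Worker` class, the handler mapping, and the FAILED status are assumptions added for illustration.

```python
import queue
import threading
from enum import Enum


class JobStatus(Enum):
    PENDING = "pending"
    TRANSCRIBING = "transcribing"
    SUMMARIZING = "summarizing"
    POSTING = "posting"
    COMPLETED = "completed"
    FAILED = "failed"  # assumption: not part of the documented status flow


STAGES = [JobStatus.TRANSCRIBING, JobStatus.SUMMARIZING, JobStatus.POSTING]


class Worker:
    """Process jobs one at a time on a single daemon thread."""

    def __init__(self, handlers: dict) -> None:
        self.jobs = queue.Queue()
        self.status = {}
        self._lock = threading.Lock()
        self._handlers = handlers  # maps each stage to a callable taking the video ID
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, video_id: str) -> None:
        self._set(video_id, JobStatus.PENDING)
        self.jobs.put(video_id)

    def _set(self, video_id: str, status: JobStatus) -> None:
        with self._lock:
            self.status[video_id] = status

    def _run(self) -> None:
        while True:
            video_id = self.jobs.get()
            try:
                for stage in STAGES:
                    self._set(video_id, stage)
                    self._handlers[stage](video_id)
                self._set(video_id, JobStatus.COMPLETED)
            except Exception:
                # Keep the worker alive so later jobs still run.
                self._set(video_id, JobStatus.FAILED)
            finally:
                self.jobs.task_done()
```

Because jobs run sequentially, a long transcription delays the ones queued behind it but never competes with them for API rate limits.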

Troubleshooting

Common Issues

Port already in use:

# Find process using port 8000
sudo lsof -i :8000

# Kill the process
sudo kill -9 <PID>

yt-dlp not found:

# Reinstall yt-dlp
sudo curl -L https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -o /usr/local/bin/yt-dlp
sudo chmod a+rx /usr/local/bin/yt-dlp

ffmpeg not installed:

sudo apt update
sudo apt install -y ffmpeg

Database locked:

# Stop the service
sudo systemctl stop audio-stream

# Check for locks
lsof audio_history.db

# Restart service
sudo systemctl start audio-stream

Trilium connection fails:

# Test connection
uv run test_trilium.py

# Check Trilium is running
curl http://localhost:8080

# Verify ETAPI token in Trilium settings

License

MIT License - see LICENSE file for details.

Acknowledgments

Built with FastAPI
Audio processing with yt-dlp and ffmpeg
AI powered by OpenAI and Google Gemini
Knowledge management with Trilium Notes
