@@ -28,7 +28,7 @@ A powerful FastAPI application that streams audio from YouTube videos as MP3 ove
 
 #### Intelligent Summarization
 - **Video Summaries**: AI-generated summaries of each video's content
-- **Multi-Provider**: OpenAI GPT or Google Gemini (Gemini recommended for cost-effectiveness)
+- **Multi-Provider**: OpenAI GPT or Google Gemini (Gemini recommended for free tier)
 - **Knowledge Management**: Automatic posting to Trilium Notes with deduplication
 - **Rich Metadata**: Includes video title, channel, thumbnail, and YouTube link
 
@@ -37,7 +37,7 @@ A powerful FastAPI application that streams audio from YouTube videos as MP3 ove
 - **Comprehensive Analysis**: Synthesizes all videos watched during the week
 - **Key Learnings**: Extracts 15 most important insights across all content
 - **Theme Detection**: Identifies common themes and patterns in your viewing
-- **Text-to-Speech**: Optional ElevenLabs TTS generation for listening to summaries
+- **Text-to-Speech**: Optional TTS generation (OpenAI or ElevenLabs) for listening to summaries
 
 #### Smart Video Suggestions
 - **AI Content Discovery**: Analyzes your viewing history to suggest similar videos
@@ -125,9 +125,9 @@ TRANSCRIPTION_ENABLED=true
 OPENAI_API_KEY=sk-... # Get from https://platform.openai.com/api-keys
 GEMINI_API_KEY=... # Get from https://makersuite.google.com/app/apikey
 
-# Provider selection (recommended: Voxtral + Gemini for best cost/quality)
-TRANSCRIPTION_PROVIDER=mistral # "openai", "mistral", or "gemini"
-SUMMARY_PROVIDER=gemini # "gemini" (cost-effective) or "openai"
+# Provider selection (recommended: Whisper + Gemini for best cost/quality)
+TRANSCRIPTION_PROVIDER=openai # "openai" (Whisper) or "gemini"
+SUMMARY_PROVIDER=gemini # "gemini" (free tier) or "openai"
 
 # Trilium Notes integration (for saving summaries)
 TRILIUM_URL=http://localhost:8080
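The provider variables above are plain strings, so a misspelled value only surfaces when the first video is processed. The sketch below shows one way to validate them at startup so that a typo fails immediately. It is a minimal illustration: `load_provider_selection` and the accepted-value sets are hypothetical and are not taken from the application's code. The sets follow the options listed in this README, which still document `mistral` as a transcription provider. The defaults mirror the recommended Whisper + Gemini setup.

```python
import os
from typing import Mapping

# Accepted values as listed in the README's provider options (assumption:
# the real app accepts exactly these strings).
TRANSCRIPTION_PROVIDERS = {"openai", "mistral", "gemini"}
SUMMARY_PROVIDERS = {"gemini", "openai"}


def load_provider_selection(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Hypothetical helper: return (transcription, summary) provider names.

    Defaults mirror the recommended Whisper + Gemini configuration."""
    transcription = env.get("TRANSCRIPTION_PROVIDER", "openai").strip().lower()
    summary = env.get("SUMMARY_PROVIDER", "gemini").strip().lower()
    # Fail fast on typos instead of at the first transcription request.
    if transcription not in TRANSCRIPTION_PROVIDERS:
        raise ValueError(f"Unsupported TRANSCRIPTION_PROVIDER: {transcription!r}")
    if summary not in SUMMARY_PROVIDERS:
        raise ValueError(f"Unsupported SUMMARY_PROVIDER: {summary!r}")
    return transcription, summary
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise without touching the real environment.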
@@ -148,13 +148,13 @@ TTS_ENABLED=false
   - Or use your specific local IP (e.g., `10.0.0.181`)
 
 - **TRANSCRIPTION_PROVIDER**:
-  - `openai` = Whisper API ($0.006/min, very accurate, fast, 25MB limit)
-  - `mistral` = Voxtral Mini ($0.003/min, most cost-effective, 30 min limit)
-  - `gemini` = Gemini 2.5 Flash (~$0.0005-0.001/min, handles unlimited file sizes)
+  - `openai` = Whisper API ($0.006/minute, very accurate, fast, 25MB limit)
+  - `mistral` = Voxtral Mini ($0.003/minute, cost-effective, good quality, 15 min limit)
+  - `gemini` = Gemini 1.5 Flash (free tier available, good quality, no limits)
 
 - **SUMMARY_PROVIDER**:
-  - `gemini` = Gemini 2.5 Flash (recommended, very cost-effective, fast)
-  - `openai` = GPT-4o-mini (high quality)
+  - `gemini` = Gemini 2.5 Flash (recommended, free tier, fast)
+  - `openai` = GPT-4o-mini (high quality, paid)
 
 ### Step 4: Test Trilium Connection (Optional)
 
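The providers differ in the kind of limit they impose. Whisper caps file size, Voxtral caps duration, and Gemini is listed as having no limit. The sketch below filters providers by those limits. It is a hypothetical helper, not part of the app. It assumes that "25MB" means 25 MiB. The Voxtral cap is a parameter because this document gives both 15 and 30 minutes for it.

```python
# Limits as stated in the provider list above.
WHISPER_MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" interpreted as MiB


def eligible_transcription_providers(
    duration_min: float,
    size_bytes: int,
    voxtral_max_min: float = 15.0,
) -> list[str]:
    """Hypothetical helper: providers whose documented limits accept this file."""
    providers = []
    if size_bytes <= WHISPER_MAX_BYTES:
        providers.append("openai")  # Whisper: file-size limit only
    if duration_min <= voxtral_max_min:
        providers.append("mistral")  # Voxtral Mini: duration limit only
    providers.append("gemini")  # listed as having no limits
    return providers
```

A 20-minute, 30 MiB file passes only the Gemini check. A short, small file passes all three checks.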
@@ -224,24 +224,20 @@ Required for Whisper transcription or GPT summarization.
 
 ### Google Gemini API Key
 
-Required for Gemini transcription or summarization. Very cost-effective pricing.
+Required for Gemini transcription or summarization. Has a generous free tier.
 
 1. Visit https://makersuite.google.com/app/apikey
 2. Sign in with your Google account
 3. Click "Create API Key"
 4. Copy the key
 5. Add to `.env` file: `GEMINI_API_KEY=...`
 
-**Pricing**:
-- Audio transcription: ~$0.30-0.50 per 1M input tokens + $0.40 per 1M output
-- Text generation: $0.15 per 1M input + $0.60 per 1M output
-- Rate limits: 15 req/min, 1,500 req/day, 1M tokens/day
+**Free Tier**:
+- 15 requests per minute
+- 1 million tokens per day
+- 1,500 requests per day
 
-**Benefits**:
-- Very cost-effective for audio transcription (~$0.0005-0.001/minute)
-- Handles large audio files automatically (uses Files API for >20MB)
-- No practical file size or duration limits
-- Good for long recordings where Whisper/Voxtral hit their limits
+For typical use, summarization and weekly summaries are essentially free.
 
 ### Mistral AI API Key
 
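To stay within the per-minute and per-day request quotas listed above, a client can throttle itself before calling Gemini. The sketch below is a hypothetical sliding-window guard and does not come from the application's code. It tracks request counts only and leaves the 1M-tokens-per-day quota untracked. The caller supplies timestamps in seconds, so the window logic can be tested without real waiting.

```python
from collections import deque


class FreeTierLimiter:
    """Hypothetical client-side guard for the listed Gemini free-tier request
    quotas (15 requests/minute, 1,500 requests/day), using sliding windows."""

    def __init__(self, per_minute: int = 15, per_day: int = 1500):
        self.per_minute = per_minute
        self.per_day = per_day
        self._minute: deque[float] = deque()  # timestamps within the last 60 s
        self._day: deque[float] = deque()  # timestamps within the last 24 h

    def try_acquire(self, now: float) -> bool:
        """Record a request at time `now` (seconds) if both windows allow it."""
        # Drop timestamps that have slid out of each window.
        while self._minute and now - self._minute[0] >= 60:
            self._minute.popleft()
        while self._day and now - self._day[0] >= 86_400:
            self._day.popleft()
        if len(self._minute) >= self.per_minute or len(self._day) >= self.per_day:
            return False  # rejected requests are not recorded
        self._minute.append(now)
        self._day.append(now)
        return True
```

In a running service you would pass `time.monotonic()` as `now` and wait or queue the request when `try_acquire` returns `False`. Because the state lives in memory, it resets whenever the process restarts.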
@@ -255,7 +251,7 @@ Required for Mistral Voxtral transcription. Cost-effective option at $0.003/minu
 
 **Cost**: Voxtral Mini is $0.003 per minute of audio. For typical use (~30 hours/month), expect ~$5-8/month (50% cheaper than Whisper).
 
-**Limitation**: Maximum 15 minutes per audio file. For longer videos, use Gemini (no limit) or split the audio.
+**Limitation**: Maximum 30 minutes per audio file. For longer videos, use Gemini (no limit) or split the audio.
 
 ### Trilium ETAPI Token
 
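"Split the audio" means cutting a long recording into pieces that each fit under the per-file cap. The sketch below only computes the cut points and leaves the actual cutting to a separate tool, such as ffmpeg with its `-ss` (start) and `-t` (duration) options. It splits into equal-length chunks so the last piece is not a tiny tail. `chunk_boundaries` is a hypothetical name and does not come from the app.

```python
import math


def chunk_boundaries(duration_s: float, max_chunk_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s) into equal-length (start, end) windows, each no
    longer than max_chunk_s seconds."""
    if duration_s <= 0:
        return []
    if max_chunk_s <= 0:
        raise ValueError("max_chunk_s must be positive")
    n = math.ceil(duration_s / max_chunk_s)  # fewest chunks that respect the cap
    size = duration_s / n  # equal sizes avoid a very short final chunk
    return [(i * size, min((i + 1) * size, duration_s)) for i in range(n)]
```

A 45-minute video with a 15-minute cap becomes three 900-second chunks. Cutting at fixed times can split a sentence, so transcripts stitched back together may need light cleanup at the joins.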
@@ -272,9 +268,27 @@ Required for saving transcripts and summaries to Trilium Notes.
 2. Right-click the note → "Copy Note ID"
 3. Add to `.env` file: `TRILIUM_PARENT_NOTE_ID=...`
 
-### ElevenLabs API Key (Optional)
+### Text-to-Speech API Keys (Optional)
 
-Required only if you want text-to-speech for weekly summaries.
+Required only if you want text-to-speech for weekly summaries. Choose one provider:
+
+#### OpenAI TTS (Recommended)
+**Most affordable for long-form content**
+
+- Pricing: $15 per 1M characters (~$0.15 for a 10K character summary)
+- Quality: 6 natural voices (alloy, echo, fable, onyx, nova, shimmer)
+- Models: `tts-1` (standard) or `tts-1-hd` (higher quality)
+- You already have the API key from transcription setup
+
+Set in `.env`:
+```bash
+TTS_PROVIDER=openai
+OPENAI_TTS_VOICE=alloy
+OPENAI_TTS_MODEL=tts-1
+```
+
+#### ElevenLabs (Alternative)
+**Higher quality voices, more expensive**
 
 1. Visit https://elevenlabs.io/
 2. Sign up or sign in
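The OpenAI TTS estimate above is per-character arithmetic: characters × $15 / 1,000,000. The sketch below encodes that formula. It uses `len()`, which counts Unicode code points, as an approximation of billed characters. The function name is hypothetical, and the rate is the `tts-1` price quoted above.

```python
# OpenAI TTS price quoted above: $15 per 1M characters.
OPENAI_TTS_USD_PER_MILLION_CHARS = 15.0


def openai_tts_cost(text: str, usd_per_million_chars: float = OPENAI_TTS_USD_PER_MILLION_CHARS) -> float:
    """Estimate OpenAI TTS cost in USD for one piece of text.

    len() counts Unicode code points; treat the result as an approximation
    of the billed character count."""
    return len(text) * usd_per_million_chars / 1_000_000
```

A 10,000-character summary comes to $0.15. Four such summaries a month come to about $0.60.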
@@ -284,6 +298,12 @@ Required only if you want text-to-speech for weekly summaries.
 
 **Free Tier**: 10,000 characters per month (~7-10 summaries)
 
+Set in `.env`:
+```bash
+TTS_PROVIDER=elevenlabs
+ELEVENLABS_VOICE_ID=pNInz6obpgDQGcFmaJgB
+```
+
 ## Configuration Reference
 
 ### Environment Variables
@@ -347,9 +367,13 @@ All configuration is done via the `.env` file. See `.env.example` for a complete
 
 | Variable | Default | Description |
 |----------|---------|-------------|
-| `TTS_ENABLED` | `false` | Enable ElevenLabs TTS for summaries |
-| `ELEVENLABS_API_KEY` | - | ElevenLabs API key |
-| `ELEVENLABS_VOICE_ID` | `pNInz6obpgDQGcFmaJgB` | Voice ID (Adam by default) |
+| `TTS_ENABLED` | `false` | Enable TTS for summaries |
+| `TTS_PROVIDER` | `openai` | Provider: `openai` or `elevenlabs` |
+| `OPENAI_TTS_VOICE` | `alloy` | OpenAI voice (alloy, echo, fable, onyx, nova, shimmer) |
+| `OPENAI_TTS_MODEL` | `tts-1` | OpenAI model (`tts-1` or `tts-1-hd`) |
+| `ELEVENLABS_API_KEY` | - | ElevenLabs API key (if using ElevenLabs) |
+| `ELEVENLABS_VOICE_ID` | `pNInz6obpgDQGcFmaJgB` | ElevenLabs voice ID (Adam by default) |
+| `ELEVENLABS_MODEL_ID` | `eleven_flash_v2_5` | ElevenLabs model |
 | `WEEKLY_SUMMARY_AUDIO_DIR` | `/var/audio-summaries` | Where to store TTS audio files |
 
 ## API Endpoints
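The TTS table above amounts to a set of defaults plus a few provider-specific requirements. The sketch below loads those variables into one object and validates them only when TTS is enabled. It is a hypothetical loader that applies the defaults from the table. The app's real settings code is not shown here, and the allowed voices and models are taken from the table rather than from the OpenAI API.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Allowed values as listed in the table above (assumption: others are rejected).
OPENAI_TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
OPENAI_TTS_MODELS = {"tts-1", "tts-1-hd"}


@dataclass(frozen=True)
class TTSSettings:
    enabled: bool
    provider: str
    openai_voice: str
    openai_model: str
    elevenlabs_api_key: Optional[str]
    elevenlabs_voice_id: str
    elevenlabs_model_id: str
    audio_dir: str


def load_tts_settings(env: Mapping[str, str] = os.environ) -> TTSSettings:
    """Hypothetical loader: read TTS variables, applying the table's defaults."""
    settings = TTSSettings(
        enabled=env.get("TTS_ENABLED", "false").strip().lower() in {"1", "true", "yes"},
        provider=env.get("TTS_PROVIDER", "openai").strip().lower(),
        openai_voice=env.get("OPENAI_TTS_VOICE", "alloy"),
        openai_model=env.get("OPENAI_TTS_MODEL", "tts-1"),
        elevenlabs_api_key=env.get("ELEVENLABS_API_KEY") or None,
        elevenlabs_voice_id=env.get("ELEVENLABS_VOICE_ID", "pNInz6obpgDQGcFmaJgB"),
        elevenlabs_model_id=env.get("ELEVENLABS_MODEL_ID", "eleven_flash_v2_5"),
        audio_dir=env.get("WEEKLY_SUMMARY_AUDIO_DIR", "/var/audio-summaries"),
    )
    # Only validate provider-specific values when TTS is actually turned on.
    if settings.enabled:
        if settings.provider == "openai":
            if settings.openai_voice not in OPENAI_TTS_VOICES:
                raise ValueError(f"Unknown OPENAI_TTS_VOICE: {settings.openai_voice!r}")
            if settings.openai_model not in OPENAI_TTS_MODELS:
                raise ValueError(f"Unknown OPENAI_TTS_MODEL: {settings.openai_model!r}")
        elif settings.provider == "elevenlabs":
            if not settings.elevenlabs_api_key:
                raise ValueError("ELEVENLABS_API_KEY is required when TTS_PROVIDER=elevenlabs")
        else:
            raise ValueError(f"Unsupported TTS_PROVIDER: {settings.provider!r}")
    return settings
```

Validating only when `TTS_ENABLED` is true lets installations without any TTS keys start normally.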
@@ -530,36 +554,29 @@ curl "http://localhost:8000/admin/weekly-summary/next-run"
 | gpt-4o | $2.50 | $10.00 | Higher quality |
 | whisper-1 | - | - | $0.006 per minute, 25MB limit |
 | **Mistral AI** ||||
-| voxtral-mini-latest | - | - | $0.003 per minute, 30 min limit |
+| voxtral-mini-latest | - | - | $0.003 per minute, 15 min limit |
 | **Google Gemini** ||||
-| gemini-2.5-flash | $0.15 | $0.60 | Text: Fast, comparable to gpt-4o-mini |
-| gemini-2.5-flash (audio) | $0.30-0.50 | $0.40 | Audio transcription (token-based) |
+| gemini-2.5-flash | $0.15 | $0.60 | Fast, comparable to gpt-4o-mini (recommended) |
 | gemini-1.5-flash | $0.10 | $0.40 | Slightly older, still excellent |
 | gemini-1.5-pro | $1.25 | $5.00 | Higher quality |
 
-**Note**: Gemini audio pricing is per 1M tokens. Audio duration to token conversion varies, but typically ~1 minute ≈ 1,000-1,500 tokens.
-
 ### Estimated Costs Per Operation
 
-**Transcription Options:**
+**Using recommended configuration (Whisper + Gemini 2.5 Flash):**
 
-1. **Whisper (OpenAI)** - Most accurate
-   - $0.006 per minute
-   - 10 min = $0.06 | 1 hour = $0.36
+- **Video transcription** (Whisper): $0.006 per minute of audio
+  - 10 min video = $0.06
+  - 1 hour video = $0.36
 
-2. **Voxtral (Mistral)** - Cost-effective (50% cheaper)
-   - $0.003 per minute
-   - 10 min = $0.03 | 1 hour = $0.18
+**Alternative: Cost-optimized (Voxtral + Gemini 2.5 Flash):**
 
-3. **Gemini 2.5 Flash** - Token-based, good for long files
-   - ~$0.30-0.50 per 1M input tokens + $0.40 per 1M output
-   - Estimate: ~$0.0005-0.001 per minute (varies by audio complexity)
-   - 10 min ≈ $0.005-0.01 | 1 hour ≈ $0.03-0.06
-   - Best for: Very long recordings, handles unlimited file sizes
+- **Video transcription** (Voxtral Mini): $0.003 per minute of audio (50% cheaper)
+  - 10 min video = $0.03
+  - 1 hour video = $0.18
 
-**Summarization** (Gemini 2.5 Flash text):
-- Typical: 2,000 input + 500 output tokens
-- Cost: (2,000 × $0.15 + 500 × $0.60) / 1,000,000 = **$0.0006**
+- **Video summarization** (Gemini 2.5 Flash): ~$0.0003-0.001 per summary
+  - Typical: 2,000 input tokens + 500 output tokens
+  - Cost: (2,000 × $0.15 + 500 × $0.60) / 1,000,000 = **$0.0006**
 
 - **Weekly summary** (Gemini 2.5 Flash): ~$0.003-0.01 per summary
   - Typical: 10,000 input tokens + 2,000 output tokens
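Every estimate above follows one of two formulas. Text models charge (input tokens × input price + output tokens × output price) / 1,000,000. Audio models charge minutes × per-minute price. The sketch below encodes both using only the prices visible in the table above. The function and dictionary names are hypothetical. Update the prices if the providers change them.

```python
# USD per 1M tokens (input, output), copied from the pricing table above.
TEXT_PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gemini-2.5-flash": (0.15, 0.60),
    "gemini-1.5-flash": (0.10, 0.40),
    "gemini-1.5-pro": (1.25, 5.00),
}
# USD per minute of audio, from the same table.
AUDIO_PRICES = {"whisper-1": 0.006, "voxtral-mini-latest": 0.003}


def text_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Token-based cost: (in * in_price + out * out_price) / 1,000,000."""
    in_price, out_price = TEXT_PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def transcription_cost(model: str, minutes: float) -> float:
    """Duration-based cost: minutes * price per minute."""
    return minutes * AUDIO_PRICES[model]
```

The helpers reproduce the figures above. A typical video summary (2,000 in / 500 out) costs $0.0006, and a weekly summary (10,000 in / 2,000 out) costs $0.0027. Thirty hours of Voxtral transcription (1,800 minutes) comes to $5.40, consistent with the ~$5-8/month estimate.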
@@ -589,21 +606,20 @@ curl "http://localhost:8000/admin/weekly-summary/next-run"
 - Weekly summaries: 4 weeks × $0.0027 = **$0.01**
 - **Total: ~$36.10/month**
 
-### Gemini Pricing Advantages
+### Gemini Free Tier
 
-Gemini offers very competitive pricing, especially for text generation:
-- Text: $0.15 input + $0.60 output per 1M tokens
-- Audio: $0.30-0.50 input + $0.40 output per 1M tokens
-- Rate limits: 15 req/min, 1M tokens/day, 1,500 req/day
+Gemini has a generous free tier that covers most summarization needs:
+- 15 requests per minute
+- 1 million tokens per day
+- 1,500 requests per day
 
-**Cost-effective for:**
-- Video summarization (~$0.0006 per summary - nearly negligible)
-- Weekly summaries (~$0.003 per summary)
-- Smart suggestions (~$0.002 per request)
-- Long audio transcriptions (~$0.0005-0.001 per minute, cheaper than Whisper for >6 min files)
+**What's free:**
+- Video summarization (essentially unlimited for personal use)
+- Weekly summaries (4 per month)
+- Smart suggestions (as much as you need)
 
-**Costs more than alternatives:**
-- Short audio transcription: Voxtral is more cost-effective ($0.003/min fixed)
+**What costs money:**
+- Transcription with Whisper (no free option for high quality)
 
 ### Cost Tracking
 
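The claim that video summarization is "essentially unlimited for personal use" can be checked against the two daily quotas listed above. The daily ceiling is the smaller of the request quota and the token quota divided by tokens per request. The sketch below computes it. As a conservative assumption, it counts both input and output tokens against the 1M-token quota. The function name is hypothetical.

```python
# Free-tier quotas as listed above.
FREE_TIER_REQUESTS_PER_DAY = 1_500
FREE_TIER_TOKENS_PER_DAY = 1_000_000


def max_free_requests_per_day(tokens_per_request: int) -> int:
    """Daily request ceiling under both quotas, conservatively counting input
    and output tokens against the token quota (assumption)."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    by_tokens = FREE_TIER_TOKENS_PER_DAY // tokens_per_request
    return min(FREE_TIER_REQUESTS_PER_DAY, by_tokens)
```

A typical video summary (2,000 + 500 tokens) allows about 400 summaries per day. A weekly summary (10,000 + 2,000 tokens) allows about 83 per day. Both ceilings are far above personal-use volumes.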
@@ -671,7 +687,7 @@ sudo systemctl restart audio-stream
 sudo systemctl stop audio-stream
 
 # View logs
-journalctl -u audio-stream -n 100 -f
+journalctl -u audio-stream -n 1000 -f
 ```
 
 **Note:** The service automatically loads your `.env` file from the WorkingDirectory.