Local-first AI video transcription with speaker diarization, semantic search, and RAG-powered chat.
- Local AI Transcription - Faster Whisper runs on your machine, no API costs
- Speaker Diarization - Automatically identifies and labels different speakers
- Multi-format Support - MP4, MP3, WAV, WebM, MKV, and more
- Multi-language - Auto-detection and translation using MarianMT
- Semantic Search - Find content by meaning with vector embeddings
- Visual Search - CLIP-powered search by describing what you see
- Audio Analysis - Detect laughter, applause, music, and emotions
- RAG Chat - Ask questions about your video with context-aware answers
- Background Jobs - Queue large files for async processing
- Real-time Updates - Live progress via Supabase
- Share Links - Generate public links to share results
- Subtitle Export - WebVTT and SRT with translation support
- Multiple LLMs - Ollama (local), Groq, OpenAI, Anthropic, Grok
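To illustrate the Subtitle Export bullet: rendering timed transcript segments as SRT mostly comes down to timestamp formatting. A minimal, dependency-free sketch (the segment shape and the `to_srt` helper are hypothetical, not this project's actual API):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Render [{'start': float, 'end': float, 'text': str}] as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n{seg['text']}\n"
        )
    return "\n".join(blocks)

print(to_srt([{"start": 0.0, "end": 2.5, "text": "Hello."}]))
# → 1
#   00:00:00,000 --> 00:00:02,500
#   Hello.
```

WebVTT differs mainly in its `WEBVTT` header and `.` instead of `,` as the millisecond separator.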
| Layer | Technologies |
|---|---|
| Frontend | React 19, TypeScript, Vite, TailwindCSS, React Query |
| Backend | FastAPI, Faster Whisper, PyTorch, Pyannote, ChromaDB |
| Infrastructure | Supabase, Google Cloud (Run, Storage, Firestore), Netlify |
- Node.js 18+, Python 3.9+, FFmpeg
- HuggingFace token (for speaker diarization)
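A quick way to confirm the required tools are on your PATH before installing anything (a generic check, not a script shipped with this repo):

```shell
# Report whether each prerequisite is installed
for cmd in node python3 ffmpeg; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: MISSING"
  fi
done
```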
Backend:
```bash
cd backend
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env  # Edit with your settings
uvicorn main:app --reload --port 8000
```

Frontend:
```bash
cd frontend
npm install
npm run dev
```

```mermaid
flowchart TB
    subgraph Frontend["Frontend (React)"]
        UI[UI] --> API[API Client]
        API --> RT[Supabase Realtime]
    end
    subgraph Cloud["Cloud Services"]
        GCS[(GCS)]
        SB[(Supabase)]
        FS[(Firestore)]
    end
    subgraph Backend["Backend (FastAPI)"]
        TR[Transcription] --> WH[Whisper]
        SR[Speaker] --> PY[Pyannote]
        CR[Chat] --> VDB[(ChromaDB)]
        CR --> LLM[LLM Providers]
    end
    API --> TR & SR & CR
    RT <--> SB
    TR --> FS
    TR --> GCS
```
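The Chat → ChromaDB edge in the diagram is a standard embed-and-retrieve step: embed the question, rank stored transcript chunks by cosine similarity, and pass the top hits to the LLM as context. A dependency-free sketch with toy 3-dimensional vectors (in the real system an embedding model produces the vectors and ChromaDB maintains the index; the names and numbers here are illustrative only):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy index: three transcript chunks with made-up embeddings
index = [
    ("Speaker A greets the audience", [0.9, 0.1, 0.0]),
    ("Discussion of GPU pricing",     [0.0, 0.8, 0.6]),
    ("Closing remarks and thanks",    [0.7, 0.2, 0.1]),
]
context = top_k([1.0, 0.1, 0.0], index, k=2)
# context holds the two chunks closest in meaning to the query
```

The retrieved `context` strings would then be prepended to the LLM prompt so answers stay grounded in the video.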
| Issue | Solution |
|---|---|
| `No module named 'torch'` | Activate venv: `source venv/bin/activate` |
| FFmpeg not found | Install: `brew install ffmpeg` (macOS) or `apt install ffmpeg` |
| Speaker diarization fails | Check `HUGGINGFACE_TOKEN` and accept pyannote terms |
| Ollama connection error | Start Ollama: `ollama serve` |
| Large file upload fails | Enable GCS: `ENABLE_GCS_UPLOADS=true` |
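For the first two rows, you can confirm the diagnosis from Python without triggering the error again (a generic diagnostic sketch, not part of this repo):

```python
import importlib.util
import shutil

# find_spec returns None when the package is not importable in this environment
torch_ok = importlib.util.find_spec("torch") is not None
# shutil.which returns None when ffmpeg is not on PATH
ffmpeg_path = shutil.which("ffmpeg")

print(f"torch installed: {torch_ok}")
print(f"ffmpeg on PATH: {ffmpeg_path or 'NOT FOUND'}")
```

If either line reports a miss, apply the matching fix from the table above.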
- Configuration Guide - All environment variables
- API Reference - Complete endpoint documentation
- Architecture - Detailed system diagrams
- Speaker Diarization Setup
- Production Deployment
```
ai-subs/
├── frontend/          # React + TypeScript
│   ├── src/
│   │   ├── components/
│   │   ├── hooks/
│   │   ├── services/
│   │   └── types/
│   └── package.json
├── backend/           # FastAPI + ML
│   ├── routers/       # API endpoints
│   ├── services/      # Business logic
│   ├── models/        # Pydantic schemas
│   └── main.py
└── docs/              # Documentation
```
Contributions welcome! Please open issues or submit pull requests.
Faster Whisper | Pyannote | ChromaDB | Ollama | CLIP | PANNs