Transcriber

AI-powered local meeting transcription with automatic speaker identification. Upload an audio file, record in the browser, or run a live session, and get a full transcript with speakers identified by name.

How it works

  1. Upload, record, or go live through the web UI
  2. Audio extraction - FFmpeg converts to 16 kHz mono WAV
  3. Transcription - whisper.cpp with KB-LAB Swedish models (Metal GPU accelerated)
  4. Speaker diarization - pyannote.audio 3.1 separates speakers
  5. Intro analysis - LLM iteratively reads the transcript to detect introductions and count speakers
  6. Speaker identification - Names matched to voices using LLM reasoning + SpeechBrain voice embeddings
  7. Results - Color-coded transcript synced with audio playback, editable segments, AI-powered actions, export to 7 formats

See FEATURES.md for a complete feature list.
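
As orientation, here is a minimal Python sketch of stages 2-4 (extraction, transcription, diarization). It is illustrative only, not the repository's actual services code; the binary and model paths follow the .env example later in this README:

import subprocess
from pyannote.audio import Pipeline

def extract_audio(src: str, wav: str = "audio.wav") -> str:
    # Stage 2: FFmpeg converts the input to 16 kHz mono WAV.
    subprocess.run(["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", wav], check=True)
    return wav

def transcribe(wav: str) -> None:
    # Stage 3: whisper.cpp via whisper-cli (-l sv for Swedish, -oj for JSON output).
    subprocess.run(["../whisper.cpp/build/bin/whisper-cli",
                    "-m", "./models/kb_whisper_ggml_medium.bin",
                    "-l", "sv", "-oj", "-f", wav], check=True)

def diarize(wav: str, hf_token: str):
    # Stage 4: pyannote.audio 3.1 assigns speaker labels to time ranges.
    pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1",
                                        use_auth_token=hf_token)
    return pipeline(wav)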

Architecture

Browser ─── React/Vite ──┐
                         ├── FastAPI ─── Celery Worker ─── whisper.cpp (Metal GPU)
                         │      │              │
                         │   WebSocket     pyannote.audio
                         │   (progress)    SpeechBrain
                         │      │           LLM (Ollama/OpenRouter)
                         │      │
                     PostgreSQL  Redis
                      (data)   (queue + pubsub)

Hybrid setup: PostgreSQL and Redis run in Docker. Python backend and Celery worker run natively on macOS for Metal GPU access.
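
The SpeechBrain box in the diagram computes ECAPA-TDNN voice embeddings for speaker identification. A minimal sketch of that comparison (illustrative, not the repository's embedding_service.py; the import path moved to speechbrain.inference in SpeechBrain 1.0):

import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Load the pretrained ECAPA-TDNN speaker-embedding model from Hugging Face.
classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")

def embed(path: str) -> torch.Tensor:
    signal, sample_rate = torchaudio.load(path)  # expects 16 kHz mono audio
    return classifier.encode_batch(signal).squeeze()

# Higher cosine similarity suggests the two clips share a speaker.
score = torch.nn.functional.cosine_similarity(
    embed("speaker_a.wav"), embed("speaker_b.wav"), dim=0)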

Platform guides

The instructions below are for macOS with Apple Silicon. For other platforms, see the dedicated platform guides in the repository.

Prerequisites

  • macOS with Apple Silicon (for Metal GPU acceleration)
  • Docker and Docker Compose
  • Python 3.11+
  • Node.js 18+
  • FFmpeg (brew install ffmpeg)
  • whisper.cpp compiled with Metal support
  • Ollama with a model like qwen3:8b (recommended), or an OpenRouter API key
  • Hugging Face token with access to pyannote/speaker-diarization-3.1

Quick install

git clone https://github.com/fltman/transcriber.git
cd transcriber
bash install.sh   # macOS/Linux automated installer
bash start.sh     # Start all services

On Windows, use install.ps1 and start.ps1 instead (see Windows guide).

The installer checks prerequisites, builds whisper.cpp, downloads models, sets up Python/Node dependencies, starts Docker, and creates the .env file. You only need to add your Hugging Face token afterwards.

Manual installation

1. Clone the repo

git clone https://github.com/fltman/transcriber.git
cd transcriber

2. Build whisper.cpp with Metal support

git clone https://github.com/ggerganov/whisper.cpp.git ../whisper.cpp
cd ../whisper.cpp
cmake -B build -DWHISPER_METAL=ON
cmake --build build --config Release
cd ../transcriber

3. Download Whisper models

Download the KB-LAB Swedish GGML models:

mkdir -p models
# Medium model (main transcription, higher quality)
curl -L -o models/kb_whisper_ggml_medium.bin \
  https://huggingface.co/KBLab/kb-whisper-medium/resolve/main/ggml-model.bin

# Small model (live transcription, faster)
curl -L -o models/kb_whisper_ggml_small.bin \
  https://huggingface.co/KBLab/kb-whisper-small/resolve/main/ggml-model.bin
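
Alternatively, the same files can be fetched with the huggingface_hub package (an optional convenience, assuming it is installed; the curl commands above are equivalent):

from huggingface_hub import hf_hub_download
import shutil

# Download both KBLab GGML models into the local cache, then copy to ./models/.
for repo, dest in [("KBLab/kb-whisper-medium", "models/kb_whisper_ggml_medium.bin"),
                   ("KBLab/kb-whisper-small", "models/kb_whisper_ggml_small.bin")]:
    shutil.copy(hf_hub_download(repo_id=repo, filename="ggml-model.bin"), dest)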

4. Start PostgreSQL and Redis

docker-compose up -d

This starts:

  • PostgreSQL on port 5433
  • Redis on port 6380
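
To verify both containers are reachable before continuing, a quick sanity check (assumes the redis and psycopg2 Python packages; credentials match the .env example in the next step):

import psycopg2
import redis

# Both services run on non-default ports to avoid clashing with local installs.
assert redis.Redis(host="localhost", port=6380).ping()
psycopg2.connect("postgresql://transcriber:transcriber@localhost:5433/transcriber").close()
print("PostgreSQL and Redis are up")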

5. Create the .env file

cat > .env << 'EOF'
DATABASE_URL=postgresql://transcriber:transcriber@localhost:5433/transcriber
REDIS_URL=redis://localhost:6380/0

# LLM provider: "ollama" or "openrouter"
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen3:8b

# Alternative: OpenRouter (uncomment and fill in)
# LLM_PROVIDER=openrouter
# OPENROUTER_API_KEY=your_key_here
# OPENROUTER_MODEL=anthropic/claude-sonnet-4

# Paths to whisper.cpp (adjust to your setup)
WHISPER_CLI_PATH=../whisper.cpp/build/bin/whisper-cli
WHISPER_MODEL_PATH=./models/kb_whisper_ggml_medium.bin
WHISPER_SMALL_MODEL_PATH=./models/kb_whisper_ggml_small.bin

STORAGE_PATH=./storage

# Hugging Face token (needed for pyannote.audio speaker diarization)
# Get yours at https://huggingface.co/settings/tokens
# You must accept the model terms at https://huggingface.co/pyannote/speaker-diarization-3.1
HF_AUTH_TOKEN=hf_your_token_here
EOF

Edit the file and fill in your actual paths and tokens.
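
config.py reads these variables through Pydantic settings (see the project structure below). A minimal sketch of that pattern, assuming pydantic-settings v2; the repository's actual field names may differ:

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Field names mirror the .env keys above (matched case-insensitively).
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str
    redis_url: str
    llm_provider: str = "ollama"
    ollama_base_url: str = "http://localhost:11434"
    ollama_model: str = "qwen3:8b"
    whisper_cli_path: str
    whisper_model_path: str
    storage_path: str = "./storage"
    hf_auth_token: str

settings = Settings()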

6. Set up the Python backend

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

7. Set up the frontend

cd frontend
npm install
cd ..

8. Set up Ollama (if using local LLM)

# Install Ollama from https://ollama.com
ollama pull qwen3:8b
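
The backend reaches Ollama over its local HTTP API. To confirm the model responds, a quick check against the standard /api/generate endpoint (assumes the requests package):

import requests

# Ask the local Ollama server for a short, non-streamed completion.
resp = requests.post("http://localhost:11434/api/generate",
                     json={"model": "qwen3:8b", "prompt": "Say hello.", "stream": False},
                     timeout=120)
print(resp.json()["response"])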

Running

Start all services. You need four terminals (or use & to background them):

# Terminal 1 - Backend API
source venv/bin/activate
uvicorn main:app --port 8000 --reload

# Terminal 2 - Celery worker (background processing)
source venv/bin/activate
celery -A tasks.celery_app worker --loglevel=info --pool=solo

# Terminal 3 - Frontend
cd frontend
npm run dev

# Terminal 4 - Ollama (if using local LLM)
ollama serve

Open http://localhost:5174 in your browser.

Usage

  1. Click New transcription on the home page
  2. Choose Upload, Record, or Live
    • Upload: drag-and-drop or browse for an audio/video file
    • Record: select your microphone (or system audio) and record
    • Live: start a real-time transcription session
  3. Enter a title and click Start
  4. For uploaded files, click Start transcription on the meeting page
  5. Watch real-time progress as the pipeline runs
  6. Browse the transcript with synced audio playback
  7. Click speaker names to rename, click segments to edit text
  8. Run Actions (summarize, action items, etc.) from the sidebar
  9. Export to SRT, WebVTT, TXT, Markdown, JSON, DOCX, or PDF

Project structure

transcriber/
├── main.py                    # FastAPI app entry point
├── config.py                  # Pydantic settings
├── database.py                # SQLAlchemy + migrations
├── model_config.py            # Model preset manager
├── docker-compose.yml         # PostgreSQL + Redis
├── model_presets/             # AI model configurations
├── api/
│   ├── meetings.py            # Upload, CRUD, process
│   ├── live_websocket.py      # Live transcription WebSocket
│   ├── speakers.py            # Rename, merge speakers
│   ├── segments.py            # Edit transcript text
│   ├── export.py              # 7-format export
│   ├── actions.py             # Custom LLM actions
│   ├── encryption.py          # Encrypt/decrypt meetings
│   └── model_settings.py      # Model preset API
├── services/
│   ├── audio_service.py       # FFmpeg extraction
│   ├── whisper_service.py     # whisper-cli wrapper
│   ├── diarization_service.py # pyannote pipeline
│   ├── embedding_service.py   # SpeechBrain ECAPA-TDNN
│   ├── speaker_id_service.py  # Name matching logic
│   ├── llm_service.py         # Ollama / OpenRouter
│   └── encryption_service.py  # Fernet encryption
├── tasks/
│   ├── celery_app.py          # Celery config
│   ├── process_meeting.py     # Main transcription pipeline
│   ├── action_task.py         # LLM action execution
│   ├── polish_task.py         # Live speaker refinement
│   ├── finalize_task.py       # Live post-processing
│   └── shared.py              # Shared task utilities
├── models/                    # SQLAlchemy models
│   ├── meeting.py
│   ├── speaker.py
│   ├── segment.py
│   ├── job.py
│   └── action.py
└── frontend/                  # React + TypeScript
    └── src/
        ├── App.tsx
        ├── store.ts           # Zustand state
        ├── pages/
        │   ├── HomePage.tsx
        │   └── MeetingPage.tsx
        ├── components/
        │   ├── TranscriptView.tsx
        │   ├── SpeakerPanel.tsx
        │   ├── AudioPlayer.tsx
        │   ├── AudioSourceSelect.tsx
        │   ├── ActionsPanel.tsx
        │   ├── ProgressTracker.tsx
        │   ├── ExportDialog.tsx
        │   ├── EncryptDialog.tsx
        │   ├── DecryptDialog.tsx
        │   ├── LiveRecordingBar.tsx
        │   └── SettingsDialog.tsx
        └── hooks/
            └── useLiveRecording.ts

Tech stack

Layer              Technology
Frontend           React 18, TypeScript, Vite, Tailwind CSS, Zustand
Backend            FastAPI, SQLAlchemy, Celery
Transcription      whisper.cpp with KB-LAB Swedish models
Diarization        pyannote.audio 3.1
Voice embeddings   SpeechBrain ECAPA-TDNN
LLM                Ollama (qwen3:8b) or OpenRouter (Claude Sonnet 4)
Infrastructure     PostgreSQL, Redis, Docker Compose
Media              FFmpeg

API endpoints

Method   Endpoint                                   Description
POST     /api/meetings                              Upload audio file
POST     /api/meetings/live                         Create live session
GET      /api/meetings                              List meetings
GET      /api/meetings/{id}                         Get meeting with transcript
DELETE   /api/meetings/{id}                         Delete meeting
POST     /api/meetings/{id}/process                 Start transcription pipeline
GET      /api/meetings/{id}/audio                   Stream audio
GET      /api/meetings/{id}/export?format=srt       Export transcript
PUT      /api/segments/{id}                         Edit segment text
PUT      /api/speakers/{id}                         Rename/recolor speaker
POST     /api/speakers/merge                        Merge two speakers
GET      /api/actions                               List actions
POST     /api/actions                               Create custom action
POST     /api/actions/{id}/run                      Run action on meeting
GET      /api/actions/results/{id}/export           Export action result
POST     /api/meetings/{id}/encrypt                 Encrypt meeting
POST     /api/meetings/{id}/decrypt                 Decrypt meeting
GET      /api/model-settings/presets                List model presets
GET      /api/model-settings/assignments            Get model assignments
PUT      /api/model-settings/assignments            Update model assignments
WS       /ws/meetings/{id}                          Progress updates
WS       /ws/live/{id}                              Live transcription stream
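
An end-to-end sketch of driving the API from Python. The multipart field name and response shape below are assumptions, not documented behavior; adjust them to the actual request schemas:

import requests

BASE = "http://localhost:8000"

# Upload an audio file (the form field name "file" is an assumption).
with open("meeting.mp3", "rb") as f:
    meeting = requests.post(f"{BASE}/api/meetings", files={"file": f}).json()
meeting_id = meeting["id"]  # assumed response shape

# Start the transcription pipeline; progress arrives on WS /ws/meetings/{id}.
requests.post(f"{BASE}/api/meetings/{meeting_id}/process")

# Once processing finishes, export the transcript as SRT.
srt = requests.get(f"{BASE}/api/meetings/{meeting_id}/export", params={"format": "srt"})
open("meeting.srt", "wb").write(srt.content)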

Author

Anders Bjarby

License

MIT
