State-of-the-art local audio transcription with speaker diarization for macOS.
100% local. No cloud. No API keys. No data leaves your machine.
- Transcription — Accurate speech-to-text powered by NVIDIA Parakeet TDT v3 (via FluidAudio CoreML)
- Speaker diarization — Identify who said what, powered by pyannote (via FluidAudio CoreML)
- Apple Silicon optimized — Runs on CoreML and the Apple Neural Engine at 130x real-time
- Multiple output formats — Plain text, JSON (with word timestamps), SRT, VTT
- 25 European languages — English, Spanish, French, German, Italian, Portuguese, Russian, and more
- Fast — Transcribes a 4-minute recording in under 2 seconds
brew install theam/tap/scribe
Or build from source:
git clone https://github.com/theam/scribe.git
cd scribe
swift build -c release
cp .build/release/scribe /usr/local/bin/
scribe transcribe meeting.wav
scribe transcribe meeting.wav --diarize
scribe transcribe meeting.wav --diarize --speakers 4
Tip: Providing the expected number of speakers with --speakers significantly improves diarization accuracy. Without it, the automatic speaker count detection works well for most recordings but may slightly over- or under-segment when voices are similar. If you know how many people were in the meeting, always pass --speakers.
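When scribe is driven from a script, the tip above can be applied by always passing the speaker count when it is known. The following is a minimal sketch, not part of scribe itself: the helper name `build_scribe_command` is hypothetical, it only uses flags that appear in this README (`--diarize`, `--speakers`, `--format`, `--output`), and it assumes `scribe` is on your `PATH`.

```python
import shlex


def build_scribe_command(audio_path, speakers=None, fmt="txt", output=None):
    """Build an argument list for `scribe transcribe` (hypothetical helper).

    Only flags shown in this README are used.
    """
    cmd = ["scribe", "transcribe", audio_path]
    if speakers is not None:
        if speakers < 1:
            raise ValueError("speakers must be a positive integer")
        # A known speaker count is passed together with --diarize,
        # mirroring the README example.
        cmd += ["--diarize", "--speakers", str(speakers)]
    cmd += ["--format", fmt]
    if output is not None:
        cmd += ["--output", output]
    return cmd


command = build_scribe_command(
    "meeting.wav", speakers=4, fmt="json", output="transcript.json"
)
# Print the command as it would be typed in a shell.
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` to run the transcription.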
scribe transcribe meeting.wav --format txt # plain text (default)
scribe transcribe meeting.wav --format json # structured JSON with word timestamps
scribe transcribe meeting.wav --format srt # SRT subtitles
scribe transcribe meeting.wav --format vtt # WebVTT subtitles
scribe transcribe meeting.wav --format json --output transcript.json
scribe transcribe meeting.wav --language es # Spanish
scribe transcribe meeting.wav --language fr # French
scribe models download all # download ASR + diarization models for offline use
[00:03] Speaker 1: Hello, how are you?
[00:06] Speaker 1: I forgot a few points.
[00:27] Speaker 2: Let's see if Claude is right about you.
[00:32] Speaker 3: Oh my gosh, here comes the song. My favorite.
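The speaker-labelled lines above can also be rebuilt from the JSON format that follows, which is handy for post-processing. This is a minimal sketch under stated assumptions: the function names are hypothetical, timestamps are truncated to whole seconds (whether scribe truncates or rounds is not documented here), and the `speaker` field is assumed to be absent when diarization is off.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, truncating the fractional part (assumption)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def render_transcript(doc: dict) -> list[str]:
    """Turn scribe-style JSON segments into '[MM:SS] Speaker: text' lines."""
    lines = []
    for segment in doc["segments"]:
        # Assumption: "speaker" is only present when diarization was enabled.
        speaker = segment.get("speaker")
        prefix = f"{speaker}: " if speaker else ""
        lines.append(f"[{format_timestamp(segment['start'])}] {prefix}{segment['text']}")
    return lines


# A trimmed copy of the JSON example from this README.
sample = json.loads("""
{
  "metadata": {"duration": 226.1, "diarization": true},
  "segments": [
    {"start": 3.2, "end": 4.7, "text": "Hello, how are you?", "speaker": "Speaker 1"}
  ]
}
""")

print("\n".join(render_transcript(sample)))
```

For recordings longer than an hour the minutes field simply keeps counting past 59; add an hours component if that matters for your use.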
{
"metadata": {
"duration": 226.1,
"diarization": true
},
"segments": [
{
"start": 3.2,
"end": 4.7,
"text": "Hello, how are you?",
"speaker": "Speaker 1",
"words": [
{ "start": 3.2, "end": 3.6, "text": "Hello," },
{ "start": 3.6, "end": 3.9, "text": "how" },
{ "start": 3.9, "end": 4.2, "text": "are" },
{ "start": 4.2, "end": 4.7, "text": "you?" }
]
}
]
}
Tested on Apple Silicon (M-series):
| Task | Speed | Example |
|---|---|---|
| Transcription only | ~130x real-time | 4-min file in 1.7s |
| Transcription + diarization | ~30x real-time | 4-min file in 7.5s |
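The "x real-time" figures are the audio duration divided by the processing time. The sketch below reproduces that arithmetic; it assumes the benchmark file is the 226.1-second recording from the JSON example (the table only says "4-min file"), and the function name is illustrative.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Assumption: duration taken from the JSON example's metadata.
AUDIO_SECONDS = 226.1

for label, elapsed in [
    ("Transcription only", 1.7),
    ("Transcription + diarization", 7.5),
]:
    factor = real_time_factor(AUDIO_SECONDS, elapsed)
    print(f"{label}: ~{factor:.0f}x real-time")
```

Under that assumption the factors come out at roughly 133x and 30x, in line with the rounded values in the table.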
Models are downloaded automatically on first use (~600MB for ASR, ~50MB for diarization).
- macOS 14 (Sonoma) or later
- Apple Silicon (M1 or later)
Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Ukrainian.
scribe is built on the shoulders of excellent open-source projects:
- NVIDIA Parakeet (CC-BY-4.0) — The speech recognition model that powers transcription
- FluidAudio (Apache 2.0) by FluidInference — CoreML speech processing SDK for Apple Silicon
- pyannote.audio (MIT) by Hervé Bredin — The diarization model architecture
- swift-argument-parser (Apache 2.0) by Apple — CLI argument parsing
Apache 2.0 — Copyright 2026 The Agile Monkeys Inc. See LICENSE.