scribe

State-of-the-art local audio transcription with speaker diarization for macOS.

100% local. No cloud. No API keys. No data leaves your machine.

Features

  • Transcription — Accurate speech-to-text powered by NVIDIA Parakeet TDT v3 (via FluidAudio CoreML)
  • Speaker diarization — Identify who said what, powered by pyannote (via FluidAudio CoreML)
  • Apple Silicon optimized — Runs on CoreML and the Apple Neural Engine at roughly 130x real-time for transcription alone (about 30x with diarization)
  • Multiple output formats — Plain text, JSON (with word timestamps), SRT, VTT
  • 25 European languages — English, Spanish, French, German, Italian, Portuguese, Russian, and more
  • Fast — Transcribes a 4-minute recording in under 2 seconds

Install

brew install theam/tap/scribe

Or build from source:

git clone https://github.com/theam/scribe.git
cd scribe
swift build -c release
cp .build/release/scribe /usr/local/bin/

Usage

Basic transcription

scribe transcribe meeting.wav

With speaker diarization

scribe transcribe meeting.wav --diarize

Specify number of speakers (recommended for best diarization)

scribe transcribe meeting.wav --diarize --speakers 4

Tip: Passing the expected number of speakers with --speakers improves diarization accuracy. Without it, automatic speaker-count detection works well for most recordings but may over- or under-segment when voices are similar. If you know how many people are in the recording, pass --speakers.

Output formats

scribe transcribe meeting.wav --format txt    # plain text (default)
scribe transcribe meeting.wav --format json   # structured JSON with word timestamps
scribe transcribe meeting.wav --format srt    # SRT subtitles
scribe transcribe meeting.wav --format vtt    # WebVTT subtitles

Save to file

scribe transcribe meeting.wav --format json --output transcript.json
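When scribe is driven from a script, the flags shown above can be assembled programmatically. The following is a minimal sketch, not part of scribe itself: build_scribe_command and run_scribe are hypothetical helper names, only flags documented in this README are used, and it assumes that scribe is on your PATH and that --speakers is always paired with --diarize, as in the examples.

```python
import subprocess
from typing import List, Optional

# Output formats listed in this README.
FORMATS = {"txt", "json", "srt", "vtt"}


def build_scribe_command(
    audio_path: str,
    fmt: str = "txt",
    output: Optional[str] = None,
    diarize: bool = False,
    speakers: Optional[int] = None,
    language: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `scribe transcribe` (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    cmd = ["scribe", "transcribe", audio_path, "--format", fmt]
    # Assumption: --speakers is only meaningful together with --diarize.
    if diarize or speakers is not None:
        cmd.append("--diarize")
    if speakers is not None:
        cmd += ["--speakers", str(speakers)]
    if language is not None:
        cmd += ["--language", language]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def run_scribe(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    command = build_scribe_command(
        "meeting.wav", fmt="json", output="transcript.json", speakers=4
    )
    print(" ".join(command))
    # run_scribe(command)  # uncomment on a machine with scribe installed
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.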

Force a language

scribe transcribe meeting.wav --language es    # Spanish
scribe transcribe meeting.wav --language fr    # French

Pre-download models

scribe models download all    # download ASR + diarization models for offline use

Output Examples

Plain text with diarization

[00:03] Speaker 1: Hello, how are you?
[00:06] Speaker 1: I forgot a few points.
[00:27] Speaker 2: Let's see if Claude is right about you.
[00:32] Speaker 3: Oh my gosh, here comes the song. My favorite.
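
If you need to post-process the plain-text output, each line can be split into a timestamp, a speaker label, and the spoken text. The sketch below assumes every line follows the [MM:SS] Speaker N: text layout shown above; how scribe prints timestamps past one hour is not documented here, so treat the pattern as an assumption. parse_line and Utterance are illustrative names, not part of scribe.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, taken from the sample above: "[MM:SS] Speaker N: text"
LINE_RE = re.compile(r"^\[(\d+):(\d{2})\] (.+?): (.*)$")


class Utterance(NamedTuple):
    start_seconds: int
    speaker: str
    text: str


def parse_line(line: str) -> Optional[Utterance]:
    """Parse one diarized line; return None if it does not match the layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    minutes, seconds, speaker, text = match.groups()
    return Utterance(int(minutes) * 60 + int(seconds), speaker, text)


if __name__ == "__main__":
    sample = [
        "[00:03] Speaker 1: Hello, how are you?",
        "[00:27] Speaker 2: Let's see if Claude is right about you.",
    ]
    for raw in sample:
        print(parse_line(raw))
```

For anything beyond quick scripts, the JSON format is the safer input because it carries exact start and end times.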

JSON

{
  "metadata": {
    "duration": 226.1,
    "diarization": true
  },
  "segments": [
    {
      "start": 3.2,
      "end": 4.7,
      "text": "Hello, how are you?",
      "speaker": "Speaker 1",
      "words": [
        { "start": 3.2, "end": 3.6, "text": "Hello," },
        { "start": 3.6, "end": 3.9, "text": "how" },
        { "start": 3.9, "end": 4.2, "text": "are" },
        { "start": 4.2, "end": 4.7, "text": "you?" }
      ]
    }
  ]
}
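
The JSON structure above is easy to consume from a script. As one illustration, this sketch totals the speaking time per speaker by summing end minus start over the segments. It relies only on the fields shown in the sample; the "Unknown" fallback for segments without a speaker field is an assumption, since the README does not show JSON output without diarization, and talk_time_by_speaker is an illustrative name.

```python
import json
from collections import defaultdict
from typing import Dict

# The sample from this README, trimmed to the fields used below.
SAMPLE = """
{
  "metadata": { "duration": 226.1, "diarization": true },
  "segments": [
    {
      "start": 3.2,
      "end": 4.7,
      "text": "Hello, how are you?",
      "speaker": "Speaker 1"
    }
  ]
}
"""


def talk_time_by_speaker(transcript: dict) -> Dict[str, float]:
    """Sum segment durations (in seconds) for each speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for segment in transcript.get("segments", []):
        # Assumption: the speaker field may be absent without --diarize.
        speaker = segment.get("speaker", "Unknown")
        totals[speaker] += segment["end"] - segment["start"]
    return dict(totals)


if __name__ == "__main__":
    transcript = json.loads(SAMPLE)
    for speaker, seconds in talk_time_by_speaker(transcript).items():
        print(f"{speaker}: {seconds:.1f}s")
```

To run it on real output, replace json.loads(SAMPLE) with json.load on a file written via --format json --output transcript.json.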

Performance

Tested on Apple Silicon (M-series):

| Task | Speed | Example |
| --- | --- | --- |
| Transcription only | ~130x real-time | 4-min file in 1.7s |
| Transcription + diarization | ~30x real-time | 4-min file in 7.5s |

Models are downloaded automatically on first use (~600MB for ASR, ~50MB for diarization).
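
The "x real-time" figures are the audio duration divided by the processing time. The worked example below assumes the 4-minute test file is the 226.1-second recording from the JSON sample's metadata; the README does not state this explicitly, so treat the input duration as an assumption. speed_factor is an illustrative helper.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many times faster than real time the audio was processed."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    duration = 226.1  # assumed length of the test file, in seconds
    print(f"transcription only: {speed_factor(duration, 1.7):.0f}x")
    print(f"with diarization:   {speed_factor(duration, 7.5):.0f}x")
```

Under that assumption the results come out close to the ~130x and ~30x figures quoted for the two modes.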

Requirements

  • macOS 14 (Sonoma) or later
  • Apple Silicon (M1 or later)

Supported Languages

Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Ukrainian.

Acknowledgments

scribe is built on the shoulders of excellent open-source projects:

  • NVIDIA Parakeet (CC-BY-4.0) — The speech recognition model that powers transcription
  • FluidAudio (Apache 2.0) by FluidInference — CoreML speech processing SDK for Apple Silicon
  • pyannote.audio (MIT) by Hervé Bredin — The diarization model architecture
  • swift-argument-parser (Apache 2.0) by Apple — CLI argument parsing

License

Apache 2.0 — Copyright 2026 The Agile Monkeys Inc. See LICENSE.
