A free, local macOS voice dictation app inspired by Wispr Flow. Hold Option+Space (⌥ Space) anywhere to record, release to transcribe. The text is pasted into whichever app is focused.
Powered by whisper.cpp — fully offline, no API keys, no subscriptions.
- Global push-to-talk hotkey (⌥ Space)
- Runs entirely on-device (whisper.cpp with Metal GPU acceleration)
- Menu bar app — no Dock icon, stays out of your way
- Clipboard-based text injection — works in any app including browsers and Electron apps
- Swap between Whisper models (tiny → medium) from Settings
- macOS 13 Ventura or later
- Xcode 15 or later
- Git with submodule support
- Homebrew (optional, for downloading models)
```sh
cd "path/to/Voice Dictation"
git init
git submodule add https://github.com/ggerganov/whisper.cpp whisper.cpp
git submodule update --init --recursive
```

```sh
# From the project root
bash whisper.cpp/models/download-ggml-model.sh tiny.en
# → whisper.cpp/models/ggml-tiny.en.bin (~75 MB)
```

You can also download larger models later from within the app's Settings.
Open Xcode and create a new project:
- Template: macOS → App
- Product Name: VoiceDictation
- Bundle Identifier: com.yourname.VoiceDictation
- Interface: SwiftUI
- Language: Swift
- Deployment Target: macOS 13.0
- Uncheck "Include Tests" for now
Drag all the files from the VoiceDictation/ folder into the Xcode project navigator.
Make sure "Copy items if needed" is unchecked (the files are already in place).
In Xcode:

1. File → New → Target → macOS → Library (Static)
2. Name it `whisper`
3. Add these source files to the `whisper` target:
   - `whisper.cpp/src/whisper.cpp`
   - `whisper.cpp/ggml/src/ggml.c`
   - `whisper.cpp/ggml/src/ggml-alloc.c`
   - `whisper.cpp/ggml/src/ggml-backend.c`
   - `whisper.cpp/ggml/src/ggml-backend-reg.c`
   - `whisper.cpp/ggml/src/ggml-cpu/ggml-cpu.c`
   - `whisper.cpp/ggml/src/ggml-cpu/ggml-cpu.cpp`
   - `whisper.cpp/ggml/src/ggml-metal.m` (for GPU acceleration on Apple Silicon)
   - `whisper.cpp/ggml/src/ggml-metal.metal` (add to Metal shader sources)
4. In the `whisper` target's Build Settings:
   - Header Search Paths: add `$(SRCROOT)/whisper.cpp/include` and `$(SRCROOT)/whisper.cpp/ggml/include`
   - C++ Language Dialect: C++17
   - Preprocessor Macros: add `GGML_USE_METAL=1` (enables Metal GPU acceleration)
5. In the VoiceDictation app target:
   - Target Dependencies: add `whisper`
   - Link Binary With Libraries: add the `whisper` static library
   - Header Search Paths: add `$(SRCROOT)/whisper.cpp/include`
   - Objective-C Bridging Header: `VoiceDictation/Resources/VoiceDictation-Bridging-Header.h`
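The bridging header itself is typically just a pass-through that makes the Obj-C++ wrapper visible to Swift. A minimal sketch of what `VoiceDictation-Bridging-Header.h` presumably contains (the exact contents depend on what `WhisperBridge.h` declares):

```objc
// VoiceDictation-Bridging-Header.h
// Exposes the Obj-C++ wrapper (WhisperBridge.h/.mm) to Swift code.
#import "WhisperBridge.h"
```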
In the VoiceDictation target's Build Settings:

- Set Info.plist File to `VoiceDictation/Resources/Info.plist`, or merge the keys from that file into the Xcode-generated one.
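If you merge rather than replace, the keys involved presumably include at least the microphone usage string (required before macOS will show the mic prompt) and the menu-bar-only flag that hides the Dock icon. These are the standard macOS key names; the exact description string in the project's plist may differ:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>VoiceDictation records audio while you hold the hotkey so it can transcribe your speech on-device.</string>
<key>LSUIElement</key>
<true/>
```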
In the VoiceDictation target's Signing & Capabilities:

- Remove the default App Sandbox capability (or uncheck it)
- Set the entitlements file to `VoiceDictation/Resources/VoiceDictation.entitlements`
Drag whisper.cpp/models/ggml-tiny.en.bin into Xcode under the Resources group.
Ensure it is added to the Copy Bundle Resources build phase.
Press ⌘R to build and run.
The app will appear in the menu bar as a microphone icon. Click it to open the status popover.
The app requires three permissions. macOS will prompt for Microphone automatically. For the others, click "Grant" in Settings or go to:
| Permission | System Settings path |
|---|---|
| Microphone | Privacy & Security → Microphone |
| Input Monitoring | Privacy & Security → Input Monitoring |
| Accessibility | Privacy & Security → Accessibility |
After granting all three, the ⌥Space hotkey will work in any app.
VoiceDictation/
├── App/
│ ├── VoiceDictationApp.swift — @main entry point
│ └── AppDelegate.swift — lifecycle, wires components together
├── State/
│ └── AppState.swift — central @Observable state machine
├── HotKey/
│ └── HotKeyMonitor.swift — CGEventTap for ⌥Space
├── Audio/
│ ├── AudioRecorder.swift — AVAudioEngine capture + resampling
│ └── AudioBuffer.swift — thread-safe PCM sample accumulator
├── Transcription/
│ ├── WhisperBridge.h/.mm — Obj-C++ wrapper around whisper.cpp C API
│ └── WhisperTranscriber.swift — Swift actor, async transcribe()
├── TextInjection/
│ └── TextInjector.swift — clipboard + Cmd+V injection
├── UI/
│ ├── MenuBarController.swift — NSStatusItem + NSPopover
│ ├── StatusIndicatorView.swift — popover SwiftUI content
│ └── SettingsView.swift — Settings window
├── Models/
│ └── WhisperModelManager.swift — model file management + download
├── Permissions/
│ └── PermissionChecker.swift — checks Microphone/InputMonitoring/Accessibility
└── Resources/
├── Info.plist
├── VoiceDictation.entitlements
└── VoiceDictation-Bridging-Header.h
whisper.cpp/ — git submodule (ggerganov/whisper.cpp)
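The `AudioBuffer` role above — letting the audio-capture thread append samples while the transcriber drains them from another thread — can be sketched as follows. This is a hypothetical C++ analogue for illustration (the real implementation is the Swift file listed above):

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Thread-safe accumulator for 16 kHz mono Float32 PCM samples.
// The capture callback appends; the transcriber drains everything at once.
class PCMBuffer {
public:
    void append(const float* samples, std::size_t count) {
        std::lock_guard<std::mutex> lock(mutex_);
        samples_.insert(samples_.end(), samples, samples + count);
    }

    // Returns all accumulated samples and clears the buffer.
    std::vector<float> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<float> out;
        out.swap(samples_);
        return out;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return samples_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<float> samples_;
};
```

Swapping the vector out under the lock (rather than copying it) keeps `drain()` cheap even after a long recording.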
| Model | Size | Speed (Apple M-series) | Accuracy |
|---|---|---|---|
| tiny.en | 75 MB | ~0.5s | Good |
| base.en | 142 MB | ~1s | Better |
| small.en | 466 MB | ~2-3s | Great |
| medium.en | 1.5 GB | ~5-8s | Best |
The .en variants are English-only but faster than multilingual models.
Download additional models from the app's Settings window.
1. You hold ⌥ Space — a `CGEventTap` detects this and suppresses the native key event (preventing "˙" from being typed)
2. `AVAudioEngine` starts capturing microphone input, resampled to 16 kHz mono Float32
3. You release ⌥ Space — recording stops
4. The PCM samples are passed to `whisper_full()` via the Obj-C++ bridge
5. The transcribed text is written to the clipboard and ⌘V is simulated via CGEvent
6. Text appears in whatever app was focused
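The rate conversion in the capture step can be sketched with simple linear interpolation. This is an illustrative C++ sketch only — on macOS the app would more likely use `AVAudioConverter`, which also applies proper low-pass filtering before decimating:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Downsample mono Float32 PCM from srcRate to the 16 kHz whisper expects,
// using linear interpolation between neighbouring source samples.
std::vector<float> resampleTo16k(const std::vector<float>& src, double srcRate) {
    const double dstRate = 16000.0;
    if (src.empty() || srcRate <= 0.0) return {};
    const double ratio = srcRate / dstRate;               // source samples per output sample
    const std::size_t dstLen = static_cast<std::size_t>(src.size() / ratio);
    std::vector<float> dst(dstLen);
    for (std::size_t i = 0; i < dstLen; ++i) {
        const double pos = i * ratio;                     // fractional source position
        const std::size_t i0 = static_cast<std::size_t>(pos);
        const std::size_t i1 = std::min(i0 + 1, src.size() - 1);
        const double frac = pos - static_cast<double>(i0);
        dst[i] = static_cast<float>(src[i0] * (1.0 - frac) + src[i1] * frac);
    }
    return dst;
}
```

For a 48 kHz input, every third source sample lands on an output sample exactly, so one second of audio yields 16,000 output samples.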