A fast, zero-rewrite dictation app for macOS — your words, exactly as spoken.
Next Step: Want to build your own AI-powered tools? Check out the Agent Skills Resource Library (includes slides, PDF, diagnostics)
I tried several popular dictation apps on macOS. They all share the same problems:
- They rewrite your words. You say one thing, and the tool "helpfully" rephrases it. You lose trust in your own input.
- The hotkey gets stuck. You release the key, but the app keeps recording. You have to press again — or restart.
- They answer instead of typing. AI-powered dictation tools sometimes treat your speech as a question and respond to it, instead of just typing what you said.
- Your audio goes through a black box. Closed-source, no visibility into what's sent where.
VerbatimFlow exists because I wanted a dictation tool I could actually trust: one that types what I say, releases when I release, never "helps" without asking, and runs on code I can read.
VerbatimFlow is a menu bar dictation utility that transcribes speech and injects text directly into any focused app.
Core Principle: Raw transcription first. Cleanup is opt-in and constrained.
- Push-to-talk — hold a hotkey to record, release to transcribe and inject
- Two modes — Standard (verbatim output with rule-based formatting: punctuation, spacing, capitalization) and Clarify (LLM-powered concise rewrite, opt-in)
- Multiple engines — Apple Speech, local Whisper, OpenAI Cloud
- Instant injection — text appears in your active app via Accessibility API
- Undo support — one-click rollback of the last inserted transcript
- Open source — every line of code is readable; your audio, your control
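The push-to-talk behavior above (hold to record, release to transcribe) is essentially a two-state machine in which key-up always stops recording. The sketch below is illustrative only; the type names (`PushToTalk`, `PTTState`) are hypothetical and not VerbatimFlow's actual `HotkeyMonitor` implementation.

```swift
// Minimal push-to-talk state machine. The key invariant: key-up always leaves
// the recording state, so the hotkey can never remain "stuck" recording.
enum PTTState { case idle, recording }

final class PushToTalk {
    private(set) var state: PTTState = .idle
    let onStart: () -> Void
    let onStop: () -> Void

    init(onStart: @escaping () -> Void, onStop: @escaping () -> Void) {
        self.onStart = onStart
        self.onStop = onStop
    }

    // Called on every key-down event, including keyboard auto-repeat.
    func keyDown() {
        guard state == .idle else { return }  // ignore auto-repeat while recording
        state = .recording
        onStart()
    }

    // Called on key-up: unconditionally transitions out of .recording.
    func keyUp() {
        guard state == .recording else { return }
        state = .idle
        onStop()
    }
}
```

Guarding `keyDown()` against repeats and making `keyUp()` the only exit path is what prevents the "stuck hotkey" failure mode described earlier.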
Status: Alpha
- This is a working prototype that I use daily, but it has rough edges.
- My primary focus is demonstrating how voice input can work without over-editing, not maintaining this codebase.
- If you encounter issues, please submit a reproducible case (input + output + steps to reproduce).
- Menu bar app — lives in the macOS menu bar as a V-mark icon with real-time state badges (● recording, ○ processing, — paused)
- Dual hotkey — primary hotkey uses the current mode; secondary hotkey (Cmd+Shift+Space) forces Clarify for one segment
- Engine switching — Apple Speech / Whisper (tiny–large-v3) / OpenAI Cloud (gpt-4o-mini-transcribe, whisper-1)
- Clarify via OpenAI or OpenRouter — configurable provider, model, and API keys
- Terminology dictionary — custom term corrections and source→target substitution rules
- Language selection — System Default / zh-Hans / en-US
- Transcript history — recent transcripts viewable in menu, with Copy + Undo Last Insert
- Permission diagnostics — built-in permission snapshot and one-click system settings access
- Persistent preferences — mode, engine, model, hotkey, and language survive restarts
- Deterministic code signing — stable bundle ID prevents permission invalidation across rebuilds
- macOS 14+ (Sonoma or later recommended)
- Xcode 16+ (for building from source)
- Microphone and Accessibility permissions
```sh
git clone https://github.com/axtonliu/verbatim-flow.git
cd verbatim-flow

# Build .app bundle
./scripts/build-native-app.sh
open "apps/mac-client/dist/VerbatimFlow.app"

# Build installer DMG
./scripts/build-installer-dmg.sh
open "apps/mac-client/dist/VerbatimFlow-installer.dmg"
```

The DMG provides drag-and-drop installation to /Applications.
- Launch — double-click VerbatimFlow.app or run ./scripts/run-native-mac-client.sh
- Grant permissions — Microphone, Accessibility, and Speech Recognition (prompted on first launch, or use menu shortcuts)
- Hold hotkey — default Ctrl+Shift+Space to record; release to transcribe and inject
- Switch modes — use the Settings menu to toggle between Standard and Clarify
- Force Clarify — press Cmd+Shift+Space to use Clarify mode for one segment, regardless of the default
Switch hotkey presets from the Settings menu without restarting:
| Preset | Hotkey |
|---|---|
| Default | Ctrl+Shift+Space |
| Option+Space | Option+Space |
| Fn | Fn |
Cloud transcription and Clarify rewrite are configured via ~/Library/Application Support/VerbatimFlow/openai.env:
```
# OpenAI transcription
OPENAI_API_KEY=sk-...
VERBATIMFLOW_OPENAI_MODEL=gpt-4o-mini-transcribe

# Auto route (fast first pass, then high-accuracy fallback for risky mixed-language terms)
VERBATIMFLOW_OPENAI_AUTO_ROUTE=1
VERBATIMFLOW_OPENAI_AUTO_SECONDARY_MODEL=whisper-1

# Optional tuning:
# VERBATIMFLOW_OPENAI_AUTO_ROUTE_ZH_ONLY=1
# VERBATIMFLOW_OPENAI_LANGUAGE_HINT_MODE=auto
# VERBATIMFLOW_OPENAI_AUTO_ROUTE_MIN_RISK_SCORE=2
# VERBATIMFLOW_OPENAI_AUTO_ROUTE_MIN_PRIMARY_CHARS=8

# Clarify provider: openai or openrouter
VERBATIMFLOW_CLARIFY_PROVIDER=openai
VERBATIMFLOW_OPENAI_CLARIFY_MODEL=gpt-4o-mini

# OpenRouter alternative
# VERBATIMFLOW_CLARIFY_PROVIDER=openrouter
# OPENROUTER_API_KEY=...
# VERBATIMFLOW_OPENAI_CLARIFY_MODEL=openai/gpt-4o-mini
```

Edit this file directly or via the menu bar: Settings → Open Cloud Settings.
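The auto-route decision (fast first pass, high-accuracy second pass only for risky segments) can be sketched as a scoring heuristic. The signals and weights below are assumptions for illustration; only the two thresholds mirror the environment variables above, and the app's actual scoring may differ.

```swift
import Foundation

// Hypothetical sketch of the auto-route decision, not the app's actual logic.
struct AutoRouteConfig {
    var minRiskScore = 2      // VERBATIMFLOW_OPENAI_AUTO_ROUTE_MIN_RISK_SCORE
    var minPrimaryChars = 8   // VERBATIMFLOW_OPENAI_AUTO_ROUTE_MIN_PRIMARY_CHARS
}

// Returns true when the fast first-pass transcript looks risky enough to
// justify a second pass with the high-accuracy fallback model.
func shouldRouteToSecondary(_ transcript: String,
                            config: AutoRouteConfig = AutoRouteConfig()) -> Bool {
    // Very short transcripts are skipped: cheap to get wrong, cheap to redo.
    guard transcript.count >= config.minPrimaryChars else { return false }

    let scalars = transcript.unicodeScalars
    let hasCJK = scalars.contains { (0x4E00...0x9FFF).contains($0.value) }
    let hasLatin = scalars.contains {
        (0x41...0x5A).contains($0.value) || (0x61...0x7A).contains($0.value)
    }

    var risk = 0
    if hasCJK && hasLatin { risk += 2 }          // mixed-language segment: highest risk
    if transcript.contains("...") { risk += 1 }  // hesitation artifacts from the fast model
    return risk >= config.minRiskScore
}
```

With the default thresholds, a mixed Chinese/English segment routes to the secondary model while plain single-language text does not.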
Custom term corrections at ~/Library/Application Support/VerbatimFlow/terminology.txt:
```
# Simple term corrections
VerbatimFlow
macOS
OpenAI

# Substitution rules (source => target)
verbal flow => VerbatimFlow
mac OS => macOS
```
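A dictionary in this format can be handled with a small parser plus case-insensitive substitution. The sketch below is a minimal interpretation of the file format, assuming bare terms act as canonical-spelling corrections; the type and function names are hypothetical, not the app's actual `TerminologyDictionary`.

```swift
import Foundation

struct TerminologyRule {
    let source: String
    let target: String
}

// Parse the terminology file: '#' lines are comments, 'a => b' lines are
// substitution rules, and bare terms become canonical-spelling corrections.
func parseTerminology(_ text: String) -> [TerminologyRule] {
    var rules: [TerminologyRule] = []
    for rawLine in text.split(separator: "\n") {
        let line = rawLine.trimmingCharacters(in: .whitespaces)
        if line.isEmpty || line.hasPrefix("#") { continue }
        if let arrow = line.range(of: "=>") {
            let source = String(line[..<arrow.lowerBound]).trimmingCharacters(in: .whitespaces)
            let target = String(line[arrow.upperBound...]).trimmingCharacters(in: .whitespaces)
            rules.append(TerminologyRule(source: source, target: target))
        } else {
            // Bare term: map any casing back to the canonical spelling.
            rules.append(TerminologyRule(source: line.lowercased(), target: line))
        }
    }
    return rules
}

// Apply every rule to a transcript, matching case-insensitively.
func applyRules(_ rules: [TerminologyRule], to transcript: String) -> String {
    var result = transcript
    for rule in rules {
        result = result.replacingOccurrences(
            of: rule.source, with: rule.target, options: [.caseInsensitive])
    }
    return result
}
```

Rules apply in file order, so later rules see the output of earlier ones.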
Logs: ~/Library/Logs/VerbatimFlow/runtime.log

Project structure:

```
verbatim-flow/
├── apps/mac-client/
│   ├── Sources/VerbatimFlow/          # Native Swift app
│   │   ├── main.swift                 # Entry point
│   │   ├── MenuBarApp.swift           # Menu bar UI
│   │   ├── AppController.swift        # Core orchestration
│   │   ├── HotkeyMonitor.swift        # Global hotkey handling
│   │   ├── SpeechTranscriber.swift
│   │   ├── TextInjector.swift         # Accessibility-based injection
│   │   ├── TextGuard.swift            # Format-only diff guard
│   │   ├── ClarifyRewriter.swift
│   │   ├── TerminologyDictionary.swift
│   │   └── ...
│   ├── Tests/VerbatimFlowTests/       # Unit tests
│   ├── Package.swift
│   └── dist/                          # Build output (.app, .dmg)
├── packages/                          # Shared package stubs
│   ├── asr-pipeline/
│   ├── text-guard/
│   ├── text-injector/
│   └── shared/
├── scripts/
│   ├── build-native-app.sh            # Build .app bundle
│   ├── build-installer-dmg.sh         # Build installer DMG
│   ├── restart-native-app.sh          # Kill + relaunch
│   ├── collect-permission-diagnostics.sh
│   ├── run-mac-client.sh              # Run Python MVP
│   └── run-native-mac-client.sh       # Run native Swift
├── docs/
│   └── ARCHITECTURE.md
├── package.json
├── pnpm-workspace.yaml
├── LICENSE
└── README.md
```
- Microphone not working: System Settings → Privacy & Security → Microphone → ensure VerbatimFlow is checked. Use menu: Settings → Request Microphone Permission.
- Text not injecting: System Settings → Privacy & Security → Accessibility → add VerbatimFlow. The app uses a stable bundle ID (com.verbatimflow.app) so permissions persist across rebuilds.
- Permission appears granted but still fails: Try removing and re-adding the app in System Settings. Run ./scripts/collect-permission-diagnostics.sh 30 for detailed diagnostics.
- Hotkey not responding: Check that no other app is capturing the same shortcut. Try switching to a different preset via the Settings menu.
- Menu bar icon shows a pause dash: Hotkey listener is paused. Click Resume Listening in the menu.
- Clarify returns original text: Verify your API key in openai.env. Check ~/Library/Logs/VerbatimFlow/runtime.log for errors.
- Want to use OpenRouter instead: Set VERBATIMFLOW_CLARIFY_PROVIDER=openrouter and provide OPENROUTER_API_KEY in openai.env.
- Streaming transcription (word-by-word injection as you speak)
- Whisper engine integration in native Swift path
- Configurable text guard sensitivity threshold
- Per-app mode profiles
- Improved mixed-language (CJK + English) handling
- Clarify structural formatting (e.g., detect action items and render as bullet lists while preserving meaning)
Contributions welcome (low-maintenance project):
- Reproducible bug reports (input + output + steps + environment)
- Documentation improvements
- Small PRs (fixes/docs)
Note: Feature requests may not be acted on due to limited maintenance capacity.
- Apple Speech Framework — on-device speech recognition
- OpenAI Whisper — open-source ASR model
- faster-whisper — CTranslate2-based Whisper inference (Python MVP)
MIT License — see LICENSE for details.
Axton Liu — AI Educator & Creator
- Website: axtonliu.ai
- YouTube: @AxtonLiu
- Twitter/X: @axtonliu
- Agent Skills Resource Library — slides, PDF guides, diagnostics tools
- AI Elite Weekly Newsletter — Weekly AI insights
- Free AI Course — Get started with AI
© AXTONLIU™ & AI 精英学院™. All rights reserved.
