macOS background app for system-wide voice-to-text via hotkey using local Whisper.
Hold a hotkey, speak, release → text inserted at cursor. Privacy-first (100% local, no cloud).
Key Features:
- 🎯 Multi-profile support - Different models per hotkey (e.g., fast for notes, accurate for emails)
- 🔄 Smart aliases - Fuzzy text replacement (e.g., "my email" → "user@example.com")
- 📊 Menubar tray - Visual feedback (idle/recording/processing) and quick config access
- 🔒 100% local - No cloud, no internet (except initial model download)
- ⚡ Fast - <50ms audio start, ~2s transcription (10s audio)
Easiest installation - no code signing issues:
brew install Automaat/whisper-hotkey/whisper-hotkey
First launch:
- Open WhisperHotkey from Applications
- Grant permissions when prompted:
  - Accessibility
  - Input Monitoring
  - Microphone
- App downloads Whisper model (~466MB)
Usage:
- Default hotkey: Ctrl+Option+Z - Press and hold → speak → release
- Text appears at cursor
Configuration: ~/.whisper-hotkey/config.toml
Update: brew upgrade whisper-hotkey
For users without Homebrew:
- Download: Latest release → WhisperHotkey-*.dmg
- Install: Open DMG, drag WhisperHotkey.app to Applications
- Remove quarantine (if needed):
xattr -d com.apple.quarantine /Applications/WhisperHotkey.app
- Run: Open from Applications
- Permissions: Grant Microphone + Accessibility + Input Monitoring when prompted
- First run: Downloads Whisper model (~466MB)
Prerequisites:
- macOS (M1/M2 or Intel)
- Rust/Cargo (install: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh)
- mise (optional, recommended: curl https://mise.run | sh)
- Permissions: Microphone + Accessibility
# Clone repo
git clone https://github.com/Automaat/whisper-hotkey.git
cd whisper-hotkey
# Run installer (binary mode - installs to /usr/local/bin)
./scripts/install.sh
# OR .app bundle mode (installs to /Applications)
./scripts/install.sh app
The installer will:
- Build release binary
- Install to /usr/local/bin or /Applications
- Create config: ~/.whisper-hotkey/config.toml
- Optionally set up auto-start at login (LaunchAgent)
To uninstall:
./scripts/uninstall.sh
# Clone repo
git clone https://github.com/Automaat/whisper-hotkey.git
cd whisper-hotkey
# Install Rust toolchain (if using mise)
mise install
# Build (the ~466MB Whisper model is downloaded on first run)
mise exec -- cargo build --release
# OR without mise:
cargo build --release
# Run
mise exec -- cargo run --release
# OR:
./target/release/whisper-hotkey
First run:
- Creates config: ~/.whisper-hotkey/config.toml
- Prompts for Microphone permission (System Settings → Privacy & Security)
- Prompts for Accessibility permission (for hotkey + text insertion)
- Downloads Whisper model: ~/.whisper-hotkey/models/ggml-small.bin (~466MB)
- Loads model (takes 2-3s)
Console output: Real-time logs show activity:
🎤 Hotkey pressed - recording started
⏹️ Hotkey released - processing audio
📼 Captured 3.5s audio (56000 samples)
✨ Transcription: "Hello, this is a test"
✅ Inserted 22 chars
✓ Ready for next recording
- Open any text editor (TextEdit, VS Code, Notes, Chrome)
- Click into a text field
- Press and hold Ctrl+Option+Z
- Speak clearly: "Hello, this is a test"
- Release the hotkey
- Text appears at cursor in ~2s
Expected output:
✓ Config loaded from ~/.whisper-hotkey/config.toml
✓ Telemetry initialized
✓ Permissions OK
✓ Model found at /Users/you/.whisper-hotkey/models/ggml-small.bin
Loading Whisper model (this may take a few seconds)...
Optimization: 4 threads, beam_size=5
✓ Whisper model loaded and ready
✓ Audio capture initialized
✓ Hotkey registered: ["Control", "Option"] + Z
Whisper Hotkey is running. Press the hotkey to record and transcribe.
✓ Full pipeline ready: hotkey → audio → transcription → text insertion
Press Ctrl+C to exit.
Edit ~/.whisper-hotkey/config.toml:
# Multi-profile support - define multiple hotkeys with different models
[[profiles]]
name = "Fast" # optional profile name
model_type = "small" # tiny, base, small, medium, large, tiny.en, base.en, small.en, medium.en
[profiles.hotkey]
modifiers = ["Control", "Option"]
key = "Z"
preload = true # load on startup (recommended)
threads = 4 # CPU threads (try 2/4/8)
beam_size = 1 # 1=fast, 5=balanced, 10=accurate
language = "en" # optional language hint
[[profiles]]
name = "Accurate"
model_type = "medium"
[profiles.hotkey]
modifiers = ["Command", "Shift"]
key = "V"
threads = 4
beam_size = 5
language = "en"
[audio]
buffer_size = 1024 # frames (leave default)
sample_rate = 16000 # Hz (leave default)
[telemetry]
enabled = true # local crash logging only
log_path = "~/.whisper-hotkey/crash.log"
[recording]
enabled = true # save debug recordings
retention_days = 7 # auto-delete after N days
max_count = 100 # keep max N recordings
cleanup_interval_hours = 24 # cleanup frequency
[aliases]
enabled = true # fuzzy text replacement
threshold = 0.85 # match threshold (0.0-1.0)
[aliases.entries]
"my email" = "user@example.com"
"my address" = "123 Main St, City, State 12345"
"github" = "https://github.com/username"
After editing: Restart app (Ctrl+C, then cargo run --release)
Note: You can define multiple profiles with different models and hotkeys. The tray icon menu shows all active profiles.
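When several profiles are edited by hand, it is easy to give two of them the same hotkey. The sketch below shows one way such a clash could be detected. The `Profile` struct and both functions are hypothetical illustrations that mirror the fields of the config above; they are not taken from the app's source, and whether the app performs this check itself is not stated here.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory form of one `[[profiles]]` entry.
#[derive(Debug, Clone)]
pub struct Profile {
    pub name: String,
    pub model_type: String,
    pub modifiers: Vec<String>,
    pub key: String,
    pub threads: u32,
    pub beam_size: u32,
}

/// Normalizes a hotkey so that modifier order and letter case do not matter.
fn hotkey_id(p: &Profile) -> String {
    let mut mods: Vec<String> = p.modifiers.iter().map(|m| m.to_lowercase()).collect();
    mods.sort();
    format!("{}+{}", mods.join("+"), p.key.to_lowercase())
}

/// Returns the names of profiles whose hotkey repeats an earlier profile's hotkey.
pub fn duplicate_hotkeys(profiles: &[Profile]) -> Vec<String> {
    let mut seen = HashSet::new();
    profiles
        .iter()
        .filter(|p| !seen.insert(hotkey_id(p)))
        .map(|p| p.name.clone())
        .collect()
}
```

With the two profiles from the config above ("Fast" on Control+Option+Z, "Accurate" on Command+Shift+V) the result is empty; adding a third profile on Option+Control+z would report that third profile by name.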
To run whisper-hotkey automatically when you log in:
# Setup LaunchAgent (starts now and at every login)
./scripts/setup-launchagent.sh
Manage the service:
# Stop service
launchctl unload ~/Library/LaunchAgents/com.whisper-hotkey.plist
# Start service
launchctl load ~/Library/LaunchAgents/com.whisper-hotkey.plist
# Restart service
launchctl kickstart -k gui/$(id -u)/com.whisper-hotkey
# Check status
launchctl list | grep whisper-hotkey
# View logs
tail -f ~/.whisper-hotkey/stdout.log
tail -f ~/.whisper-hotkey/stderr.log
To disable auto-start:
launchctl unload ~/Library/LaunchAgents/com.whisper-hotkey.plist
rm ~/Library/LaunchAgents/com.whisper-hotkey.plist
# Faster transcription
[model]
threads = 8
beam_size = 1
# More accurate transcription
[model]
threads = 4
beam_size = 10
# Smaller or larger model
[model]
model_type = "tiny" # Faster, less accurate (~75MB)
# or
model_type = "base" # Good balance (~142MB)
# or
model_type = "medium" # More accurate, slower (~1.5GB)
App auto-downloads model on next run.
Symptoms: You've granted Microphone and Accessibility permissions, but the app still can't access them.
Cause: macOS quarantine attribute (applied to downloaded apps) prevents the system from recognizing granted permissions.
Solution:
- Open Terminal
- Run this command (replace path if needed):
xattr -d com.apple.quarantine /Applications/WhisperHotkey.app
- Restart the app
Note: The app will detect quarantine on startup and show this command automatically.
- Grant Microphone permission: System Settings → Privacy & Security → Microphone
- Reset: tccutil reset Microphone, then restart app
- Grant Accessibility permission: System Settings → Privacy & Security → Accessibility
- Add Terminal/iTerm to allowed apps
- Check Accessibility permission (same as above)
- Some apps block insertion (Terminal secure input mode)
- Check logs: tail -f ~/.whisper-hotkey/crash.log
- Try faster config (threads=8, beam_size=1)
- Use smaller model (tiny or base)
- Check logs for inference_ms metric
- Manual download from Hugging Face
- Place in: ~/.whisper-hotkey/models/ggml-{name}.bin
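To double-check where a manually downloaded model should go, the path pattern above can be expanded in code. This is a minimal sketch: `model_path` is a hypothetical helper written for illustration, not a function from the app, and it assumes `{name}` is the same string used for `model_type` in the config.

```rust
use std::path::{Path, PathBuf};

/// Builds `<home>/.whisper-hotkey/models/ggml-<model_type>.bin`.
/// `home` is the user's home directory, passed in explicitly to keep the
/// function free of environment lookups.
pub fn model_path(home: &Path, model_type: &str) -> PathBuf {
    home.join(".whisper-hotkey")
        .join("models")
        .join(format!("ggml-{model_type}.bin"))
}
```

For `home = /Users/you` and `model_type = "small"` this yields `/Users/you/.whisper-hotkey/models/ggml-small.bin`, the same path that appears in the startup log.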
For maintainers:
# Auto-increment minor version (0.0.0 → 0.1.0)
./scripts/create-release.sh
# Specific version
./scripts/create-release.sh 0.1.0
# Or manually via GitHub CLI
gh workflow run release.yml # Auto-increment
gh workflow run release.yml -f version=0.1.0 # Specific version
The release workflow:
- Creates and pushes git tag (e.g., v0.1.0)
- Builds release binary with optimizations
- Creates .app bundle and DMG
- Generates SHA256 checksum
- Publishes GitHub release with artifacts
Monitor release:
gh run watch
# OR
gh run list --workflow=release.yml
Releases appear at: GitHub Releases
# Unit tests (no hardware required)
mise exec -- cargo test
# Hardware tests (requires mic + permissions)
mise exec -- cargo test -- --ignored
# Default: info level (shows hotkey events, transcription results)
cargo run --release
# Debug: detailed timing information
RUST_LOG=debug cargo run --release
# Trace: everything including low-level operations
RUST_LOG=whisper_hotkey=trace cargo run --release
# CPU profiling
sudo cargo flamegraph --release
# Trigger hotkey, Ctrl+C, then: open flamegraph.svg
# Memory profiling (macOS)
instruments -t Allocations target/release/whisper-hotkey
See TESTING.md for comprehensive profiling guide.
mise exec -- cargo fmt
mise exec -- cargo clippy
- Hotkey pressed → Clear audio buffer, start recording
- Hotkey held → Accumulate audio samples (16kHz mono)
- Hotkey released → Stop recording, convert audio format
- Transcription → Whisper processes audio (~2s for 10s recording)
- Text insertion → CGEvent inserts text at cursor
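The "convert audio format" step can be illustrated with a small sketch. It assumes the capture side delivers signed 16-bit PCM and that the transcriber wants 32-bit floats in the range [-1.0, 1.0); both are assumptions made for illustration, since the actual capture format is not stated here. The function names are hypothetical.

```rust
/// Sample rate from the `[audio]` section of the config (16 kHz mono).
pub const SAMPLE_RATE_HZ: u32 = 16_000;

/// Converts signed 16-bit PCM samples to f32 samples in [-1.0, 1.0).
pub fn i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| f32::from(s) / 32_768.0).collect()
}

/// Duration in seconds of a mono buffer at the given sample rate.
pub fn duration_secs(sample_count: usize, sample_rate_hz: u32) -> f64 {
    sample_count as f64 / f64::from(sample_rate_hz)
}
```

The duration helper reproduces the arithmetic in the sample log: 56000 samples at 16000 Hz is 3.5 seconds of audio.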
Tech stack:
- Rust 1.84
- Whisper.cpp (via whisper-rs bindings)
- cpal (audio capture)
- global-hotkey (hotkey detection)
- Core Graphics CGEvent (text insertion)
- tray-icon (menubar integration)
Define multiple transcription profiles with different models and hotkeys:
- Fast profile (Ctrl+Option+Z): Use small model for quick notes
- Accurate profile (Cmd+Shift+V): Use medium model for emails
Each profile can have different:
- Model type (tiny → large)
- Hotkey combination
- Inference settings (threads, beam_size)
- Language hints
The tray icon menu displays all active profiles with their hotkeys and models.
Replace transcribed text with predefined values using fuzzy matching:
[aliases]
enabled = true
threshold = 0.85 # 0.0 (loose) to 1.0 (exact)
[aliases.entries]
"my email" = "john.doe@company.com"
"office address" = "123 Main St, Suite 400, San Francisco, CA 94105"
"github profile" = "https://github.com/username"
How it works:
- Case-insensitive fuzzy matching (Jaro-Winkler algorithm)
- Say "my email" → automatically replaced with your configured email
- Handles pronunciation variations (e.g., "office address" vs "office adress")
- Best match wins if multiple aliases are close
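The matching rules above can be sketched in plain Rust. This is an illustrative, self-contained Jaro-Winkler implementation, not the app's code: the function names are hypothetical, the standard prefix weight of 0.1 over at most four characters is assumed, and the sketch compares the whole spoken phrase against each alias, whereas the app may match differently.

```rust
/// Jaro similarity between two character sequences (0.0 to 1.0).
fn jaro(a: &[char], b: &[char]) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Characters count as matching only within this distance of each other.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;
    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && a[i] == b[j] {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Walk both matched sequences in order and count positions that differ.
    let mut k = 0;
    let mut out_of_order = 0usize;
    for i in 0..a.len() {
        if !a_matched[i] {
            continue;
        }
        while !b_matched[k] {
            k += 1;
        }
        if a[i] != b[k] {
            out_of_order += 1;
        }
        k += 1;
    }
    let m = matches as f64;
    let transpositions = out_of_order as f64 / 2.0;
    (m / a.len() as f64 + m / b.len() as f64 + (m - transpositions) / m) / 3.0
}

/// Case-insensitive Jaro-Winkler similarity (0.0 to 1.0).
pub fn jaro_winkler(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.to_lowercase().chars().collect();
    let b: Vec<char> = b.to_lowercase().chars().collect();
    let j = jaro(&a, &b);
    // Boost the score for a shared prefix of up to four characters.
    let prefix = a.iter().zip(&b).take(4).take_while(|(x, y)| x == y).count();
    j + prefix as f64 * 0.1 * (1.0 - j)
}

/// Returns the replacement of the best-scoring alias at or above `threshold`.
pub fn best_alias<'a>(
    spoken: &str,
    entries: &'a [(&'a str, &'a str)],
    threshold: f64,
) -> Option<&'a str> {
    entries
        .iter()
        .map(|(phrase, value)| (jaro_winkler(spoken, phrase), *value))
        .filter(|(score, _)| *score >= threshold)
        .max_by(|x, y| x.0.total_cmp(&y.0))
        .map(|(_, value)| value)
}
```

With a threshold of 0.85, "office adress" still matches the alias "office address" (the score is about 0.99), while an unrelated phrase such as "hello world" matches nothing and would be left unchanged.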
Use cases:
- Email addresses / phone numbers
- Physical addresses
- URLs / code snippets
- Company names / product names
Visual feedback and quick access:
- Adaptive icon (idle): Black on light mode, white on dark mode
- Red icon (recording): Shows when hotkey is pressed
- Yellow icon (processing): Shows during transcription
- Menu: Lists all profiles, "Open Config File", "Quit"
- Retina support: Automatically uses high-DPI icons
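The icon behaviour above amounts to a three-state machine. The sketch below models it with hypothetical `TrayState` and `Event` enums; the names and the exact set of events are assumptions for illustration and are not taken from the app's source.

```rust
/// The three tray states listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrayState {
    Idle,
    Recording,
    Processing,
}

/// Events that move the tray from one state to the next (assumed set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    HotkeyPressed,
    HotkeyReleased,
    TranscriptionDone,
}

/// Returns the next state; events that do not apply leave the state unchanged.
pub fn next_state(state: TrayState, event: Event) -> TrayState {
    match (state, event) {
        (TrayState::Idle, Event::HotkeyPressed) => TrayState::Recording,
        (TrayState::Recording, Event::HotkeyReleased) => TrayState::Processing,
        (TrayState::Processing, Event::TranscriptionDone) => TrayState::Idle,
        (s, _) => s,
    }
}

/// Icon color for a state; the idle icon adapts to the system appearance.
pub fn icon_color(state: TrayState, dark_mode: bool) -> &'static str {
    match state {
        TrayState::Idle if dark_mode => "white",
        TrayState::Idle => "black",
        TrayState::Recording => "red",
        TrayState::Processing => "yellow",
    }
}
```

Ignoring events that do not apply (for example a second press while a transcription is still running) keeps the sketch simple; a real implementation might queue or reject them instead.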
Optionally save audio recordings for debugging:
[recording]
enabled = true # Save recordings to ~/.whisper-hotkey/debug/
retention_days = 7 # Auto-delete after N days
max_count = 100 # Keep max N most recent recordings
cleanup_interval_hours = 24 # Cleanup frequency
Recordings named: recording_{timestamp}.wav
Use cases:
- Debug transcription accuracy issues
- Compare different model performance
- Report bugs with audio samples
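How `retention_days` and `max_count` interact can be shown with a short sketch. It assumes a recording is removed when it is older than the retention period or when it falls outside the newest `max_count` files; the `Recording` struct and `to_delete` function are hypothetical, and the app's actual cleanup logic may differ.

```rust
use std::time::Duration;

/// A saved recording, described by its file name and its age.
#[derive(Debug, Clone, PartialEq)]
pub struct Recording {
    pub name: String,
    pub age: Duration,
}

/// Returns the names of recordings to delete, newest first: anything older
/// than `retention_days`, plus anything beyond the newest `max_count` files.
pub fn to_delete(
    mut recordings: Vec<Recording>,
    retention_days: u64,
    max_count: usize,
) -> Vec<String> {
    let max_age = Duration::from_secs(retention_days * 24 * 60 * 60);
    // Sort by age ascending, so the newest recordings come first.
    recordings.sort_by_key(|r| r.age);
    recordings
        .into_iter()
        .enumerate()
        .filter(|(i, r)| r.age > max_age || *i >= max_count)
        .map(|(_, r)| r.name)
        .collect()
}
```

With the defaults shown above (7 days, 100 files), an 8-day-old recording is removed even if only three files exist, and the 101st-newest file is removed even if it was made today.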
- 100% local: No cloud, no internet required (except model download)
- No telemetry: Only local crash logs (~/.whisper-hotkey/crash.log)
- No storage: Audio discarded after transcription (unless debug recordings are enabled under [recording])
- macOS only (uses Core Graphics, Accessibility APIs)
- No real-time streaming (Whisper design limitation)
- No App Store (requires Accessibility, no sandbox)
- Some apps resist text insertion (Terminal secure input, etc.)
| Metric | Target | Actual (M1, small model) |
|---|---|---|
| Audio start | <50ms | ~5-10ms |
| Transcription (10s) | <2s | ~1.5-2s |
| Text insertion | <100ms | ~20-50ms |
| Idle CPU | <1% | ~0.5% |
| Idle RAM | ~1.5GB | ~1.3GB |
- Phase 1: Foundation (config, telemetry, permissions)
- Phase 2: Global hotkey
- Phase 3: Audio recording
- Phase 4: Whisper integration
- Phase 5: Text insertion
- Phase 6: Integration & polish
- Phase 7: Optimization & testing
- Phase 8: Distribution (.app bundle, installer)
See implem-plan.md for detailed implementation plan.
MIT
PRs welcome! Please:
- Run cargo fmt and cargo clippy before submitting
- Add tests for new features
- Update TESTING.md for profiling changes
- Issues: https://github.com/Automaat/whisper-hotkey/issues
- Docs: See TESTING.md for profiling/debugging
- Implementation: See implem-plan.md
