whisper-hotkey

macOS background app for system-wide voice-to-text via hotkey using local Whisper.

Hold a hotkey, speak, release → text inserted at cursor. Privacy-first (100% local, no cloud).

Key Features:

🎯 Multi-profile support - Different models per hotkey (e.g., fast for notes, accurate for emails)
🔄 Smart aliases - Fuzzy text replacement (e.g., "my email" → "user@example.com")
📊 Menubar tray - Visual feedback (idle/recording/processing) and quick config access
🔒 100% local - No cloud, no internet (except initial model download)
⚡ Fast - <50ms audio start, ~2s transcription (10s audio)

Quick Start

Option 1: Homebrew (Recommended)

Easiest installation - no code signing issues:

brew install Automaat/whisper-hotkey/whisper-hotkey

First launch:

Open WhisperHotkey from Applications
Grant permissions when prompted:
- Accessibility
- Input Monitoring
- Microphone
App downloads Whisper model (~466MB)

Usage:

Default hotkey: Ctrl+Option+Z
Press and hold → speak → release
Text appears at cursor

Configuration: ~/.whisper-hotkey/config.toml

Update: brew upgrade whisper-hotkey

Option 2: Download DMG

For users without Homebrew:

Download: Latest release → WhisperHotkey-*.dmg
Install: Open DMG, drag WhisperHotkey.app to Applications

Remove quarantine (if needed):

xattr -d com.apple.quarantine /Applications/WhisperHotkey.app

Run: Open from Applications
Permissions: Grant Microphone + Accessibility + Input Monitoring when prompted
First run: Downloads Whisper model (~466MB)

Option 3: Build from Source

Prerequisites:

macOS (M1/M2 or Intel)
Rust/Cargo (install: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh)
mise (optional, recommended: curl https://mise.run | sh)
Permissions: Microphone + Accessibility

3a: Automated Installer

# Clone repo
git clone https://github.com/Automaat/whisper-hotkey.git
cd whisper-hotkey

# Run installer (binary mode - installs to /usr/local/bin)
./scripts/install.sh

# OR .app bundle mode (installs to /Applications)
./scripts/install.sh app

The installer will:

Build release binary
Install to /usr/local/bin or /Applications
Create config: ~/.whisper-hotkey/config.toml
Optionally setup auto-start at login (LaunchAgent)

To uninstall:

./scripts/uninstall.sh

3b: Manual Build & Run

# Clone repo
git clone https://github.com/Automaat/whisper-hotkey.git
cd whisper-hotkey

# Install Rust toolchain (if using mise)
mise install

# Build (downloads ~466MB Whisper model on first run)
mise exec -- cargo build --release
# OR without mise:
cargo build --release

# Run
mise exec -- cargo run --release
# OR:
./target/release/whisper-hotkey

First run:

Creates config: ~/.whisper-hotkey/config.toml
Prompts for Microphone permission (System Settings → Privacy & Security)
Prompts for Accessibility permission (for hotkey + text insertion)
Downloads Whisper model: ~/.whisper-hotkey/models/ggml-small.bin (~466MB)
Loads model (takes 2-3s)

Console output: Real-time logs show activity:

🎤 Hotkey pressed - recording started
⏹️  Hotkey released - processing audio
📼 Captured 3.5s audio (56000 samples)
✨ Transcription: "Hello, this is a test"
✅ Inserted 22 chars
✓ Ready for next recording

3. Test

Open any text editor (TextEdit, VS Code, Notes, Chrome)
Click into a text field
Press and hold Ctrl+Option+Z
Speak clearly: "Hello, this is a test"
Release the hotkey
Text appears at cursor in ~2s

Expected output:

✓ Config loaded from ~/.whisper-hotkey/config.toml
✓ Telemetry initialized
✓ Permissions OK
✓ Model found at /Users/you/.whisper-hotkey/models/ggml-small.bin
Loading Whisper model (this may take a few seconds)...
  Optimization: 4 threads, beam_size=5
✓ Whisper model loaded and ready
✓ Audio capture initialized
✓ Hotkey registered: ["Control", "Option"] + Z

Whisper Hotkey is running. Press the hotkey to record and transcribe.
✓ Full pipeline ready: hotkey → audio → transcription → text insertion
Press Ctrl+C to exit.

Configuration

Edit ~/.whisper-hotkey/config.toml:

# Multi-profile support - define multiple hotkeys with different models
[[profiles]]
name = "Fast"                       # optional profile name
model_type = "small"                # tiny, base, small, medium, large, tiny.en, base.en, small.en, medium.en
[profiles.hotkey]
modifiers = ["Control", "Option"]
key = "Z"
preload = true                      # load on startup (recommended)
threads = 4                         # CPU threads (try 2/4/8)
beam_size = 1                       # 1=fast, 5=balanced, 10=accurate
language = "en"                     # optional language hint

[[profiles]]
name = "Accurate"
model_type = "medium"
[profiles.hotkey]
modifiers = ["Command", "Shift"]
key = "V"
threads = 4
beam_size = 5
language = "en"

[audio]
buffer_size = 1024                  # frames (leave default)
sample_rate = 16000                 # Hz (leave default)

[telemetry]
enabled = true                      # local crash logging only
log_path = "~/.whisper-hotkey/crash.log"

[recording]
enabled = true                      # save debug recordings
retention_days = 7                  # auto-delete after N days
max_count = 100                     # keep max N recordings
cleanup_interval_hours = 24         # cleanup frequency

[aliases]
enabled = true                      # fuzzy text replacement
threshold = 0.85                    # match threshold (0.0-1.0)
[aliases.entries]
"my email" = "user@example.com"
"my address" = "123 Main St, City, State 12345"
"github" = "https://github.com/username"

After editing: Restart app (Ctrl+C, then cargo run --release)

Note: You can define multiple profiles with different models and hotkeys. The tray icon menu shows all active profiles.

Auto-Start at Login

To run whisper-hotkey automatically when you log in:

# Setup LaunchAgent (starts now and at every login)
./scripts/setup-launchagent.sh

Manage the service:

# Stop service
launchctl unload ~/Library/LaunchAgents/com.whisper-hotkey.plist

# Start service
launchctl load ~/Library/LaunchAgents/com.whisper-hotkey.plist

# Restart service
launchctl kickstart -k gui/$(id -u)/com.whisper-hotkey

# Check status
launchctl list | grep whisper-hotkey

# View logs
tail -f ~/.whisper-hotkey/stdout.log
tail -f ~/.whisper-hotkey/stderr.log

To disable auto-start:

launchctl unload ~/Library/LaunchAgents/com.whisper-hotkey.plist
rm ~/Library/LaunchAgents/com.whisper-hotkey.plist

Performance Tuning

Fast Mode (sacrifice accuracy)

[model]
threads = 8
beam_size = 1

Accurate Mode (slower)

[model]
threads = 4
beam_size = 10

Different Models

[model]
model_type = "tiny"   # Faster, less accurate (~75MB)
# or
model_type = "base"   # Good balance (~142MB)
# or
model_type = "medium" # More accurate, slower (~1.5GB)

App auto-downloads model on next run.

Troubleshooting

Permissions not working after granting them (macOS Quarantine)

Symptoms: You've granted Microphone and Accessibility permissions, but the app still can't access them.

Cause: macOS quarantine attribute (applied to downloaded apps) prevents the system from recognizing granted permissions.

Solution:

Open Terminal

Run this command (replace path if needed):

xattr -d com.apple.quarantine /Applications/WhisperHotkey.app

Restart the app

Note: The app will detect quarantine on startup and show this command automatically.

"No input device available"

Grant Microphone permission: System Settings → Privacy & Security → Microphone
Reset: tccutil reset Microphone, then restart app

"Failed to register global hotkey"

Grant Accessibility permission: System Settings → Privacy & Security → Accessibility
Add Terminal/iTerm to allowed apps

Text not inserting

Check Accessibility permission (same as above)
Some apps block insertion (Terminal secure input mode)
Check logs: tail -f ~/.whisper-hotkey/crash.log

Slow transcription

Try faster config (threads=8, beam_size=1)
Use smaller model (tiny or base)
Check logs for inference_ms metric

Model download fails

Manual download from Hugging Face
Place in: ~/.whisper-hotkey/models/ggml-{name}.bin

Creating Releases

For maintainers:

# Auto-increment minor version (0.0.0 → 0.1.0)
./scripts/create-release.sh

# Specific version
./scripts/create-release.sh 0.1.0

# Or manually via GitHub CLI
gh workflow run release.yml              # Auto-increment
gh workflow run release.yml -f version=0.1.0  # Specific version

The release workflow:

Creates and pushes git tag (e.g., v0.1.0)
Builds release binary with optimizations
Creates .app bundle and DMG
Generates SHA256 checksum
Publishes GitHub release with artifacts

Monitor release:

gh run watch
# OR
gh run list --workflow=release.yml

Releases appear at: GitHub Releases

Development

Run tests

# Unit tests (no hardware required)
mise exec -- cargo test

# Hardware tests (requires mic + permissions)
mise exec -- cargo test -- --ignored

Logging levels

# Default: info level (shows hotkey events, transcription results)
cargo run --release

# Debug: detailed timing information
RUST_LOG=debug cargo run --release

# Trace: everything including low-level operations
RUST_LOG=whisper_hotkey=trace cargo run --release

# CPU profiling
sudo cargo flamegraph --release
# Trigger hotkey, Ctrl+C, then: open flamegraph.svg

# Memory profiling (macOS)
instruments -t Allocations target/release/whisper-hotkey

See TESTING.md for comprehensive profiling guide.

Format & lint

mise exec -- cargo fmt
mise exec -- cargo clippy

How It Works

Hotkey pressed → Clear audio buffer, start recording
Hotkey held → Accumulate audio samples (16kHz mono)
Hotkey released → Stop recording, convert audio format
Transcription → Whisper processes audio (~2s for 10s recording)
Text insertion → CGEvent inserts text at cursor

Tech stack:

Rust 1.84
Whisper.cpp (via whisper-rs bindings)
cpal (audio capture)
global-hotkey (hotkey detection)
Core Graphics CGEvent (text insertion)
tray-icon (menubar integration)

Advanced Features

Multi-Profile Support

Define multiple transcription profiles with different models and hotkeys:

Fast profile (Ctrl+Option+Z): Use small model for quick notes
Accurate profile (Cmd+Shift+V): Use medium model for emails

Each profile can have different:

Model type (tiny → large)
Hotkey combination
Inference settings (threads, beam_size)
Language hints

The tray icon menu displays all active profiles with their hotkeys and models.

Smart Aliases

Replace transcribed text with predefined values using fuzzy matching:

[aliases]
enabled = true
threshold = 0.85  # 0.0 (loose) to 1.0 (exact)
[aliases.entries]
"my email" = "john.doe@company.com"
"office address" = "123 Main St, Suite 400, San Francisco, CA 94105"
"github profile" = "https://github.com/username"

How it works:

Case-insensitive fuzzy matching (Jaro-Winkler algorithm)
Say "my email" → automatically replaced with your configured email
Handles pronunciation variations (e.g., "office address" vs "office adress")
Best match wins if multiple aliases are close

Use cases:

Email addresses / phone numbers
Physical addresses
URLs / code snippets
Company names / product names

Menubar Tray Icon

Visual feedback and quick access:

Adaptive icon (idle): Black on light mode, white on dark mode
Red icon (recording): Shows when hotkey is pressed
Yellow icon (processing): Shows during transcription
Menu: Lists all profiles, "Open Config File", "Quit"
Retina support: Automatically uses high-DPI icons

Debug Recording Retention

Optionally save audio recordings for debugging:

[recording]
enabled = true           # Save recordings to ~/.whisper-hotkey/debug/
retention_days = 7       # Auto-delete after N days
max_count = 100          # Keep max N most recent recordings
cleanup_interval_hours = 24  # Cleanup frequency

Recordings named: recording_{timestamp}.wav

Use cases:

Debug transcription accuracy issues
Compare different model performance
Report bugs with audio samples

Privacy

100% local: No cloud, no internet required (except model download)
No telemetry: Only local crash logs (~/.whisper-hotkey/crash.log)
No storage: Audio discarded after transcription

Limitations

macOS only (uses Core Graphics, Accessibility APIs)
No real-time streaming (Whisper design limitation)
No App Store (requires Accessibility, no sandbox)
Some apps resist text insertion (Terminal secure input, etc.)

Performance Targets

Metric	Target	Actual (M1, small model)
Audio start	<50ms	~5-10ms
Transcription (10s)	<2s	~1.5-2s
Text insertion	<100ms	~20-50ms
Idle CPU	<1%	~0.5%
Idle RAM	~1.5GB	~1.3GB

Roadmap

Phase 1: Foundation (config, telemetry, permissions)
Phase 2: Global hotkey
Phase 3: Audio recording
Phase 4: Whisper integration
Phase 5: Text insertion
Phase 6: Integration & polish
Phase 7: Optimization & testing
Phase 8: Distribution (.app bundle, installer)

See implem-plan.md for detailed implementation plan.

License

MIT

Contributing

PRs welcome! Please:

Run cargo fmt and cargo clippy before submitting
Add tests for new features
Update TESTING.md for profiling changes

Support

Issues: https://github.com/Automaat/whisper-hotkey/issues
Docs: See TESTING.md for profiling/debugging
Implementation: See implem-plan.md

Name		Name	Last commit message	Last commit date
Latest commit History 88 Commits
.claude		.claude
.github		.github
assets		assets
docs		docs
plans		plans
resources		resources
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
.mise.toml		.mise.toml
CLAUDE.md		CLAUDE.md
Cargo.toml		Cargo.toml
Info.plist		Info.plist
README.md		README.md
TESTING.md		TESTING.md
clippy.toml		clippy.toml
codecov.yml		codecov.yml
coverage.lcov		coverage.lcov
icon.png		icon.png
implem-plan.md		implem-plan.md
phase-3-plan.md		phase-3-plan.md
renovate.json		renovate.json
streamdeck-integration-plan.md		streamdeck-integration-plan.md

Folders and files

Latest commit

History

Repository files navigation

whisper-hotkey

Quick Start

Option 1: Homebrew (Recommended)

Option 2: Download DMG

Option 3: Build from Source

3a: Automated Installer

3b: Manual Build & Run

3. Test

Configuration

Auto-Start at Login

Performance Tuning

Fast Mode (sacrifice accuracy)

Accurate Mode (slower)

Different Models

Troubleshooting

Permissions not working after granting them (macOS Quarantine)

"No input device available"

"Failed to register global hotkey"

Text not inserting

Slow transcription

Model download fails

Creating Releases

Development

Run tests

Logging levels

Format & lint

How It Works

Advanced Features

Multi-Profile Support

Smart Aliases

Menubar Tray Icon

Debug Recording Retention

Privacy

Limitations

Performance Targets

Roadmap

License

Contributing

Support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 13

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages