Whisper Dictation Features
100% Local Processing - No cloud, no telemetry, works offline
Push-to-Talk Interface - Hold Super+Period, speak, release to paste
Real-time Feedback - GTK4 desktop notifications show status
Universal Compatibility - Works in any Wayland application
99+ Languages - Full Whisper language support
Auto-Detection - Automatically detects language (Italian, English, Spanish, French, etc.)
Manual Selection - Force a specific language for the fastest processing
Mixed Language - Auto-detect handles seamless language switching
| Model | Size | Speed | Accuracy | Use Case |
|-------|------|-------|----------|----------|
| tiny | 39 MB | ~1-2s ⚡ | 60% | Quick notes, testing |
| base | 142 MB | ~4-6s ⚡⚡ | 70% | Recommended for speed |
| small | 466 MB | ~10-15s | 80% | Balanced performance |
| medium | 1.5 GB | ~20-30s | 85% | High accuracy |
| large | 2.9 GB | ~40-60s | 90% | Maximum accuracy |
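The speed/accuracy trade-off in the model table can be turned into a simple selection rule. The following is a minimal sketch, not part of the project: `ModelInfo`, `MODELS`, and `pick_model` are hypothetical names, and the numbers are the upper end of each "Speed" range and the "Accuracy" value listed for each model.

```python
from typing import NamedTuple, Optional


class ModelInfo(NamedTuple):
    name: str
    max_seconds: float  # upper end of the listed "Speed" range
    accuracy_pct: int   # listed "Accuracy" value


# Figures taken from the model table; the structure itself is hypothetical.
MODELS = [
    ModelInfo("tiny", 2, 60),
    ModelInfo("base", 6, 70),
    ModelInfo("small", 15, 80),
    ModelInfo("medium", 30, 85),
    ModelInfo("large", 60, 90),
]


def pick_model(latency_budget_s: float) -> Optional[str]:
    """Return the most accurate model whose worst-case time fits the budget."""
    candidates = [m for m in MODELS if m.max_seconds <= latency_budget_s]
    if not candidates:
        return None  # even the tiny model is too slow for this budget
    return max(candidates, key=lambda m: m.accuracy_pct).name
```

For example, a 10-second budget selects `base`, since `small` can take up to 15 seconds.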
Code Formatting - Black + Ruff auto-formatting
Complexity Checks - CCN < 10 enforced
Security Scanning - Gitleaks + Semgrep
Test Coverage - >75% requirement
Pre-commit Hooks - Automatic quality enforcement
DevEnv Integration - Pure, reproducible development shell
Cursor AI Rules - Pre-configured code intelligence
Convenience Commands - run-daemon-auto, run-daemon-en, etc.
Debug Mode - Comprehensive logging with run-daemon-debug
No network requests during transcription
Audio files deleted after processing
No keyboard event logging (only hotkey detection)
Local model storage (no external dependencies)
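One way to guarantee that audio files are deleted after processing is to wrap transcription in a `try`/`finally` block. This is a sketch under assumptions: `transcribe_and_discard` and the `transcribe` callback are hypothetical names, and the project's actual cleanup logic may differ.

```python
import os
import tempfile
from typing import Callable


def transcribe_and_discard(audio_bytes: bytes, transcribe: Callable[[str], str]) -> str:
    """Write audio to a temporary file, transcribe it, then always delete it."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio_bytes)
        # `transcribe` is a placeholder for whatever runs the local model.
        return transcribe(path)
    finally:
        # Remove the recording whether or not transcription succeeded.
        if os.path.exists(path):
            os.remove(path)
```

Because the deletion sits in `finally`, a failed transcription does not leave a recording on disk.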
Virtual device filtering (excludes ydotoold, xdotool)
Input group permission requirements
Subprocess hardening
Secret scanning in git commits
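Virtual device filtering presumably keeps the daemon from listening to keystrokes that input-injection tools generate. A minimal sketch of name-based filtering follows; `VIRTUAL_DEVICE_MARKERS` and `filter_physical_keyboards` are hypothetical names, and substring matching on the device name is an assumption about how the exclusion works.

```python
from typing import Iterable, List

# Name fragments of virtual input devices to skip (from the list above).
VIRTUAL_DEVICE_MARKERS = ("ydotoold", "xdotool")


def filter_physical_keyboards(device_names: Iterable[str]) -> List[str]:
    """Drop devices whose name contains a known virtual-device marker."""
    return [
        name
        for name in device_names
        if not any(marker in name.lower() for marker in VIRTUAL_DEVICE_MARKERS)
    ]
```

In a real daemon the names would come from the input subsystem (for example, the `name` attribute of evdev devices) rather than a hard-coded list.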
Customize key combination (default: Super+Period)
Multiple modifier key support (Super, Ctrl, Alt, Shift)
Per-device keyboard selection
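A key combination such as `Super+Period` can be split into modifiers and a trigger key. The sketch below illustrates one possible parser; `parse_hotkey` and the `"+"`-separated string format are assumptions, not the project's confirmed configuration syntax.

```python
from typing import FrozenSet, Tuple

# Modifier names from the list above, lowercased for comparison.
MODIFIERS = frozenset({"super", "ctrl", "alt", "shift"})


def parse_hotkey(combo: str) -> Tuple[FrozenSet[str], str]:
    """Split 'Super+Period' into ({'super'}, 'period')."""
    parts = [p.strip().lower() for p in combo.split("+") if p.strip()]
    modifiers = frozenset(p for p in parts if p in MODIFIERS)
    keys = [p for p in parts if p not in MODIFIERS]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one non-modifier key in {combo!r}")
    return modifiers, keys[0]
```

Returning the modifiers as a set makes the order in which they are written irrelevant, so `Ctrl+Shift+D` and `Shift+Ctrl+D` parse to the same value.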
Remove filler words ("um", "uh", etc.)
Auto-capitalization
Auto-punctuation (experimental)
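Filler-word removal and auto-capitalization can be approximated with regular expressions. This is an illustrative sketch only: `FILLERS` and `clean_transcript` are hypothetical names, the filler list is limited to the two words named above, and the project's real post-processing may work differently.

```python
import re

# Only the fillers named in the list above; extend as needed.
FILLERS = ("um", "uh")

# Match a whole filler word, an optional trailing comma, and following spaces.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b,?\s*",
    flags=re.IGNORECASE,
)


def clean_transcript(text: str) -> str:
    """Strip filler words, collapse whitespace, and capitalize sentences."""
    text = _FILLER_RE.sub("", text)
    text = re.sub(r"\s{2,}", " ", text).strip()
    # Uppercase the first letter of the text and of each new sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
```

The word boundaries (`\b`) keep words such as "umbrella" or "hum" intact, since only standalone fillers match.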
Dark mode (default)
Light mode
Auto (follows system theme)
Wayland-native (ydotool)
GNOME desktop notifications
PulseAudio/PipeWire audio capture
evdev keyboard monitoring
NixOS flakes support
DevEnv/direnv compatibility
Git hooks with Commitizen
Cursor AI workspace rules
run-daemon-auto # Auto-detect language
run-daemon-en # English only
run-daemon-it # Italian only
run-daemon-debug # Debug mode
test # Run test suite
format # Format code
quality-check # Run all gates
setup-dev # First-time setup
dictate-en # Switch to English
dictate-it # Switch to Italian
vim ~/.config/whisper-dictation/config.yaml
Comparison to Alternatives
| Feature | Whisper Dictation | Aqua Voice | Talon Voice |
|---------|-------------------|------------|-------------|
| Privacy | ✅ 100% Local | ❌ Cloud | ✅ Local |
| Cost | ✅ Free | 💲 $8/mo | 💲 $15/mo |
| NixOS Support | ✅ Native | ❌ No | ⚠️ Manual |
| Speed (base) | ⚠️ 4-6s | ✅ 850ms | ✅ 1-2s |
| Technical Terms | ⚠️ 70% | ✅ 97% | ✅ 95% |
| Wayland | ✅ Yes | ⚠️ Limited | ❌ X11 only |
| Multilingual | ✅ 99 languages | ✅ Yes | ⚠️ Limited |
| Offline | ✅ Yes | ❌ No | ✅ Yes |
| Open Source | ✅ MIT | ❌ No | ❌ No |
Slower than commercial solutions (4-6s with the base model vs 850ms for Aqua Voice)
Technical jargon accuracy varies by model (70-90%)
CPU-only processing (GPU support planned)
Single-user design (no multi-user profiles yet)
No streaming transcription (processes after key release)
NixOS or Linux with Nix
Wayland compositor (X11 may work with modifications)
Minimum 4 GB RAM (8 GB recommended for the medium model)
User must be in input group
PulseAudio or PipeWire
Use the base model for faster processing (trade-off: lower accuracy)
Consider commercial alternatives (Aqua Voice, Avalon API) if maximum speed is required
GPU acceleration can reduce times by 5-10x (implementation pending)
For detailed setup instructions, see README.md
For development guide, see DEVELOPMENT.md