Privacy-first local speech-to-text for NixOS -- whisper.cpp powered, push-to-talk, paste anywhere.
- 100% local and private -- no cloud, no telemetry, works fully offline
- Push-to-talk -- hold Super+Period, speak, release to paste text
- Real-time feedback -- floating GTK4 window shows transcription status
- Multilingual -- supports 99 languages with auto-detection
- Wayland native -- built for GNOME on Wayland, works in any application
- Optimized for technical speech -- tuned for developer and AI workflows
- NixOS or any Linux distribution with Nix
- Wayland compositor (GNOME recommended)
- PulseAudio or PipeWire
- User must be in the input group for keyboard monitoring
Add to your flake.nix:
{
  inputs.whisper-dictation.url = "github:jacopone/whisper-dictation";

  # In your configuration
  environment.systemPackages = [
    inputs.whisper-dictation.packages.${system}.default
  ];

  # Enable auto-start
  systemd.user.services.whisper-dictation = {
    enable = true;
    wantedBy = [ "graphical-session.target" ];
  };
}

To run from a checkout instead:
git clone https://github.com/jacopone/whisper-dictation.git
cd whisper-dictation
nix develop
python -m whisper_dictation.daemon

First-time setup: ensure your user is in the input group (sudo usermod -aG input $USER, then log out and back in), download a Whisper model to ~/.local/share/whisper-models/, and start the ydotoold daemon. See the first-time setup section in DEVELOPMENT.md for details.
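The first-time setup steps can be checked before starting the daemon. The following is a minimal sketch, not part of the project: the function names are hypothetical, the `*.bin` model file pattern is an assumption, and the `ydotoold` check only confirms that the binary is on `PATH`, not that the daemon is running.

```python
import grp
import os
import shutil
from pathlib import Path


def process_in_group(group: str) -> bool:
    """Return True if the current process already carries `group`.

    This reflects the logged-in session, so it stays False after
    `usermod -aG` until you log out and back in.
    """
    try:
        gid = grp.getgrnam(group).gr_gid
    except KeyError:
        return False  # the group does not exist on this system
    return gid in os.getgroups() or gid == os.getegid()


def find_models(model_dir: Path) -> list[str]:
    """List model files in `model_dir` (the *.bin pattern is an assumption)."""
    if not model_dir.is_dir():
        return []
    return sorted(p.name for p in model_dir.glob("*.bin"))


if __name__ == "__main__":
    model_dir = Path.home() / ".local/share/whisper-models"
    print("input group:", "ok" if process_in_group("input") else "missing")
    print("models:", find_models(model_dir) or "none found")
    print("ydotoold on PATH:", shutil.which("ydotoold") is not None)
```

Checking the process groups rather than `/etc/group` is deliberate: it catches the common case where the user was added to the group but has not yet started a new session.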
Start the daemon and dictate:
run-daemon # use config file settings
run-daemon-en # English only (fastest)
run-daemon-it # Italian only
run-daemon-auto # auto-detect language (adds ~1-2s)

Then in any application:
- Click in a text field
- Hold Super+Period
- Speak naturally
- Release the key -- the transcribed text is pasted at the cursor
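The hold, speak, release cycle above behaves like a small state machine. The sketch below illustrates that idea only; the class and method names are hypothetical and do not come from the project's code. Key-repeat events while the key is held, and key presses while a transcription is still running, are simply ignored.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()


class PushToTalk:
    """Tracks the push-to-talk cycle; `pasted` stands in for real text insertion."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.pasted: list[str] = []

    def key_down(self) -> None:
        # Only start recording from idle, so key repeats are no-ops.
        if self.state is State.IDLE:
            self.state = State.RECORDING

    def key_up(self) -> None:
        # Releasing the key ends the recording and hands off to transcription.
        if self.state is State.RECORDING:
            self.state = State.TRANSCRIBING

    def transcription_done(self, text: str) -> None:
        # Paste non-empty results, then return to idle either way.
        if self.state is State.TRANSCRIBING:
            if text.strip():
                self.pasted.append(text.strip())
            self.state = State.IDLE


if __name__ == "__main__":
    ptt = PushToTalk()
    ptt.key_down()
    ptt.key_up()
    ptt.transcription_done(" hello world ")
    print(ptt.state.name, ptt.pasted)
```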
Override settings per-session with command-line flags:
python -m whisper_dictation.daemon --verbose --language auto --model base

Edit ~/.config/whisper-dictation/config.yaml. Key settings:
- whisper.model -- model size: tiny, base (recommended), small, medium, large
- whisper.language -- language code (en, it, auto, etc.)
- hotkey.key / hotkey.modifiers -- push-to-talk keybinding
See config.yaml in the repository for all available options.
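One way to think about how per-session flags relate to the config file: values from the file act as defaults, and any flag given on the command line replaces the matching value. The sketch below shows that precedence with `argparse`. It is an illustration, not the daemon's actual parser: the `FILE_SETTINGS` dict is a stand-in for values read from config.yaml, and the function names are hypothetical. Only the three flags shown in the example command are modelled.

```python
import argparse

# Stand-in for settings loaded from config.yaml (illustrative values only).
FILE_SETTINGS = {"model": "base", "language": "en", "verbose": False}

MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def parse_overrides(argv: list[str]) -> dict:
    """Return only the settings that were explicitly passed as flags."""
    parser = argparse.ArgumentParser(prog="whisper_dictation.daemon")
    # default=None lets us tell "flag not given" apart from a real value.
    parser.add_argument("--model", choices=MODEL_SIZES, default=None)
    parser.add_argument("--language", default=None)
    parser.add_argument("--verbose", action="store_true", default=None)
    args = parser.parse_args(argv)
    return {key: value for key, value in vars(args).items() if value is not None}


def effective_settings(file_settings: dict, argv: list[str]) -> dict:
    """Command-line flags win over values from the config file."""
    return {**file_settings, **parse_overrides(argv)}


if __name__ == "__main__":
    argv = ["--verbose", "--language", "auto", "--model", "base"]
    print(effective_settings(FILE_SETTINGS, argv))
```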
Model selection guide
| Model | Size | Time | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | 39 MB | ~1-2s | 60% | Quick notes, testing |
| base | 142 MB | ~4-6s | 70% | Recommended for speed |
| small | 466 MB | ~10-15s | 80% | Balanced performance |
| medium | 1.5 GB | ~20-30s | 85% | High accuracy |
| large | 2.9 GB | ~40-60s | 90% | Maximum accuracy |
Times measured on CPU (4 threads). GPU acceleration can reduce times by 5-10x.
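The table can also be read as a lookup: pick the most accurate model whose worst-case CPU time still fits the delay you can tolerate. The sketch below encodes the upper bound of each time range from the table; the `pick_model` helper is hypothetical, and the numbers apply only to the CPU setup described above.

```python
# (model, size in MB, worst-case seconds on CPU), ordered from least to most accurate.
# Sizes and times are taken from the table; 1.5 GB and 2.9 GB are rounded to MB.
MODELS = [
    ("tiny", 39, 2),
    ("base", 142, 6),
    ("small", 466, 15),
    ("medium", 1500, 30),
    ("large", 2900, 60),
]


def pick_model(max_seconds: float):
    """Return the most accurate model within the time budget, or None."""
    choice = None
    for name, _size_mb, worst_case_s in MODELS:
        if worst_case_s <= max_seconds:
            choice = name  # later entries are more accurate, so keep the last fit
    return choice


if __name__ == "__main__":
    for budget in (1, 6, 20, 120):
        print(f"{budget:>3}s budget -> {pick_model(budget)}")
```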
- Keyboard monitoring -- evdev captures low-level key events
- Audio recording -- ffmpeg records microphone input while the key is held
- Transcription -- whisper.cpp processes audio locally on your machine
- Text insertion -- ydotool pastes transcribed text into the active window
- UI feedback -- GTK4 floating window shows real-time status
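The recording, transcription, and insertion stages can be pictured as three external commands run in sequence. The sketch below only builds the argument lists and does not run anything. The exact arguments the daemon uses are not documented here, so treat all of these as assumptions: the `whisper-cli` binary name, the `default` audio source, and the use of `ydotool type` for insertion. The function names are hypothetical. The 16 kHz mono format is what whisper.cpp expects for WAV input.

```python
import shlex
from pathlib import Path

SAMPLE_RATE_HZ = 16000  # whisper.cpp expects 16 kHz mono WAV input


def record_cmd(wav_path: Path) -> list[str]:
    """ffmpeg command recording from the default PulseAudio source (assumed)."""
    return [
        "ffmpeg", "-y",                     # overwrite the previous recording
        "-f", "pulse", "-i", "default",     # PipeWire serves this via its Pulse layer
        "-ac", "1", "-ar", str(SAMPLE_RATE_HZ),
        str(wav_path),
    ]


def transcribe_cmd(model_path: Path, wav_path: Path, language: str) -> list[str]:
    """whisper.cpp CLI call; the binary name `whisper-cli` is an assumption."""
    return [
        "whisper-cli",
        "-m", str(model_path),
        "-f", str(wav_path),
        "-l", language,                     # e.g. "en", "it", or "auto"
        "-nt",                              # print text without timestamps
    ]


def type_cmd(text: str) -> list[str]:
    """ydotool command that types the text into the focused window (assumed)."""
    return ["ydotool", "type", text.strip()]


if __name__ == "__main__":
    wav = Path("/tmp/dictation.wav")
    model = Path.home() / ".local/share/whisper-models" / "ggml-base.bin"
    for cmd in (record_cmd(wav), transcribe_cmd(model, wav, "en"), type_cmd("hello")):
        print(shlex.join(cmd))
```

Building each command as a list rather than a shell string avoids quoting problems when the transcribed text contains spaces or punctuation.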
| Feature | Whisper Dictation | Aqua Voice | Talon Voice |
|---|---|---|---|
| Privacy | Local | Cloud | Local |
| Cost | Free | $8/mo | $15/mo |
| NixOS support | Native | No | Manual |
| Technical terms | 65-85% | 97% | 95% |
| Wayland | Yes | Limited | X11 only |
| Real-time | Yes | Yes | Yes |
See DEVELOPMENT.md for the full development guide.
See TROUBLESHOOTING.md for solutions to common issues (audio, keyboard detection, ydotool, hotkeys, performance).
Contributions welcome. See CONTRIBUTING.md for guidelines.
MIT License -- see LICENSE.