A cross-platform voice-to-clipboard prompt generator. Press a hotkey, speak your prompt, and it's transcribed, cleaned up, and copied to your clipboard — ready to paste into any AI coding assistant.
Voice Prompt combines local Whisper speech-to-text with optional Gemini refinement to turn spoken dictation into clean, structured prompts. It runs as a background service with a minimal overlay bar and a GTK4/Adwaita settings dashboard.
Hotkey → Record Audio → Transcribe (Whisper) → Refine (Gemini) → Clipboard
- Press your hotkey (default:
Ctrl+Space) from anywhere on your desktop - A recording overlay appears — speak your prompt
- Press the hotkey again to stop recording
- Audio is transcribed locally using Whisper (nothing leaves your machine)
- The transcript is optionally refined by Gemini to remove filler words and fix speech artifacts
- The final prompt is copied to your clipboard automatically
- Cross-platform — native support for Linux (Wayland/X11) and macOS
- Global hotkey — works from any application (evdev on Linux, rdev on macOS)
- Local transcription — Whisper runs entirely on your machine; no audio is uploaded
- Smart refinement — Gemini removes filler words (um, uh, like), fixes speech errors, and preserves technical terms
- Graceful fallback — works without an API key (raw transcription only)
- Recording overlay — minimal floating bar with live waveform, timer, and status phases
- Settings dashboard — GTK4/Adwaita UI to configure hotkey, API key, and view history
- Prompt history — browse and re-copy past prompts
- Usage statistics — tracks total prompts and word count
- Audio feedback — ascending beep on record start, descending beep on stop
- Auto-start — systemd service (Linux) or launchd agent (macOS)
- Clipboard support —
wl-copy(Wayland),xclip(X11),pbcopy(macOS)
The installer auto-detects your OS and package manager, builds the application, and installs it to your system:
git clone https://github.com/tr4m0ryp/voice-prompt.git
cd voice-prompt
bash install.shThis handles everything — system dependencies, compilation, binary installation, desktop entry, and application icon.
| Platform | Package manager | What it does |
|---|---|---|
| macOS | Homebrew | Installs GTK4/libadwaita, builds, copies binary |
| Fedora | dnf | Installs dev packages, adds input group, builds, installs desktop entry |
| Ubuntu/Debian | apt | Installs dev packages, adds input group, builds, installs desktop entry |
| Arch | pacman | Installs dev packages, adds input group, builds, installs desktop entry |
After installation, Voice Prompt appears in your application menu. You can also run it from the terminal:
voice-promptOn first launch, the Whisper model (~140 MB) downloads automatically. The settings dashboard opens where you can configure your hotkey and optionally add a Gemini API key.
To remove Voice Prompt from your system:
bash uninstall.shYou'll be prompted whether to remove configuration and data files.
Linux (systemd)
cp target/release/voice-prompt ~/.cargo/bin/
mkdir -p ~/.config/systemd/user
cp voice-prompt.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now voice-promptmacOS (launchd)
cp target/release/voice-prompt /usr/local/bin/
mkdir -p ~/Library/LaunchAgents
cp com.voice-prompt.agent.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.voice-prompt.agent.plist- Global hotkeys use evdev, which requires your user to be in the
inputgroup. The installer handles this, but you must log out and back in afterward. - The recording overlay uses gtk4-layer-shell on Wayland for proper always-on-top positioning. On X11 it falls back to standard GTK windowing.
- Global hotkeys use rdev. macOS requires you to grant Input Monitoring permission:
System Settings > Privacy & Security > Input Monitoring — add
voice-prompt(or your terminal) to the list. - The overlay renders as an undecorated floating window using native GTK4 window management.
If you prefer to install dependencies yourself instead of using install.sh:
Fedora
sudo dnf install -y \
gtk4-devel libadwaita-devel gtk4-layer-shell-devel \
wayland-devel wayland-protocols-devel gobject-introspection-devel \
alsa-lib-devel pkg-config cmake gcc-c++ \
wl-clipboard xclip libnotify clang-devel
sudo usermod -aG input $USER # log out & back inUbuntu / Debian
sudo apt install -y \
libgtk-4-dev libadwaita-1-dev libgtk4-layer-shell-dev \
libasound2-dev pkg-config cmake g++ libclang-dev \
wl-clipboard xclip libnotify-bin
sudo usermod -aG input $USER # log out & back inArch
sudo pacman -S --needed gtk4 libadwaita gtk4-layer-shell \
alsa-lib pkg-config cmake clang \
wl-clipboard xclip libnotify
sudo usermod -aG input $USER # log out & back inmacOS (Homebrew)
brew install gtk4 libadwaita pkg-config cmakeThen build:
cargo build --releaseAll configuration is managed through the settings dashboard and stored in ~/.config/voice-prompt/config.json.
To enable smart transcript refinement:
- Get a free API key from Google AI Studio
- Enter it in the settings dashboard under "Gemini API Key"
Without an API key, Voice Prompt still works — you get raw Whisper transcriptions without cleanup.
Default: Ctrl+Space. Click Change Hotkey in the dashboard and press your desired combination to rebind it.
┌─────────────────────────────────────────────────────┐
│ GTK Main Thread │
│ ┌───────────┐ ┌───────────┐ ┌────────────────┐ │
│ │ Dashboard │ │ Overlay │ │ Event Handler │ │
│ │ (Settings) │ │ (Record) │ │ (Dispatch) │ │
│ └───────────┘ └───────────┘ └───────┬────────┘ │
│ │ │
│ async_channel│ │
├────────────────────────────────────────┼─────────────┤
│ Background Threads │ │
│ ┌────────────┐ ┌──────────┐ ┌───────┴──────────┐ │
│ │evdev / rdev│ │ CPAL │ │ Tokio │ │
│ │ (Hotkey) │ │ (Record) │ │ (Whisper + API) │ │
│ └────────────┘ └──────────┘ └──────────────────┘ │
└─────────────────────────────────────────────────────┘
- GTK main thread — all UI state in
Rc<RefCell<AppState>> - Hotkey thread — evdev (Linux) or rdev (macOS) for global key monitoring
- CPAL — captures microphone audio at 16kHz mono
- Tokio runtime — async Gemini API calls + CPU-heavy Whisper transcription via
spawn_blocking - async_channel — bridges background threads to the GTK main loop
| Data | Location |
|---|---|
| Configuration | ~/.config/voice-prompt/config.json |
| Usage statistics | ~/.local/share/voice-prompt/stats.json |
| Whisper model | ~/.local/share/voice-prompt/models/ggml-base.en.bin |
- Audio never leaves your machine — Whisper transcription runs entirely locally
- Transcript text is sent to Google Gemini only if you provide an API key, and only for refinement
- No telemetry — no data is collected or sent anywhere
- All personal data (prompts, statistics) is stored locally in your home directory