Open-source voice-to-text app with local Whisper transcription and AI-powered correction.
Hold a keyboard shortcut, speak, and Vox transcribes your voice locally using whisper.cpp, optionally corrects it with AI, and pastes the text into your active app.
Platform Support Vox runs on macOS (Apple Silicon and Intel) and Windows (10+). Linux support is planned for future releases.
- Quick Start
- Features
- Use Cases
- How Vox Compares
- Requirements
- Configuration
- Usage
- FAQ
- Development
- Contributing
- License
Download the latest version from the releases page.
- macOS: Drag
Vox.appto your Applications folder. - Windows: Run the installer (
.exe) and follow the setup wizard.
When you first launch Vox, you'll need to:
-
Download a Whisper Model β Go to Settings > Local Model and download at least one speech recognition model. The "small" model (Recommended) is a good starting point.
-
Grant Permissions β Vox needs:
- Microphone: Required for voice recording
- Accessibility: Required for keyboard shortcuts and auto-paste
-
Configure Shortcuts (optional) β Customize keyboard shortcuts in Settings > Shortcuts
-
Enable AI Improvements (optional) β Configure LLM provider in Settings > AI Improvements
Vox will guide you through this setup process with visual indicators showing what's incomplete.
Once configured, hold Alt+Space to start recording.
- π 100% Local transcription β Powered by whisper.cpp, audio stays on your device
- π€ AI correction β Removes filler words and fixes grammar (optional)
- βοΈ Custom prompts β Tailor corrections for medical, technical, creative, or any workflow
- β¨οΈ Hold or toggle modes β Press-and-hold or toggle recording on/off
- π Auto-paste β Text is pasted directly into your focused app
- π― Multiple models β Choose speed vs accuracy (tiny to large)
- βοΈ Multiple LLM providers β OpenAI-compatible or AWS Bedrock
- π¨ Menu bar app β Runs quietly in the background with dark/light mode support
Preserve medical terminology and standard abbreviations. Vox understands context and won't autocorrect "OA" to "okay" or "PT" to "patient."
Example custom prompt:
"Preserve medical terminology, standard abbreviations (e.g., OA, PT, BP), and format as clinical notes."
Format technical dictation as concise documentation. Remove filler words while keeping technical terms intact.
Example custom prompt:
"Format as technical documentation. Be concise, remove filler words, preserve code terms and abbreviations."
Enhance prose while maintaining your unique voice. Turn spoken ideas into polished text ready for editing.
Example custom prompt:
"Enhance prose for readability while maintaining the author's voice. Fix grammar but keep the casual tone."
Practice speaking by translating and correcting your speech in real-time.
Example custom prompt:
"Translate to German and correct grammar. Output only the German translation."
Capture thoughts quickly without typing. Perfect for meetings, brainstorming, or journaling.
| Feature | Vox | Dragon NaturallySpeaking | macOS Dictation | Whisper Desktop Apps |
|---|---|---|---|---|
| Price | Free & Open Source | $300+ | Free (limited) | Varies ($0-50) |
| Privacy | 100% Local | Cloud-based | Cloud-based | Mostly local |
| Custom Prompts | β Full control | β Limited | β None | |
| AI Enhancement | β Your own API | β None | ||
| Offline Mode | β Full | β Requires internet | β Most | |
| Native App | β Menu bar / tray | β Built-in | β Varies | |
| Custom Shortcuts | β Configurable | β Yes | β Most | |
| Open Source | β FSL-1.1-ALv2 | β Proprietary | β Proprietary |
Why Vox?
- Privacy-first: Your audio never leaves your device
- Flexibility: Use any OpenAI-compatible LLM or AWS Bedrock
- Customization: Tailor AI corrections to your exact needs
- Free & Open: No subscription, no cloud lock-in
- macOS (Apple Silicon or Intel) or Windows (10+)
- LLM provider (optional) β for text correction:
- OpenAI-compatible endpoint with API key
- Or AWS Bedrock credentials with model access
Download at least one model from the Whisper tab:
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| tiny | ~75 MB | Fastest | Lower |
| base | ~140 MB | Fast | Decent |
| small | ~460 MB | Good | Good |
| medium | ~1.5 GB | Slow | Better |
| large | ~3 GB | Slowest | Best |
Foundry (OpenAI-compatible)
- Endpoint URL
- API key
- Model name (e.g.,
gpt-4o)
AWS Bedrock
- AWS region
- Credentials (access key, profile, or default chain)
- Model ID (e.g.,
anthropic.claude-3-5-sonnet-20241022-v2:0)
Customize keyboard shortcuts in the Shortcuts tab:
- Hold mode (default:
Alt+Space) - Toggle mode (default:
Alt+Shift+Space)
Once configured, Vox runs as a menu bar icon.
Press your shortcut to record. The floating indicator shows:
- Red β Recording
- Yellow β Transcribing
- Blue β Correcting (if LLM enabled)
Release (hold mode) or press again (toggle mode) to stop. Text is pasted automatically.
If correction fails, raw transcription is used. If transcription is empty (silence/noise), nothing is pasted.
Requires cmake.
git clone https://github.com/app-vox/vox.git
cd vox
make install # installs npm deps + builds whisper.cppmake dev # development with hot reload
npm test # run tests
npm run dist # build production appBuilt with Electron, React, TypeScript, and whisper.cpp.
Contributions welcome! To contribute:
- Fork and create a feature branch
- Make your changes
- Run
npm run typecheck && npm run lint && npm test - Commit with Conventional Commits (e.g.,
feat(audio): add noise gate) - Open a pull request
Yes, Vox is 100% free and open-source. Transcription runs locally using Whisper.cpp. If you use optional AI enhancement, you'll need your own API keys (OpenAI-compatible or AWS Bedrock), but there are no fees from Vox.
No. Transcription happens entirely on your device. Only if you enable AI enhancement does the text (not audio) get sent to your configured LLM provider for correction. Your audio recordings never leave your device.
- Local transcription: Whisper.cpp converts your speech to text on your device. Fast, accurate, 100% private.
- AI enhancement (optional): Sends the transcribed text to an LLM to remove filler words ("um", "uh"), fix grammar, or apply custom corrections based on your prompt.
- Small (~460MB): Best balance of speed and accuracy. Recommended for most users.
- Tiny/Base: Faster but less accurate. Good for quick notes.
- Medium/Large: Slower but more accurate. Good for technical/medical content or noisy environments.
You can switch models anytime in Settings.
Yes! Vox works with:
- OpenAI-compatible APIs: OpenAI, Anthropic (via Bedrock), OpenRouter, local LLMs with OpenAI-compatible endpoints
- AWS Bedrock: Claude, Llama, Mistral, and other Bedrock models
Yes. Local transcription works 100% offline. AI enhancement requires internet (since it calls your LLM provider API), but you can disable it and use raw transcription offline.
Vox needs Accessibility access to:
- Listen for your custom keyboard shortcuts globally
- Simulate paste (
Cmd+Von macOS,Ctrl+Von Windows) to insert transcribed text into your active app
Without this, Vox can't detect shortcuts or auto-paste text.
Absolutely! Vox is open-source. See CONTRIBUTING.md for guidelines. We welcome bug reports, feature requests, and pull requests.
Vox runs on macOS and Windows. Linux support is planned β follow the repo for updates!
This project is licensed under the Functional Source License, Version 1.1, ALv2 Future License.
You can use, modify, and redistribute the code for any purpose except building a competing commercial product or service. After two years, each release automatically converts to the Apache License 2.0.
See LICENSE for full details.
