Vox

Open-source voice-to-text app with local Whisper transcription and AI-powered correction.

Hold a keyboard shortcut, speak, and Vox transcribes your voice locally using whisper.cpp, optionally corrects it with AI, and pastes the text into your active app.

Platform Support

Vox runs on macOS (Apple Silicon and Intel) and Windows (10+). Linux support is planned for a future release.

Quick Start

Download the latest version from the releases page.

macOS: Drag Vox.app to your Applications folder.
Windows: Run the installer (.exe) and follow the setup wizard.

First Launch

When you first launch Vox, you'll need to:

Download a Whisper Model — Go to Settings > Local Model and download at least one speech recognition model. The "small" model (Recommended) is a good starting point.
Grant Permissions — Vox needs:
- Microphone: Required for voice recording
- Accessibility: Required for keyboard shortcuts and auto-paste
Configure Shortcuts (optional) — Customize keyboard shortcuts in Settings > Shortcuts
Enable AI Improvements (optional) — Configure LLM provider in Settings > AI Improvements

Vox will guide you through this setup process with visual indicators showing what's incomplete.

Once configured, hold Alt+Space to start recording.

Features

🔒 100% Local transcription — Powered by whisper.cpp, audio stays on your device
🤖 AI correction — Removes filler words and fixes grammar (optional)
⚙️ Custom prompts — Tailor corrections for medical, technical, creative, or any workflow
⌨️ Hold or toggle modes — Press-and-hold or toggle recording on/off
📋 Auto-paste — Text is pasted directly into your focused app
🎯 Multiple models — Choose speed vs accuracy (tiny to large)
☁️ Multiple LLM providers — OpenAI-compatible or AWS Bedrock
🎨 Menu bar app — Runs quietly in the background with dark/light mode support

Use Cases

👨‍⚕️ Medical Professionals

Preserve medical terminology and standard abbreviations. With a suitable custom prompt, the correction step keeps abbreviations intact instead of rewriting "OA" as "okay" or "PT" as "patient."

Example custom prompt:

"Preserve medical terminology, standard abbreviations (e.g., OA, PT, BP), and format as clinical notes."

👨‍💻 Developers & Engineers

Format technical dictation as concise documentation. Remove filler words while keeping technical terms intact.

Example custom prompt:

"Format as technical documentation. Be concise, remove filler words, preserve code terms and abbreviations."

✍️ Writers & Content Creators

Enhance prose while maintaining your unique voice. Turn spoken ideas into polished text ready for editing.

Example custom prompt:

"Enhance prose for readability while maintaining the author's voice. Fix grammar but keep the casual tone."

🌍 Language Learners

Practice speaking by having your speech translated and corrected in real time.

Example custom prompt:

"Translate to German and correct grammar. Output only the German translation."

📝 Note-Taking & Productivity

Capture thoughts quickly without typing. Perfect for meetings, brainstorming, or journaling.

How Vox Compares

Feature	Vox	Dragon NaturallySpeaking	macOS Dictation	Whisper Desktop Apps
Price	Free & Open Source	$300+	Free (limited)	Varies ($0-50)
Privacy	100% Local	Cloud-based	Cloud-based	Mostly local
Custom Prompts	✅ Full control	❌ Limited	❌ None	⚠️ Some apps
AI Enhancement	✅ Your own API	❌ None	⚠️ Basic	⚠️ Varies
Offline Mode	✅ Full	⚠️ Limited	❌ Requires internet	✅ Most
Native App	✅ Menu bar / tray	⚠️ Full app	✅ Built-in	✅ Varies
Custom Shortcuts	✅ Configurable	✅ Yes	⚠️ Limited	✅ Most
Open Source	✅ FSL-1.1-ALv2	❌ Proprietary	❌ Proprietary	⚠️ Some

Why Vox?

Privacy-first: Your audio never leaves your device
Flexibility: Use any OpenAI-compatible LLM or AWS Bedrock
Customization: Tailor AI corrections to your exact needs
Free & Open: No subscription, no cloud lock-in

Requirements

macOS (Apple Silicon or Intel) or Windows (10+)
LLM provider (optional) — for text correction:
- OpenAI-compatible endpoint with API key
- Or AWS Bedrock credentials with model access

Configuration

Whisper Models

Download at least one model from the Whisper tab:

Model	Size	Speed	Accuracy
tiny	~75 MB	Fastest	Lower
base	~140 MB	Fast	Decent
small	~460 MB	Good	Good
medium	~1.5 GB	Slow	Better
large	~3 GB	Slowest	Best
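As a rough illustration of the speed/size trade-off in the table above, here is a minimal sketch that picks the largest model fitting a disk budget. The sizes are the approximate figures from the table (GB rounded to 1000 MB), and `largestModelWithin` is a hypothetical helper, not part of Vox.

```typescript
// Approximate download sizes taken from the table above (illustrative only).
type ModelName = "tiny" | "base" | "small" | "medium" | "large";

const MODEL_SIZES_MB: ReadonlyArray<readonly [ModelName, number]> = [
  ["tiny", 75],
  ["base", 140],
  ["small", 460],
  ["medium", 1500],
  ["large", 3000],
];

// Hypothetical helper: returns the largest model whose size fits the budget,
// or null if even "tiny" does not fit.
function largestModelWithin(budgetMb: number): ModelName | null {
  let best: ModelName | null = null;
  for (const [name, sizeMb] of MODEL_SIZES_MB) {
    // The list is sorted by ascending size, so the last match is the largest.
    if (sizeMb <= budgetMb) best = name;
  }
  return best;
}
```

Larger models are also slower, so a size budget is only one input; most users can simply start with "small".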

LLM Provider

Foundry (OpenAI-compatible)

Endpoint URL
API key
Model name (e.g., gpt-4o)

AWS Bedrock

AWS region
Credentials (access key, profile, or default chain)
Model ID (e.g., anthropic.claude-3-5-sonnet-20241022-v2:0)
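The two provider options above can be pictured as a small tagged union. This is only a sketch: the type and field names (`ProviderConfig`, `endpointUrl`, `credentials.source`, and so on) are assumptions made for illustration and do not reflect Vox's actual settings schema.

```typescript
// Hypothetical shapes for the two provider options; names are illustrative.
type OpenAICompatibleConfig = {
  kind: "openai-compatible";
  endpointUrl: string;
  apiKey: string;
  model: string; // e.g. "gpt-4o"
};

type BedrockCredentials =
  | { source: "access-key"; accessKeyId: string; secretAccessKey: string }
  | { source: "profile"; profile: string }
  | { source: "default-chain" };

type BedrockConfig = {
  kind: "bedrock";
  region: string;
  credentials: BedrockCredentials;
  modelId: string;
};

type ProviderConfig = OpenAICompatibleConfig | BedrockConfig;

// Returns a list of problems; an empty list means every required field is filled in.
function validateProviderConfig(config: ProviderConfig): string[] {
  const problems: string[] = [];
  if (config.kind === "openai-compatible") {
    if (!/^https?:\/\//.test(config.endpointUrl)) {
      problems.push("Endpoint URL must start with http:// or https://");
    }
    if (config.apiKey.trim() === "") problems.push("API key is required");
    if (config.model.trim() === "") problems.push("Model name is required");
    return problems;
  }
  if (config.region.trim() === "") problems.push("AWS region is required");
  if (config.modelId.trim() === "") problems.push("Model ID is required");
  const creds = config.credentials;
  if (creds.source === "access-key") {
    if (creds.accessKeyId.trim() === "" || creds.secretAccessKey.trim() === "") {
      problems.push("Access key ID and secret access key are required");
    }
  } else if (creds.source === "profile" && creds.profile.trim() === "") {
    problems.push("Profile name is required");
  }
  return problems;
}
```

Checking the fields before saving lets the settings screen report what is missing without making a network call.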

Shortcuts

Customize keyboard shortcuts in the Shortcuts tab:

Hold mode (default: Alt+Space)
Toggle mode (default: Alt+Shift+Space)
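The difference between the two modes can be sketched as a tiny state transition. This is an illustration of the behavior described above, not Vox's implementation; `nextRecording` is a hypothetical name, and the sketch assumes the caller has already filtered out key auto-repeat events.

```typescript
type Mode = "hold" | "toggle";
type KeyEvent = "down" | "up";

// Given the current recording state, returns the state after a shortcut key event.
function nextRecording(recording: boolean, mode: Mode, event: KeyEvent): boolean {
  if (mode === "hold") {
    // Hold mode: recording follows the key. Pressing starts, releasing stops.
    return event === "down";
  }
  // Toggle mode: each press flips the state; releases are ignored.
  return event === "down" ? !recording : recording;
}
```

For example, in toggle mode a press followed by a release leaves recording on until the shortcut is pressed a second time.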

Usage

Once configured, Vox runs in the menu bar (macOS) or system tray (Windows).

Press your shortcut to record. The floating indicator shows:

Red — Recording
Yellow — Transcribing
Blue — Correcting (if LLM enabled)

To stop, release the shortcut (hold mode) or press it again (toggle mode). The text is then pasted automatically.

If correction fails, the raw transcription is pasted instead. If the transcription is empty (silence or noise), nothing is pasted.
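The fallback rules above can be written as one small pure function. This is a sketch under stated assumptions: `CorrectionResult` and `selectTextToPaste` are hypothetical names, and trimming whitespace before the emptiness check is an assumption rather than documented behavior.

```typescript
// Outcome of the optional correction step. `null` means correction is disabled.
type CorrectionResult =
  | { ok: true; text: string }
  | { ok: false; error: string }
  | null;

// Returns the text that should be pasted, or null when nothing should be pasted.
function selectTextToPaste(
  rawTranscription: string,
  correction: CorrectionResult
): string | null {
  const raw = rawTranscription.trim();
  // Empty transcription (silence or noise): paste nothing.
  if (raw === "") return null;
  // Correction enabled and successful: use the corrected text.
  if (correction !== null && correction.ok) return correction.text;
  // Correction disabled or failed: fall back to the raw transcription.
  return raw;
}
```

Keeping this decision separate from the recording and network code makes the fallback behavior easy to unit test.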

Development

Setup

Requires cmake.

git clone https://github.com/app-vox/vox.git
cd vox
make install   # installs npm deps + builds whisper.cpp

Run

make dev        # development with hot reload
npm test        # run tests
npm run dist    # build production app

Built with Electron, React, TypeScript, and whisper.cpp.

Contributing

Contributions welcome! To contribute:

Fork and create a feature branch
Make your changes
Run npm run typecheck && npm run lint && npm test
Commit with Conventional Commits (e.g., feat(audio): add noise gate)
Open a pull request
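To check a commit header locally before pushing, a small parser along these lines may help. It is a simplified sketch of the Conventional Commits header shape (`type(scope)!: description`), not the project's actual tooling; `parseCommitHeader` is a hypothetical helper and it does not restrict which types are allowed.

```typescript
type ParsedCommit = {
  type: string;
  scope: string | null;
  breaking: boolean;
  description: string;
};

// Parses the first line of a commit message; returns null if it does not match.
function parseCommitHeader(header: string): ParsedCommit | null {
  // type, optional (scope), optional "!", then ": " and a non-empty description.
  const match = /^([a-z]+)(?:\(([^()\s]+)\))?(!)?: (\S.*)$/.exec(header);
  if (match === null) return null;
  return {
    type: match[1] ?? "",
    scope: match[2] ?? null,
    breaking: match[3] === "!",
    description: match[4] ?? "",
  };
}
```

With this sketch, `feat(audio): add noise gate` parses with type `feat` and scope `audio`, while a header such as `Add noise gate` is rejected.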

⚠️ See more details in CONTRIBUTING.md.

FAQ

Is Vox really free?

Yes. Vox is free and open source, and transcription runs locally using whisper.cpp. The optional AI enhancement requires your own API keys (OpenAI-compatible or AWS Bedrock), and your provider may charge for usage, but Vox itself charges no fees.

Does my audio leave my device?

No. Transcription happens entirely on your device, and your audio recordings never leave it. If you enable AI enhancement, only the transcribed text (not audio) is sent to your configured LLM provider for correction.

What's the difference between local transcription and AI enhancement?

Local transcription: whisper.cpp converts your speech to text on your device. Fast, accurate, 100% private.
AI enhancement (optional): Sends the transcribed text to an LLM to remove filler words ("um", "uh"), fix grammar, or apply custom corrections based on your prompt.
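To make the split concrete, here is a sketch of what a correction request to an OpenAI-compatible endpoint could look like. The `model` and `messages` fields follow the widely used chat completions request format; `buildCorrectionRequest`, `DEFAULT_PROMPT`, and the choice of `temperature: 0` are assumptions for illustration and are not taken from Vox's source.

```typescript
type ChatMessage = { role: "system" | "user"; content: string };
type ChatRequest = { model: string; temperature: number; messages: ChatMessage[] };

// Hypothetical default instruction, used when no custom prompt is set.
const DEFAULT_PROMPT =
  "Remove filler words and fix grammar. Output only the corrected text.";

// Builds the JSON body for a chat completions call.
// Note that only the transcribed text is included; no audio is attached.
function buildCorrectionRequest(
  transcript: string,
  model: string,
  customPrompt?: string
): ChatRequest {
  const trimmed = customPrompt?.trim() ?? "";
  const prompt = trimmed !== "" ? trimmed : DEFAULT_PROMPT;
  return {
    model,
    temperature: 0, // assumption: keep corrections as deterministic as possible
    messages: [
      { role: "system", content: prompt },
      { role: "user", content: transcript },
    ],
  };
}
```

A custom prompt such as the medical or translation examples earlier in this README would simply replace the system message.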

Which Whisper model should I use?

Small (~460 MB): Best balance of speed and accuracy. Recommended for most users.
Tiny/Base: Faster but less accurate. Good for quick notes.
Medium/Large: Slower but more accurate. Good for technical/medical content or noisy environments.

You can switch models anytime in Settings.

Can I use Vox with Claude/ChatGPT/other LLMs?

Yes! Vox works with:

OpenAI-compatible APIs: OpenAI, OpenRouter, and local LLMs that expose an OpenAI-compatible endpoint
AWS Bedrock: Claude, Llama, Mistral, and other Bedrock models

Does Vox work offline?

Yes. Local transcription works 100% offline. AI enhancement requires internet (since it calls your LLM provider API), but you can disable it and use raw transcription offline.

Why does Vox need Accessibility permissions?

Vox needs Accessibility access to:

Listen for your custom keyboard shortcuts globally
Simulate paste (Cmd+V on macOS, Ctrl+V on Windows) to insert transcribed text into your active app

Without this, Vox can't detect shortcuts or auto-paste text.

Can I contribute to Vox?

Absolutely! Vox is open-source. See CONTRIBUTING.md for guidelines. We welcome bug reports, feature requests, and pull requests.

What about Linux?

Vox runs on macOS and Windows. Linux support is planned — follow the repo for updates!

License

This project is licensed under the Functional Source License, Version 1.1, ALv2 Future License.

You can use, modify, and redistribute the code for any purpose except building a competing commercial product or service. After two years, each release automatically converts to the Apache License 2.0.

See LICENSE for full details.
