VTS - Voice Typing Studio

The open-source macOS dictation replacement you've been waiting for! 🚀

🔊 Turn on your sound! This demo includes audio to showcase the real-time transcription experience.

mini_demo_01.mov

Transform your voice into text instantly with the power of OpenAI, Groq, and Deepgram APIs. Say goodbye to macOS dictation limitations and hello to lightning-fast, accurate transcription with your own custom hotkeys! ⚡️

📋 Table of Contents

Demo
Why Choose VTS?
Screenshots
Getting Started
- Installation
- API Key Setup
Usage Guide
- Basic Transcription
- Advanced Features
Privacy & Security
Troubleshooting
Development
Roadmap
Feedback
License
Acknowledgements

✨ Why Choose VTS?

🤖 AI-Powered Accuracy: Leverage OpenAI, Groq, and Deepgram models for superior transcription
🔑 Your Keys, Your Control: Bring your own API keys - no subscriptions, no limits
🔄 Drop-in Replacement: Works exactly like macOS dictation, but better!
⌨️ Your Shortcut, Your Rules: Fully customizable global hotkeys (default: ⌘⇧;)
🎯 Smart Device Management: Intelligent microphone priority with seamless fallback
💬 Context-Aware: Custom system prompt boosts accuracy for your specific needs
🔓 100% Open Source: Full transparency, community-driven, modify as you wish

📷 Screenshots

🎬 Longer Demo

Monosnap.screencast.2025-07-23.02-48-42.mp4

Onboarding:

https://youtu.be/NTQmVCvkZQQ

🚀 Getting Started

📦 Installation

Ready to use VTS? Head over to our Releases Page to download the latest version.

Download

Universal Binary (Intel + Apple Silicon): Download the DMG from the releases page

Installation Steps

1. Download the DMG file from the releases page
2. Open the DMG
3. Drag VTS to the Applications folder
4. Launch VTS from Applications

Verification

All releases are code-signed and notarized by Apple for security.

Requirements

macOS 14.0+ (Apple Silicon & Intel supported)
API key from OpenAI, Groq, or Deepgram (see setup below)

API Key Setup

After installing VTS, you'll need an API key from one of these providers: OpenAI, Groq, or Deepgram.

Only one API key is required, so choose the provider you prefer!

📖 Usage Guide

Basic Transcription

1. Choose Provider: Select OpenAI, Groq, or Deepgram from the dropdown
2. Select Model: Pick whisper-1, whisper-large-v3, or other available models
3. Enter API Key: Paste your API key in the secure field
4. Start Recording: Press the global hotkey (default: ⌘⇧;) and speak
5. View Results: See real-time transcription inserted into the application you're using
6. (Optional) Copy: Use buttons to copy the transcript

Advanced Features

Microphone Priority Management

View Available Devices: See all connected microphones with system default indicators
Set Priority Order: Add devices to priority list with + buttons
Automatic Fallback: App automatically uses highest-priority available device
Real-time Switching: Seamlessly switches when preferred devices connect/disconnect
Remove from Priority: Use − buttons to remove devices from priority list
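
The priority and fallback behavior described above can be sketched in a few lines. This is only an illustration: `InputDevice` and `selectDevice` are hypothetical names, and falling back to the system default when no prioritized device is connected is an assumption, not a confirmed detail of VTS's DeviceManager.

```swift
// Hypothetical model of an input device; VTS's real types may differ.
struct InputDevice: Equatable {
    let id: String
    let name: String
    let isSystemDefault: Bool
}

/// Returns the first device from the priority list that is currently
/// connected. If none is connected, falls back to the system default,
/// then to any available device (assumed fallback order).
func selectDevice(priority: [String], available: [InputDevice]) -> InputDevice? {
    for id in priority {
        if let match = available.first(where: { $0.id == id }) {
            return match
        }
    }
    return available.first(where: { $0.isSystemDefault }) ?? available.first
}

let builtIn = InputDevice(id: "builtin", name: "Built-in Microphone", isSystemDefault: true)
let usb = InputDevice(id: "usb-1", name: "USB Microphone", isSystemDefault: false)
let priority = ["usb-1", "builtin"]

// With the USB microphone connected, it wins because it is listed first.
let whenUSBConnected = selectDevice(priority: priority, available: [builtIn, usb])
// After it is unplugged, selection moves to the next available entry.
let whenUSBUnplugged = selectDevice(priority: priority, available: [builtIn])
```

Re-running the selection whenever the set of connected devices changes would give the "real-time switching" effect without any extra state.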

Custom System Prompts

Add context-specific prompts to improve transcription accuracy
Examples: "Medical terminology", "Technical jargon", "Names: John, Sarah, Mike"
Prompts help the AI better understand domain-specific language
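
As a rough sketch of how such a prompt could reach a provider, the example below builds (but does not send) a request for OpenAI's audio transcription endpoint, which accepts an optional `prompt` form field. The function name, the `audio/wav` content type, and the placeholder key are assumptions for illustration; how VTS actually forwards the prompt is not shown in this README.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Builds a multipart/form-data transcription request without sending it.
/// `makeTranscriptionRequest` is a hypothetical helper, not part of VTS.
func makeTranscriptionRequest(audio: Data, fileName: String, model: String,
                              prompt: String?, apiKey: String) -> URLRequest {
    let boundary = "Boundary-\(UUID().uuidString)"
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/audio/transcriptions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("multipart/form-data; boundary=\(boundary)",
                     forHTTPHeaderField: "Content-Type")

    var body = Data()
    // Appends one plain-text form field to the multipart body.
    func appendField(_ name: String, _ value: String) {
        body.append(Data("--\(boundary)\r\n".utf8))
        body.append(Data("Content-Disposition: form-data; name=\"\(name)\"\r\n\r\n".utf8))
        body.append(Data("\(value)\r\n".utf8))
    }

    appendField("model", model)
    // The prompt is optional; it is only included when non-empty.
    if let prompt = prompt, !prompt.isEmpty {
        appendField("prompt", prompt)
    }

    // The audio file part (content type is an assumption).
    body.append(Data("--\(boundary)\r\n".utf8))
    body.append(Data("Content-Disposition: form-data; name=\"file\"; filename=\"\(fileName)\"\r\n".utf8))
    body.append(Data("Content-Type: audio/wav\r\n\r\n".utf8))
    body.append(audio)
    body.append(Data("\r\n--\(boundary)--\r\n".utf8))

    request.httpBody = body
    return request
}

// Usage with dummy audio bytes and a placeholder key.
let request = makeTranscriptionRequest(audio: Data([0, 1, 2]),
                                       fileName: "clip.wav",
                                       model: "whisper-1",
                                       prompt: "Names: John, Sarah, Mike",
                                       apiKey: "sk-placeholder")
```

Keeping request construction separate from sending makes the prompt handling easy to inspect without network access.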

🔒 Privacy & Security

No audio storage: Audio is processed in real-time, never stored locally
API keys are safe: Keys are stored in Keychain
TLS encryption: All API communication uses HTTPS
Microphone permission: Explicit user consent required for audio access
Basic telemetry: We collect minimal usage analytics in compliance with GDPR regulations

🛠️ Troubleshooting

Common Issues

Microphone Permission Denied: Check System Settings > Privacy & Security > Microphone
No Microphones Found: Click "Refresh" in the Microphone Priority section
Wrong Microphone Active: Set your preferred priority order or check device connections
App Not Responding to Hotkey: Ensure accessibility permissions are granted when prompted

👩‍💻 Development

This section is for developers who want to build VTS from source or contribute to the project.

Development Requirements

macOS 14.0+ (Apple Silicon & Intel supported)
Xcode 15+ for building
API key from OpenAI, Groq, or Deepgram for testing

Building from Source

Clone the repository:

git clone https://github.com/j05u3/VTS.git
cd VTS

Open in Xcode:

open VTSApp.xcodeproj

Build and run:
- In Xcode, select the VTSApp scheme
- Build and run with ⌘R
- Grant microphone permission when prompted

Command Line Building

# Build via command line
xcodebuild -project VTSApp.xcodeproj -scheme VTSApp build

Architecture

VTS follows a clean, modular architecture:

CaptureEngine: Handles audio capture using AVAudioEngine with Core Audio device management
DeviceManager: Manages microphone priority lists and automatic device selection
TranscriptionService: Orchestrates streaming transcription with provider abstraction
STTProvider Protocol: Clean interface allowing easy addition of new providers
Modern SwiftUI: Reactive UI with proper state management and real-time updates
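
To illustrate what a provider abstraction of this kind can look like, here is a minimal sketch. The protocol requirements, `EchoProvider`, and the `run` method are all hypothetical; the real STTProvider protocol and TranscriptionService in the repository may differ, and the sketch is synchronous rather than streaming to stay short.

```swift
import Foundation

// Hypothetical shape of a provider interface; not VTS's actual API.
protocol STTProvider {
    var name: String { get }
    func transcribe(audio: Data, model: String, prompt: String?) throws -> String
}

// A stand-in provider that reports what it received instead of calling an API.
struct EchoProvider: STTProvider {
    let name = "Echo"
    func transcribe(audio: Data, model: String, prompt: String?) throws -> String {
        "[\(model)] \(audio.count) bytes"
    }
}

// The service depends only on the protocol, so providers can be swapped freely.
struct TranscriptionService {
    var provider: any STTProvider

    func run(audio: Data, model: String, prompt: String? = nil) throws -> String {
        try provider.transcribe(audio: audio, model: model, prompt: prompt)
    }
}

let service = TranscriptionService(provider: EchoProvider())
let text = try service.run(audio: Data(count: 4), model: "whisper-1")
```

Under this design, adding a new provider would mean writing one more type that conforms to the protocol, with no changes to the service itself.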

Testing

Currently, VTS includes manual testing capabilities through the built-in Text Injection Test Suite accessible from the app's interface. This allows you to test text insertion functionality across different applications.

Automated unit tests are planned for future releases.

Development Troubleshooting

Accessibility Permissions (Development)

Permission Not Updating: During development/testing, when the app changes (rebuild, code changes), macOS treats it as a "new" app
Solution: Remove the old app entry from System Settings > Privacy & Security > Accessibility, then re-grant permission
Why This Happens: Each build gets a different signature, so macOS sees it as a different application
Quick Fix: Check the app list in Accessibility settings and remove any old/duplicate VTS entries

Testing Onboarding Flow

Reset App State: To test the complete onboarding flow, change the PRODUCT_BUNDLE_IDENTIFIER in Xcode project settings
Why This Works: Changing the bundle identifier creates a "new" app from macOS's perspective, resetting all permissions and app state
Most Reliable Method: This is more reliable than clearing UserDefaults and ensures a clean onboarding test including all system permissions

Contributing

See CONTRIBUTING.md for details on how to contribute to VTS development.

🗺️ Roadmap

Auto-open at login: Checkbox in the preferences window to launch VTS at login (✅ Implemented)
Modern Release Automation: Automated releases with release-please and GitHub Actions (✅ Implemented)
Sparkle Auto-Updates: Automatic app updates with GitHub Releases appcast hosting (✅ Implemented)
Support real-time API: OpenAI Real-time Transcription API (✅ Implemented)

Planned for a future (or possibly pro) version. Priorities are still to be decided, and your feedback and contributions are welcome!

More models/providers: Support for more STT providers like Google, Azure, etc.
Safe auto-cut: Automatically stop recording after a maximum duration if the user forgets to stop (or starts by accident). Voice activity detection (VAD) from the real-time APIs might be an alternative.
LLM step: Use an LLM to post-process the transcription and improve accuracy, possibly applying transformations targeted to the app receiving the text or to the surrounding context. This could also make it easy to insert emojis.
Advanced Audio Processing: Noise reduction and gain control, though some STT providers already handle this, so it may not be needed.
Comprehensive Test Suite: Automated unit tests covering:
- Core transcription functionality
- Provider validation and error handling
- Device management and priority logic
- Integration flows and edge cases
Accessibility Features

💬 Feedback

Have feedback, suggestions, or issues? We'd love to hear from you!

📧 Send us your feedback - Quick and direct way to reach us

You can also:

🐛 Report bugs or request features on GitHub
💡 Share your ideas for improvements
⭐ Star the project if you find it useful!

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgements

VTS wouldn't be possible without the incredible work of the open-source community. Special thanks to:

Tools & Scripts

ios-icon-generator by @smallmuou - for the awesome icon generation script that made creating our app icons effortless
create-dmg by @sindresorhus - for the excellent DMG creation script that streamlines our distribution process
Sparkle by the Sparkle Project - for providing the robust auto-update framework that keeps VTS current and secure

Note: This project builds upon the work of many developers and projects. If I've missed crediting someone or something, I sincerely apologize! Please feel free to open an issue or PR to help me give proper recognition where it's due.

Made with ❤️ for the macOS community
