Real-time speech-to-text transcription at your fingertips
- Press a hotkey to start voice recording
- Release to instantly get text transcription
- Supports multiple transcription providers:
  - OpenAI Whisper API
  - Google Gemini
  - Local Apple Silicon MLX (Mac only)
  - Self-hosted Whisper servers
UtterType requires PortAudio to capture audio from your microphone.
macOS

Installing PortAudio on macOS is easiest with Homebrew:

```shell
brew install portaudio
```

Then install PyAudio:

```shell
pip install pyaudio
```
Windows

On Windows, PyAudio typically installs without additional dependencies:

```shell
python -m pip install pyaudio
```
Linux

On Linux, install the system package:

```shell
sudo apt-get install python3-pyaudio
```
UtterType uses a keyboard hotkey that you hold down to record audio and release to transcribe.
- macOS: Uses the globe key (🌐) by default (bottom left of keyboard)
- Windows/Linux: Configure in your `.env` file:

  ```
  UTTERTYPE_RECORD_HOTKEYS="<ctrl>+<alt>+v"
  ```
UtterType uses the pynput library for hotkey functionality. See their documentation for more key combinations.
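To illustrate the hotkey string format shown above, here is a minimal sketch of how such a combination string can be split into its individual key tokens. The `parse_hotkey` helper is hypothetical, purely for illustration; it is not part of UtterType or pynput:

```python
def parse_hotkey(combo: str) -> list[str]:
    """Split a pynput-style hotkey string such as "<ctrl>+<alt>+v"
    into its individual key tokens, stripping the angle brackets
    used for named (non-character) keys."""
    tokens = []
    for part in combo.split("+"):
        part = part.strip()
        # Named keys like <ctrl> are wrapped in angle brackets;
        # plain character keys like "v" are not.
        if part.startswith("<") and part.endswith(">"):
            tokens.append(part[1:-1])
        else:
            tokens.append(part)
    return tokens

print(parse_hotkey("<ctrl>+<alt>+v"))  # ['ctrl', 'alt', 'v']
```

In practice pynput parses these strings for you; the sketch only shows what the `+`-separated, angle-bracketed format encodes.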
Option A: Using uv (Recommended)
1. Install uv if you haven't already (follow the uv installation documentation).

2. Create a virtual environment and install dependencies:

   ```shell
   uv sync
   ```

   This will:
   - Create a virtual environment in `.venv`
   - Install all dependencies from pyproject.toml
   - Install uttertype in development mode

3. Activate the virtual environment:

   ```shell
   # On Linux/macOS
   source .venv/bin/activate

   # On Windows
   .venv\Scripts\activate
   ```
Option B: Using pip
Install in development mode with pip:

```shell
pip install -e .
```
Troubleshooting: Linux GLIBCXX errors
If you see an error like this on Linux:

```
ImportError: /home/soul/anaconda3/lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found
```

This is typically caused by Conda environments, whose bundled `libstdc++` can lag behind the system version.
UtterType supports multiple speech recognition providers:
| Provider | Description | API Key Required | Platforms |
|---|---|---|---|
| OpenAI Whisper | Cloud-based transcription | Yes | All |
| Google Gemini | Cloud-based transcription | Yes | All |
| Apple MLX | Local transcription on Mac | No (HF token needed) | macOS (M1/M2/M3) |
| Local Whisper Server | Self-hosted server | No | All |
Step 1: Create a `.env` file (Recommended)
1. Copy the sample environment file:

   ```shell
   cp .sample_env .env
   ```

2. Edit the `.env` file to uncomment and configure your chosen provider's section. See `.sample_env` for the complete configuration reference.
Step 2: Choose your provider
Set `UTTERTYPE_PROVIDER` in your `.env` file to one of:

```
# Options: "openai" (default), "mlx" (Mac only), or "google"
UTTERTYPE_PROVIDER="openai"
```
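Conceptually, provider selection amounts to reading this variable and validating it against the supported set. The sketch below is illustrative only (the `select_provider` function is hypothetical, not UtterType's actual code); it assumes "openai" as the default when the variable is unset, matching the comment above:

```python
import os

# The providers listed in the table above.
SUPPORTED_PROVIDERS = {"openai", "mlx", "google"}

def select_provider() -> str:
    """Read UTTERTYPE_PROVIDER, defaulting to "openai" when unset,
    and reject values outside the supported set."""
    provider = os.environ.get("UTTERTYPE_PROVIDER", "openai").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported provider: {provider!r}")
    return provider

os.environ["UTTERTYPE_PROVIDER"] = "mlx"
print(select_provider())  # mlx
```

Failing fast on an unknown value is friendlier than silently falling back, since a typo like "opena" would otherwise go unnoticed until the first transcription attempt.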
Alternative: Using Environment Variables
Instead of a `.env` file, you can set variables directly in your terminal:

OpenAI Whisper (Linux/macOS):

```shell
export UTTERTYPE_PROVIDER="openai"
export OPENAI_API_KEY="sk-your-key-here"
```

Apple Silicon MLX (Mac only):

```shell
export UTTERTYPE_PROVIDER="mlx"
export HF_TOKEN="your-huggingface-token"
# Also run: uv sync --extra mlx
```

Google Gemini (Linux/macOS):

```shell
export UTTERTYPE_PROVIDER="google"
export GEMINI_API_KEY="your-api-key-here"
```

For Windows, use `$env:` instead of `export`.
Minimum Recording Duration
You can set a minimum duration threshold for recordings to prevent accidental transcriptions from quick hotkey presses:
```
# Set minimum recording duration to 300ms (default)
UTTERTYPE_MIN_RECORDING_MS=300
```
This is useful if you want to use the hotkey button for other purposes when pressed quickly (i.e., not held down). Any recording shorter than this duration will be ignored.
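The threshold check boils down to comparing the press and release timestamps. A minimal sketch (the `should_transcribe` function name is illustrative, not UtterType's actual code):

```python
def should_transcribe(press_ms: float, release_ms: float,
                      min_recording_ms: int = 300) -> bool:
    """Return True only when the hotkey was held for at least
    min_recording_ms milliseconds, mirroring UTTERTYPE_MIN_RECORDING_MS."""
    return (release_ms - press_ms) >= min_recording_ms

print(should_transcribe(0, 150))  # False: released too quickly, ignored
print(should_transcribe(0, 450))  # True: held long enough, transcribed
```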
Google Vertex AI
For enterprise Google Vertex AI integration:
1. Install the Google Cloud CLI
2. Authenticate:

   ```shell
   gcloud auth application-default login
   ```

3. Configure in `.env`:

   ```
   UTTERTYPE_PROVIDER="google"
   GEMINI_USE_VERTEX="true"
   GEMINI_PROJECT_ID="your-gcp-project-id"
   GEMINI_LOCATION="us-central1"  # optional
   ```

See Vertex AI docs for more details.
Local Whisper Server
For faster and cheaper transcription, set up a local faster-whisper-server:
1. Configure in `.env`:

   ```
   UTTERTYPE_PROVIDER="openai"
   OPENAI_BASE_URL="http://localhost:7000/v1"
   ```

2. Available local models include:
   - `Systran/faster-whisper-small` (fastest)
   - `Systran/faster-distil-whisper-large-v3` (most accurate)
   - `deepdml/faster-whisper-large-v3-turbo-ct2` (almost as accurate, but faster)
Apple Silicon MLX Models
For the fastest local transcription on Apple Silicon Macs (M1/M2/M3):
1. Install the MLX dependencies:

   ```shell
   uv sync --extra mlx
   ```

2. Configure in `.env`:

   ```
   UTTERTYPE_PROVIDER="mlx"
   MLX_MODEL_NAME="distil-medium.en"
   HF_TOKEN="your-huggingface-token"  # Get from huggingface.co/join
   ```
This uses lightning-whisper-mlx to run Whisper models natively on the Apple Neural Engine.
Available models with speed/accuracy tradeoffs:
| Model | Size | Language | Speed | Accuracy |
|---|---|---|---|---|
| `base.en` | Small | English only | ★★★★★ | ★★ |
| `small.en` | Medium | English only | ★★★★ | ★★★ |
| `medium.en` | Large | English only | ★★★ | ★★★★ |
| `distil-small.en` | Medium | English only | ★★★★ | ★★★ |
| `distil-medium.en` | Large | English only | ★★★ | ★★★★ |
| `large-v2` | Extra large | Multilingual | ★★ | ★★★★★ |
macOS Context Screenshot
UtterType can capture screenshots of the active window on macOS for context-aware transcription:
Install the macOS-specific dependencies:

```shell
uv sync --extra macos
```
The screenshot functionality can be useful for providing visual context in context-aware transcription scenarios.
Start the application
Choose one of these methods to run UtterType:
```shell
# Standard (recommended)
python -m uttertype.main

# Simple wrapper
python main.py

# If installed in environment
uttertype

# Background with tmux (auto-setup)
./start_uttertype.sh
```
Required Permissions
When first launching, you'll need to grant these permissions:
macOS:
- System Settings → Privacy & Security → Accessibility
- System Settings → Privacy & Security → Input Monitoring
- Microphone access
Windows/Linux:
- Microphone access permissions
- Press and hold your configured hotkey (globe key on macOS, `<ctrl>+<alt>+v` on other platforms by default)
- Speak clearly while holding the key
- Release the key when finished
- Your transcribed text will be inserted at the cursor position
Note: The transcription will be automatically canceled in two cases:
- If the hotkey is pressed and released quickly (less than 300ms by default)
- If another key is pressed while the hotkey is being held down (useful for key combinations like Function+Delete)
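Both cancellation rules can be modeled as a tiny recorder state machine. The sketch below is illustrative only (the `RecordingSession` class and its method names are hypothetical, not UtterType's implementation); timestamps are passed in explicitly to keep the logic deterministic:

```python
class RecordingSession:
    """Tracks one hold of the hotkey and decides whether the
    recording should be transcribed or silently canceled."""

    def __init__(self, min_recording_s: float = 0.3):
        self.min_recording_s = min_recording_s  # mirrors the 300ms default
        self.started_at = 0.0
        self.canceled = False

    def on_hotkey_press(self, now: float) -> None:
        self.started_at = now
        self.canceled = False

    def on_other_key(self) -> None:
        # Case 2: any other key during the hold cancels transcription,
        # so combinations like Function+Delete still work normally.
        self.canceled = True

    def on_hotkey_release(self, now: float) -> bool:
        # Case 1: holds shorter than the threshold count as accidental.
        held_long_enough = (now - self.started_at) >= self.min_recording_s
        return held_long_enough and not self.canceled
```

Keeping the cancellation decision in one place like this makes the quick-press and other-key rules easy to test independently of any audio or keyboard hardware.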
Common Issues
- Hotkey not working: Check permissions in System Settings (macOS) or verify hotkey configuration
- No microphone input: Check microphone permissions and default device settings
- API key errors: Verify your API keys are correctly set in the `.env` file
- Missing models: For MLX, ensure you've run `uv sync --extra mlx` and provided a HuggingFace token
UtterType is available under the MIT License. See the LICENSE file for details.