Real-time speech-to-text transcription at your fingertips
- Press a hotkey to start voice recording
- Release to instantly get text transcription
- Supports multiple transcription providers:
  - OpenAI Whisper API
  - Google Gemini
  - Local Apple Silicon MLX (Mac only)
  - Self-hosted Whisper servers
UtterType requires PortAudio to capture audio from your microphone.
macOS

Installing PortAudio on macOS is easiest with Homebrew:

```shell
brew install portaudio
```

Then install PyAudio:

```shell
pip install pyaudio
```
Windows

On Windows, PyAudio typically installs without additional dependencies:

```shell
python -m pip install pyaudio
```
Linux

On Linux, install the system package:

```shell
sudo apt-get install python3-pyaudio
```
UtterType uses a keyboard hotkey that you hold down to record audio and release to transcribe.
- macOS: Uses the globe key (🌐) by default (bottom left of keyboard)
- Windows/Linux: Configure in your `.env` file:

  ```
  UTTERTYPE_RECORD_HOTKEYS="<ctrl>+<alt>+v"
  ```
UtterType uses the pynput library for hotkey functionality. See their documentation for more key combinations.
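To illustrate the hotkey string format shown above, here is a minimal sketch of how such a combination string can be split into its individual key tokens. The `parse_hotkey` helper is hypothetical, purely for illustration; it is not part of UtterType or pynput:

```python
def parse_hotkey(combo: str) -> list[str]:
    """Split a pynput-style hotkey string such as "<ctrl>+<alt>+v"
    into its individual key tokens, stripping the angle brackets
    used for named (non-character) keys."""
    tokens = []
    for part in combo.split("+"):
        part = part.strip()
        # Named keys like <ctrl> are wrapped in angle brackets;
        # plain character keys like "v" are not.
        if part.startswith("<") and part.endswith(">"):
            tokens.append(part[1:-1])
        else:
            tokens.append(part)
    return tokens

print(parse_hotkey("<ctrl>+<alt>+v"))  # ['ctrl', 'alt', 'v']
```

In practice pynput parses these strings for you; the sketch only shows what the `+`-separated, angle-bracketed format encodes.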
Option A: Using uv (Recommended)
1. Install uv if you haven't already (follow the uv installation documentation).

2. Create a virtual environment and install dependencies:

   ```shell
   uv sync
   ```

   This will:
   - Create a virtual environment in `.venv`
   - Install all dependencies from pyproject.toml
   - Install uttertype in development mode

3. Activate the virtual environment:

   ```shell
   # On Linux/macOS
   source .venv/bin/activate

   # On Windows
   .venv\Scripts\activate
   ```
Option B: Using pip
Install in development mode with pip:

```shell
pip install -e .
```
Troubleshooting: Linux GLIBCXX errors
If you see an error like this on Linux:

```
ImportError: /home/soul/anaconda3/lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found
```

This is typically caused by Conda environments, whose bundled `libstdc++` can lag behind the system version.
UtterType supports multiple speech recognition providers:
| Provider | Description | API Key Required | Platforms |
|---|---|---|---|
| OpenAI Whisper | Cloud-based transcription | Yes | All |
| Google Gemini | Cloud-based transcription | Yes | All |
| Apple MLX | Local transcription on Mac | No (HF token needed) | macOS (M1/M2/M3) |
| Local Whisper Server | Self-hosted server | No | All |
Step 1: Create a `.env` file (Recommended)
1. Copy the sample environment file:

   ```shell
   cp .sample_env .env
   ```

2. Edit the `.env` file to uncomment and configure your chosen provider's section. See `.sample_env` for the complete configuration reference.
Step 2: Choose your provider
Set `UTTERTYPE_PROVIDER` in your `.env` file to one of:

```
# Options: "openai" (default), "mlx" (Mac only), or "google"
UTTERTYPE_PROVIDER="openai"
```
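Conceptually, provider selection amounts to reading this variable and validating it against the supported set. The sketch below is illustrative only (the `select_provider` function is hypothetical, not UtterType's actual code); it assumes "openai" as the default when the variable is unset, matching the comment above:

```python
import os

# The providers listed in the table above.
SUPPORTED_PROVIDERS = {"openai", "mlx", "google"}

def select_provider() -> str:
    """Read UTTERTYPE_PROVIDER, defaulting to "openai" when unset,
    and reject values outside the supported set."""
    provider = os.environ.get("UTTERTYPE_PROVIDER", "openai").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported provider: {provider!r}")
    return provider

os.environ["UTTERTYPE_PROVIDER"] = "mlx"
print(select_provider())  # mlx
```

Failing fast on an unknown value is friendlier than silently falling back, since a typo like "opena" would otherwise go unnoticed until the first transcription attempt.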
Alternative: Using Environment Variables
Instead of a `.env` file, you can set variables directly in your terminal:

OpenAI Whisper (Linux/macOS):

```shell
export UTTERTYPE_PROVIDER="openai"
export OPENAI_API_KEY="sk-your-key-here"
```

Apple Silicon MLX (Mac only):

```shell
export UTTERTYPE_PROVIDER="mlx"
export HF_TOKEN="your-huggingface-token"
# Also run: uv sync --extra mlx
```

Google Gemini (Linux/macOS):

```shell
export UTTERTYPE_PROVIDER="google"
export GEMINI_API_KEY="your-api-key-here"
```

For Windows, use `$env:` instead of `export`.
Minimum Recording Duration
You can set a minimum duration threshold for recordings to prevent accidental transcriptions from quick hotkey presses:
```
# Set minimum recording duration to 300ms (default)
UTTERTYPE_MIN_RECORDING_MS=300
```
This is useful if you want to use the hotkey button for other purposes when pressed quickly (i.e., not held down). Any recording shorter than this duration will be ignored.
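The threshold check boils down to comparing the press and release timestamps. A minimal sketch (the `should_transcribe` function name is illustrative, not UtterType's actual code):

```python
def should_transcribe(press_ms: float, release_ms: float,
                      min_recording_ms: int = 300) -> bool:
    """Return True only when the hotkey was held for at least
    min_recording_ms milliseconds, mirroring UTTERTYPE_MIN_RECORDING_MS."""
    return (release_ms - press_ms) >= min_recording_ms

print(should_transcribe(0, 150))  # False: released too quickly, ignored
print(should_transcribe(0, 450))  # True: held long enough, transcribed
```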
Google Vertex AI
For enterprise Google Vertex AI integration:
1. Install the Google Cloud CLI
2. Authenticate:

   ```shell
   gcloud auth application-default login
   ```

3. Configure in `.env`:

   ```
   UTTERTYPE_PROVIDER="google"
   GEMINI_USE_VERTEX="true"
   GEMINI_PROJECT_ID="your-gcp-project-id"
   GEMINI_LOCATION="us-central1"  # optional
   ```

See Vertex AI docs for more details.
Local Whisper Server
For faster and cheaper transcription, set up a local faster-whisper-server:
1. Configure in `.env`:

   ```
   UTTERTYPE_PROVIDER="openai"
   OPENAI_BASE_URL="http://localhost:7000/v1"
   ```

2. Available local models include:
   - `Systran/faster-whisper-small` (fastest)
   - `Systran/faster-distil-whisper-large-v3` (most accurate)
   - `deepdml/faster-whisper-large-v3-turbo-ct2` (almost as accurate, but faster)
Apple Silicon MLX Models
For the fastest local transcription on Apple Silicon Macs (M1/M2/M3):
1. Install the MLX dependencies:

   ```shell
   uv sync --extra mlx
   ```

2. Configure in `.env`:

   ```
   UTTERTYPE_PROVIDER="mlx"
   MLX_MODEL_NAME="distil-medium.en"
   HF_TOKEN="your-huggingface-token"  # Get from huggingface.co/join
   ```
This uses lightning-whisper-mlx to run Whisper models natively on the Apple Neural Engine.
Available models with speed/accuracy tradeoffs:
| Model | Size | Language | Speed | Accuracy |
|---|---|---|---|---|
| `base.en` | Small | English only | ★★★★★ | ★★ |
| `small.en` | Medium | English only | ★★★★ | ★★★ |
| `medium.en` | Large | English only | ★★★ | ★★★★ |
| `distil-small.en` | Medium | English only | ★★★★ | ★★★ |
| `distil-medium.en` | Large | English only | ★★★ | ★★★★ |
| `large-v2` | Extra large | Multilingual | ★★ | ★★★★★ |
macOS Context Screenshot
UtterType can capture screenshots of the active window on macOS for context-aware transcription:
Install the macOS-specific dependencies:

```shell
uv sync --extra macos
```
The screenshot functionality can be useful for providing visual context in context-aware transcription scenarios.
Start the application
Choose one of these methods to run UtterType:
```shell
# Standard (recommended)
python -m uttertype.main

# Simple wrapper
python main.py

# If installed in environment
uttertype

# Background with tmux (auto-setup)
./start_uttertype.sh
```
Required Permissions
When first launching, you'll need to grant these permissions:
macOS:
- System Settings → Privacy & Security → Accessibility
- System Settings → Privacy & Security → Input Monitoring
- Microphone access
Windows/Linux:
- Microphone access permissions
- Press and hold your configured hotkey (globe key on macOS, `<ctrl>+<alt>+v` on other platforms by default)
- Speak clearly while holding the key
- Release the key when finished
- Your transcribed text will be inserted at the cursor position
Note: The transcription will be automatically canceled in two cases:
- If the hotkey is pressed and released quickly (less than 300ms by default)
- If another key is pressed while the hotkey is being held down (useful for key combinations like Function+Delete)
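Both cancellation rules can be modeled as a tiny recorder state machine. The sketch below is illustrative only (the `RecordingSession` class and its method names are hypothetical, not UtterType's implementation); timestamps are passed in explicitly to keep the logic deterministic:

```python
class RecordingSession:
    """Tracks one hold of the hotkey and decides whether the
    recording should be transcribed or silently canceled."""

    def __init__(self, min_recording_s: float = 0.3):
        self.min_recording_s = min_recording_s  # mirrors the 300ms default
        self.started_at = 0.0
        self.canceled = False

    def on_hotkey_press(self, now: float) -> None:
        self.started_at = now
        self.canceled = False

    def on_other_key(self) -> None:
        # Case 2: any other key during the hold cancels transcription,
        # so combinations like Function+Delete still work normally.
        self.canceled = True

    def on_hotkey_release(self, now: float) -> bool:
        # Case 1: holds shorter than the threshold count as accidental.
        held_long_enough = (now - self.started_at) >= self.min_recording_s
        return held_long_enough and not self.canceled
```

Keeping the cancellation decision in one place like this makes the quick-press and other-key rules easy to test independently of any audio or keyboard hardware.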
Common Issues
- Hotkey not working: Check permissions in System Settings (macOS) or verify hotkey configuration
- No microphone input: Check microphone permissions and default device settings
- API key errors: Verify your API keys are correctly set in the `.env` file
- Missing models: For MLX, ensure you've run `uv sync --extra mlx` and provided a HuggingFace token
UtterType is available under the MIT License. See the LICENSE file for details.