UtterType

Real-time speech-to-text transcription at your fingertips

[Demo video and screenshot of the UtterType terminal interface]

Features

  • Press a hotkey to start voice recording
  • Release to instantly get text transcription
  • Supports multiple transcription providers:
    • OpenAI Whisper API
    • Google Gemini
    • Local Apple Silicon MLX (Mac only)
    • Self-hosted Whisper servers

Setup

1. Install PortAudio/PyAudio

UtterType requires PortAudio to capture audio from your microphone.

macOS

Installing PortAudio on macOS is easiest with Homebrew:

brew install portaudio

Then install PyAudio:

pip install pyaudio

Windows

On Windows, PyAudio typically installs without additional dependencies:

python -m pip install pyaudio

Linux

On Debian/Ubuntu, install the system package:

sudo apt-get install python3-pyaudio

2. Configure Hotkey

UtterType uses a keyboard hotkey that you hold down to record audio and release to transcribe.

  • macOS: Uses the globe key (🌐) by default (bottom left of keyboard)
  • Windows/Linux: Configure in your .env file:
UTTERTYPE_RECORD_HOTKEYS="<ctrl>+<alt>+v"

UtterType uses the pynput library for hotkey functionality. See their documentation for more key combinations.
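
The combo strings pynput accepts are plus-separated, with named modifier keys in angle brackets. As a rough illustration of how such a string decomposes into individual keys (this helper is hypothetical, not part of UtterType or pynput):

```python
def split_hotkey(combo: str) -> list[str]:
    """Split a pynput-style hotkey string into its key tokens.

    Named keys like <ctrl> keep their angle brackets; plain characters
    like 'v' appear bare.
    """
    return [token.strip() for token in combo.split("+")]

print(split_hotkey("<ctrl>+<alt>+v"))  # ['<ctrl>', '<alt>', 'v']
```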

3. Install Dependencies

Option A: Using uv (Recommended)
  1. Install uv if you haven't already (see the uv documentation for installation instructions).

  2. Create a virtual environment and install dependencies:

    uv sync

    This will:

    • Create a virtual environment in .venv
    • Install all dependencies from pyproject.toml
    • Install uttertype in development mode
  3. Activate the virtual environment:

    # On Linux/macOS
    source .venv/bin/activate  
    
    # On Windows
    .venv\Scripts\activate
Option B: Using pip

Install in development mode with pip:

pip install -e .

Troubleshooting: Linux GLIBCXX errors

If you see an error like this on Linux:

ImportError: /home/soul/anaconda3/lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found

This is typically caused by Conda environments bundling an older libstdc++ than the one the native extension was built against; updating libstdc++ inside the Conda environment, or running UtterType outside Conda, usually resolves it.

4. Configure Speech Recognition

UtterType supports multiple speech recognition providers:

| Provider             | Description                | API Key Required      | Platforms        |
|----------------------|----------------------------|-----------------------|------------------|
| OpenAI Whisper       | Cloud-based transcription  | Yes                   | All              |
| Google Gemini        | Cloud-based transcription  | Yes                   | All              |
| Apple MLX            | Local transcription on Mac | No (HF token needed)  | macOS (M1/M2/M3) |
| Local Whisper Server | Self-hosted server         | No                    | All              |

Setup Steps

Step 1: Create .env file (Recommended)
  1. Copy the sample environment file:

    cp .sample_env .env
  2. Edit the .env file to uncomment and configure your chosen provider's section.

See .sample_env for the complete configuration reference.
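
UtterType reads these settings from the environment at startup. As a rough sketch of what a dotenv-style loader does under the hood — a simplified stdlib illustration, not the project's actual loading code:

```python
import os

def load_dotenv_minimal(path: str = ".env") -> None:
    """Very simplified .env loader: KEY=VALUE lines, '#' comments ignored."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Strip optional surrounding quotes from the value.
            os.environ[key.strip()] = value.strip().strip('"').strip("'")
```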

Step 2: Choose your provider

Set the UTTERTYPE_PROVIDER in your .env file to one of:

# Options: "openai" (default), "mlx" (Mac only), or "google"
UTTERTYPE_PROVIDER="openai"
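
Conceptually, this string just selects a backend at startup. A minimal sketch of that dispatch, with assumed names (resolve_provider and VALID_PROVIDERS are illustrative, not UtterType's internal API):

```python
import os

# Mirrors the documented options; assumed set for illustration.
VALID_PROVIDERS = {"openai", "mlx", "google"}

def resolve_provider() -> str:
    """Read UTTERTYPE_PROVIDER, defaulting to 'openai', and validate it."""
    provider = os.environ.get("UTTERTYPE_PROVIDER", "openai").strip().lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"Unknown UTTERTYPE_PROVIDER: {provider!r}")
    return provider
```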
Alternative: Using Environment Variables

Instead of a .env file, you can set variables directly in your terminal:

OpenAI Whisper (Linux/macOS):

export UTTERTYPE_PROVIDER="openai"
export OPENAI_API_KEY="sk-your-key-here"

Apple Silicon MLX (Mac only):

export UTTERTYPE_PROVIDER="mlx"
export HF_TOKEN="your-huggingface-token"
# Also run: uv sync --extra mlx

Google Gemini (Linux/macOS):

export UTTERTYPE_PROVIDER="google"
export GEMINI_API_KEY="your-api-key-here"

For Windows (PowerShell), use $env: instead of export, for example:

$env:UTTERTYPE_PROVIDER="openai"

Advanced Configuration Options

Minimum Recording Duration

You can set a minimum duration threshold for recordings to prevent accidental transcriptions from quick hotkey presses:

# Set minimum recording duration to 300ms (default)
UTTERTYPE_MIN_RECORDING_MS=300

This is useful if you want to use the hotkey button for other purposes when pressed quickly (i.e., not held down). Any recording shorter than this duration will be ignored.

Google Vertex AI

For enterprise Google Vertex AI integration:

  1. Install the Google Cloud CLI
  2. Authenticate: gcloud auth application-default login
  3. Configure in .env:
    UTTERTYPE_PROVIDER="google"
    GEMINI_USE_VERTEX="true"
    GEMINI_PROJECT_ID="your-gcp-project-id"
    GEMINI_LOCATION="us-central1"  # optional

See Vertex AI docs for more details.

Local Whisper Server

For faster and cheaper transcription, set up a local faster-whisper-server:

  1. Configure in .env:

    UTTERTYPE_PROVIDER="openai"
    OPENAI_BASE_URL="http://localhost:7000/v1"
  2. Available local models include:

    • Systran/faster-whisper-small (fastest)
    • Systran/faster-distil-whisper-large-v3 (most accurate)
    • deepdml/faster-whisper-large-v3-turbo-ct2 (almost as good, but faster)
Apple Silicon MLX Models

For the fastest local transcription on Apple Silicon Macs (M1/M2/M3):

  1. Install the MLX dependencies:

    uv sync --extra mlx
  2. Configure in .env:

    UTTERTYPE_PROVIDER="mlx"
    MLX_MODEL_NAME="distil-medium.en"
    HF_TOKEN="your-huggingface-token"  # Get from huggingface.co/join

This uses lightning-whisper-mlx to run Whisper models natively on the Apple Neural Engine.

Available models with speed/accuracy tradeoffs:

| Model            | Size        | Language     | Speed | Accuracy |
|------------------|-------------|--------------|-------|----------|
| base.en          | Small       | English only | ★★★★★ | ★★       |
| small.en         | Medium      | English only | ★★★★  | ★★★      |
| medium.en        | Large       | English only | ★★★   | ★★★★     |
| distil-small.en  | Medium      | English only | ★★★★  | ★★★      |
| distil-medium.en | Large       | English only | ★★★   | ★★★★     |
| large-v2         | Extra large | Multilingual | ★★    | ★★★★★    |

macOS Context Screenshot

UtterType can capture a screenshot of the active window on macOS and supply it as visual context for more accurate, context-aware transcription.

Install the macOS-specific dependencies:

uv sync --extra macos

5. Launch UtterType

Start the application

Choose one of these methods to run UtterType:

# Standard (recommended)
python -m uttertype.main

# Simple wrapper
python main.py

# If installed in environment
uttertype

# Background with tmux (auto-setup)
./start_uttertype.sh

Required Permissions

When first launching, you'll need to grant these permissions:

macOS:

  1. System Settings → Privacy & Security → Accessibility
  2. System Settings → Privacy & Security → Input Monitoring
  3. Microphone access

Windows/Linux:

  • Microphone access permissions

Usage

  1. Press and hold your configured hotkey (globe key on macOS, <ctrl>+<alt>+v on other platforms by default)
  2. Speak clearly while holding the key
  3. Release the key when finished
  4. Your transcribed text will be inserted at the cursor position

Note: The transcription will be automatically canceled in two cases:

  • If the hotkey is pressed and released quickly (less than 300ms by default)
  • If another key is pressed while the hotkey is being held down (useful for key combinations like Function+Delete)

Troubleshooting

Common Issues
  • Hotkey not working: Check permissions in System Settings (macOS) or verify hotkey configuration
  • No microphone input: Check microphone permissions and default device settings
  • API key errors: Verify your API keys are correctly set in the .env file
  • Missing models: For MLX, ensure you've run uv sync --extra mlx and provided a HuggingFace token

License

UtterType is available under the MIT License. See the LICENSE file for details.
