TalkType

Push-to-talk voice typing for your terminal.

Press a hotkey, speak, press again — your words appear wherever you're typing. Works with any terminal, IDE, or text field. Local transcription using Whisper, no cloud services required.

Why TalkType?

When you type, you self-edit and truncate. When you speak, you explain naturally and fully. TalkType bridges that gap — letting you talk to your terminal, your AI assistant, or any app, and have your words appear instantly.

Built for developers who want:

Voice input for CLI tools like Claude Code, aider, or any terminal app
System-wide dictation that works anywhere — terminals, IDEs, browsers
Local, private transcription — your voice never leaves your machine
Minimal latency — GPU-accelerated transcription in under a second

Features

Push-to-talk: Press F9 to start, speak, press F9 to stop and paste
Works everywhere: Browsers, IDEs, terminals — any text field that accepts paste
Cross-platform: Linux, Windows, macOS
Local Whisper: Uses faster-whisper for fast, private transcription
API mode: Connect to any Whisper-compatible API server
Smart paste: Auto-detects terminals vs other apps (Ctrl+Shift+V vs Ctrl+V)
Window focus: Remembers where you started — switch apps while speaking
Configurable: Choose your hotkey, model size, and language

Installation

Quick Install (Linux)

git clone https://github.com/lmacan1/talktype.git && cd talktype
sudo apt install xdotool xclip portaudio19-dev
python3 -m venv venv && source venv/bin/activate
pip install -e .
talktype  # Setup wizard launches automatically

Manual Install - Linux (Ubuntu/Debian)

# System dependencies
sudo apt install xdotool xclip portaudio19-dev

# Clone and install
git clone https://github.com/lmacan1/talktype.git
cd talktype
python3 -m venv venv
source venv/bin/activate
pip install -e .  # Installs 'talktype' command

Windows

git clone https://github.com/lmacan1/talktype.git
cd talktype
python -m venv venv
.\venv\Scripts\activate
pip install -e .
talktype  # Setup wizard launches automatically

macOS

brew install portaudio
git clone https://github.com/lmacan1/talktype.git
cd talktype
python3 -m venv venv && source venv/bin/activate
pip install -e .
talktype  # Setup wizard launches automatically

Usage

First Run — Setup Wizard

On first launch, TalkType runs an interactive setup wizard:

talktype

The wizard lets you:

Choose transcription mode (local server, cloud API, or local model)
Set hotkeys by pressing them (not typing)
Select Whisper model and language
Optionally install as a system service (runs on login)

Config is saved to ~/.config/talktype/config.yaml. Re-run anytime with talktype --setup.

Basic Usage

talktype  # Uses saved config

Press F9 to start recording (beep)
Speak your text
Press F9 again to stop and paste (beep)
Your words appear in the focused window

Recovery Hotkeys

Key	What it does
F9	Record / Stop & Paste
F8	Re-paste last transcription (if paste failed)
F7	Retry transcription (if API timed out)
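The record/stop toggle and the re-paste key above can be pictured as a small state machine. The following is a minimal sketch, not TalkType's actual code: the `PushToTalk` class and the `transcribe` and `paste` callables are hypothetical stand-ins for the real recording, Whisper, and clipboard logic.

```python
from typing import Callable, Optional


class PushToTalk:
    """Toggle between idle and recording on each hotkey press (illustrative only)."""

    def __init__(self, transcribe: Callable[[bytes], str], paste: Callable[[str], None]):
        self._transcribe = transcribe  # hypothetical: audio bytes -> text
        self._paste = paste            # hypothetical: text -> focused window
        self._chunks = []
        self.recording = False
        self.last_text: Optional[str] = None

    def on_audio(self, chunk: bytes) -> None:
        # Audio is only kept while a recording is in progress.
        if self.recording:
            self._chunks.append(chunk)

    def on_hotkey(self) -> None:
        if not self.recording:
            # First press: start a fresh recording.
            self._chunks = []
            self.recording = True
            return
        # Second press: stop, transcribe, and paste.
        self.recording = False
        self.last_text = self._transcribe(b"".join(self._chunks))
        self._paste(self.last_text)

    def on_repaste(self) -> None:
        # Mirrors the re-paste recovery key: paste the previous result again.
        if self.last_text is not None:
            self._paste(self.last_text)


# Usage with fake transcribe/paste functions.
pasted = []
ptt = PushToTalk(transcribe=lambda audio: f"{len(audio)} bytes", paste=pasted.append)
ptt.on_hotkey()        # start recording
ptt.on_audio(b"abc")   # microphone data arrives
ptt.on_hotkey()        # stop, transcribe, paste
```

Keeping the last transcription on the object is what makes a re-paste possible without recording again.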

Options

# Use a different model (tiny, base, small, medium, large-v3)
python talktype.py --model small

# Use a different hotkey (note: F8 is also listed as the re-paste key under Recovery Hotkeys)
python talktype.py --hotkey f8

# Connect to a Whisper API server (if you have one running)
python talktype.py --api http://localhost:8002/transcribe

# Change language
python talktype.py --language es  # Spanish

OpenAI-Compatible APIs

TalkType supports any OpenAI-compatible transcription API, so you can use different backends like Whisper, Parakeet, or Whisper Turbo:

# OpenAI API
python talktype.py --api https://api.openai.com/v1/audio/transcriptions --api-model whisper-1

# Groq (super fast)
python talktype.py --api https://api.groq.com/openai/v1/audio/transcriptions --api-model whisper-large-v3

# Local OpenAI-compatible server (e.g., faster-whisper-server, whisper.cpp)
python talktype.py --api http://localhost:8080/v1/audio/transcriptions --api-model whisper-1

# Any custom server
python talktype.py --api http://localhost:8002/transcribe

TalkType auto-detects OpenAI-compatible endpoints by URL pattern. For custom servers, it uses a simpler format that works with most Whisper APIs.
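As a rough illustration of detection by URL pattern, the sketch below treats any URL whose path ends in `/v1/audio/transcriptions` as OpenAI-compatible. The exact pattern TalkType checks is not documented here, so both the suffix and the function name `looks_openai_compatible` are assumptions.

```python
from urllib.parse import urlparse

# Assumed pattern: OpenAI-style servers expose this path suffix.
OPENAI_PATH_SUFFIX = "/v1/audio/transcriptions"


def looks_openai_compatible(api_url: str) -> bool:
    """Return True if the URL path ends with the OpenAI transcription route."""
    path = urlparse(api_url).path.rstrip("/")
    return path.endswith(OPENAI_PATH_SUFFIX)
```

Under this assumption, the OpenAI, Groq, and local `:8080` URLs above would be handled as OpenAI-compatible, while `http://localhost:8002/transcribe` would fall back to the simpler format.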

Model Sizes

Model	Size	Speed	Accuracy	VRAM
tiny	~75MB	Fastest	Basic	~1GB
base	~150MB	Fast	Good	~1GB
small	~500MB	Medium	Better	~2GB
medium	~1.5GB	Slow	Great	~5GB
large-v3	~3GB	Slowest	Best	~10GB

For most use cases, base or small is the sweet spot.
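If you want to pick a model from a VRAM budget, the table can be turned into a small lookup. This is only a sketch built on the approximate VRAM figures above; the helper name `largest_model_for` is made up for illustration and is not part of TalkType.

```python
from typing import Optional

# (model name, approximate VRAM in GB), ordered smallest to largest,
# copied from the Model Sizes table.
MODELS = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large-v3", 10)]


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Return the largest model whose approximate VRAM need fits the budget."""
    choice = None
    for name, needed_gb in MODELS:
        if needed_gb <= vram_gb:
            choice = name  # later entries are larger, so keep overwriting
    return choice
```

The returned name can be passed to `--model`. Treat the result as a starting point, since the figures are approximate.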

Whisper API Server (Recommended for Power Users)

For faster startup and better performance, run the included Whisper API server. The model stays loaded in memory, so TalkType connects instantly.

Why use the server?

Mode	Startup	Memory	Best for
Direct (`talktype.py`)	~3-5s (loads model)	Uses RAM while running	Occasional use
Server (`whisper_server.py`)	Instant	Server keeps model loaded	Heavy use, multiple apps

Running the Server

Terminal 1 - Start the server (once):

source venv/bin/activate
python whisper_server.py --model base

# Or with GPU and larger model:
python whisper_server.py --model large-v3 --device cuda

Terminal 2 - Run TalkType:

source venv/bin/activate
python talktype.py --api http://localhost:8002/transcribe

Server Options

python whisper_server.py --help

# Examples:
python whisper_server.py --model small        # Better accuracy
python whisper_server.py --port 8080          # Different port
python whisper_server.py --device cpu         # Force CPU
python whisper_server.py --device cuda        # Force GPU

# Environment variables also work:
WHISPER_MODEL=large-v3 WHISPER_DEVICE=cuda python whisper_server.py
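One common way to let flags and environment variables coexist is to use the environment value as the flag's default, so an explicit flag always wins. The sketch below shows that pattern with `argparse`. It is not the actual `whisper_server.py` code, and the fallback defaults (`base`, `auto`, port `8002`) are assumptions inferred from the examples in this README.

```python
import argparse
import os


def parse_server_args(argv=None, env=None):
    """Parse server options, falling back to WHISPER_* environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Whisper API server options (sketch)")
    # Environment variables supply defaults; command-line flags override them.
    parser.add_argument("--model", default=env.get("WHISPER_MODEL", "base"))
    parser.add_argument("--device", default=env.get("WHISPER_DEVICE", "auto"))
    parser.add_argument("--port", type=int, default=8002)
    return parser.parse_args(argv)


# Usage: an explicit flag takes precedence over the environment.
args = parse_server_args(["--model", "small"], env={"WHISPER_MODEL": "large-v3"})
```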

Running Server as a Service (Linux)

mkdir -p ~/.config/systemd/user

cat > ~/.config/systemd/user/whisper-server.service << 'EOF'
[Unit]
Description=Whisper API Server
After=network.target

[Service]
Type=simple
WorkingDirectory=/path/to/talktype
ExecStart=/path/to/talktype/venv/bin/python whisper_server.py --model base
Restart=on-failure
RestartSec=5

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable whisper-server
systemctl --user start whisper-server

API Endpoints

The server exposes:

Endpoint	Method	Description
`/health`	GET	Check server status
`/transcribe`	POST	Transcribe audio file
`/docs`	GET	Interactive web UI — test transcription right in your browser

Example with curl:

curl -X POST http://localhost:8002/transcribe \
  -F "file=@audio.wav" \
  -F "language=en"
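The same request can be made from Python using only the standard library. This is a sketch under a few assumptions: the form field names `file` and `language` are taken from the curl example, the helper names are invented for illustration, and the response is returned as raw text because its format is not described here.

```python
import os
import urllib.request
import uuid


def build_multipart(fields: dict, filename: str, file_bytes: bytes):
    """Encode text fields plus one file part named 'file' as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    body = b"".join(parts) + head + file_bytes + tail
    return body, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, url: str = "http://localhost:8002/transcribe", language: str = "en") -> str:
    """POST an audio file to the server and return the raw response text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            {"language": language}, os.path.basename(path), f.read()
        )
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `transcribe("audio.wav")` sends the same fields as the curl command.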

Running as a Service (Linux)

The setup wizard can install TalkType as a systemd service automatically — just select "Run at startup" when prompted.

Or install manually:

# Create systemd user service
mkdir -p ~/.config/systemd/user

cat > ~/.config/systemd/user/talktype.service << 'EOF'
[Unit]
Description=TalkType Voice Dictation
After=graphical-session.target

[Service]
Type=simple
ExecStart=/path/to/talktype/venv/bin/talktype
Restart=on-failure
RestartSec=5
Environment=DISPLAY=:0

[Install]
WantedBy=default.target
EOF

# Enable and start
systemctl --user daemon-reload
systemctl --user enable talktype
systemctl --user start talktype

Manage with:

systemctl --user status talktype   # Check status
systemctl --user stop talktype     # Stop
systemctl --user restart talktype  # Restart

Using with Claude Code

TalkType works seamlessly with Claude Code and similar terminal AI assistants:

Start TalkType in a separate terminal (or as a service)
Focus your Claude Code terminal
Press F9, describe what you want, press F9
Your detailed voice prompt appears in Claude Code

Voice lets you elaborate naturally without self-editing — often resulting in clearer, more detailed prompts.

Using with Browsers

TalkType works in any browser text field — it's not just for terminals:

Focus a text field (Google Docs, ChatGPT, Slack, email composer, etc.)
Press F9, speak, press F9
Your words appear in the browser

Since TalkType uses clipboard + standard paste (Ctrl+V / Cmd+V), it works anywhere that accepts pasted text.

Troubleshooting

Linux: "No module named 'pynput'"

Make sure you activated the virtual environment: source venv/bin/activate

Linux: Hotkey not working

pynput requires X11. If you are using Wayland, either:

Switch to an X11 session
Run with the GDK_BACKEND=x11 environment variable
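To check which kind of session you are in before troubleshooting further, you can inspect the usual environment variables. The sketch below is a heuristic, not something TalkType ships: it relies on `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY`, which most Linux desktops set but which are not guaranteed to be present.

```python
import os


def session_type(env=None) -> str:
    """Best-effort guess of the display server: 'x11', 'wayland', or 'unknown'."""
    env = os.environ if env is None else env
    kind = env.get("XDG_SESSION_TYPE", "").lower()
    if kind in ("x11", "wayland"):
        return kind
    # Check Wayland first: XWayland sessions usually set DISPLAY as well.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


if __name__ == "__main__":
    print(session_type())
```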

macOS: Accessibility permissions

macOS requires accessibility permissions for keyboard monitoring:

Go to System Preferences → Security & Privacy → Privacy → Accessibility (on macOS 13 and later: System Settings → Privacy & Security → Accessibility)
Add your terminal app (Terminal, iTerm, etc.)

Windows: No audio input

Make sure your microphone is set as the default input device in Windows Sound settings.

Transcription is slow

Try a smaller model: --model tiny or --model base
If you have an NVIDIA GPU, ensure CUDA is installed for GPU acceleration
Consider running a separate Whisper API server and using --api

How It Works

Global hotkey capture (pynput) — works even when other apps are focused
Audio recording (sounddevice) — captures from your microphone
Local transcription (faster-whisper) — Whisper running on your machine
Smart paste (pyperclip + OS-specific) — detects terminal vs other apps

[F9 Press] → Start Recording → [Speak] → [F9 Press] → Stop Recording
                                                            ↓
                                                    Transcribe (Whisper)
                                                            ↓
                                                    Focus Original Window
                                                            ↓
                                                    Paste Text
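The smart paste step comes down to choosing a key combination for the target window. A minimal sketch of that decision follows. The list of terminal name hints and the function `paste_shortcut` are hypothetical, and how TalkType actually identifies terminal windows is not described here.

```python
# Hypothetical substrings that suggest a terminal window; not taken from TalkType.
TERMINAL_HINTS = ("terminal", "konsole", "xterm", "alacritty", "kitty")


def paste_shortcut(window_class: str, platform: str = "linux") -> tuple:
    """Pick the paste key combination for the focused window."""
    if platform == "darwin":
        # macOS uses Cmd+V in terminals and other apps alike.
        return ("cmd", "v")
    if any(hint in window_class.lower() for hint in TERMINAL_HINTS):
        # Terminals typically reserve Ctrl+V, so paste is Ctrl+Shift+V.
        return ("ctrl", "shift", "v")
    return ("ctrl", "v")
```

For example, a window class of `gnome-terminal-server` would get Ctrl+Shift+V, while `firefox` would get Ctrl+V.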

Contributing

Contributions welcome! Some ideas:

Voice activity detection (auto-stop on silence)
Wayland support (wtype instead of xdotool)
Tray icon / visual indicator
Custom vocabulary/prompts
Streaming transcription

License

MIT License — see LICENSE for details.

Acknowledgments

faster-whisper — CTranslate2-based Whisper
OpenAI Whisper — the model itself
pynput — cross-platform input monitoring
