Push-to-talk voice typing for your terminal.
Press a hotkey, speak, press again — your words appear wherever you're typing. Works with any terminal, IDE, or text field. Local transcription using Whisper, no cloud services required.
When you type, you self-edit and truncate. When you speak, you explain naturally and fully. TalkType bridges that gap — letting you talk to your terminal, your AI assistant, or any app, and have your words appear instantly.
Built for developers who want:
- Voice input for CLI tools like Claude Code, aider, or any terminal app
- System-wide dictation that works anywhere — terminals, IDEs, browsers
- Local, private transcription — your voice never leaves your machine
- Minimal latency — GPU-accelerated transcription in under a second
- Push-to-talk: Press F9 to start, speak, press F9 to stop and paste
- Works everywhere: Browsers, IDEs, terminals — any text field that accepts paste
- Cross-platform: Linux, Windows, macOS
- Local Whisper: Uses faster-whisper for fast, private transcription
- API mode: Connect to any Whisper-compatible API server
- Smart paste: Auto-detects terminals vs other apps (Ctrl+Shift+V vs Ctrl+V)
- Window focus: Remembers where you started — switch apps while speaking
- Configurable: Choose your hotkey, model size, and language
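The terminal-vs-app detection behind smart paste can be sketched as a window-class check. This is illustrative only: the class list and helper name are assumptions, not TalkType's actual implementation.

```python
# Sketch: pick a paste chord based on the focused window's class.
# The class list and function name are illustrative assumptions,
# not TalkType's actual implementation.

TERMINAL_CLASSES = {
    "gnome-terminal", "konsole", "xterm", "alacritty", "kitty",
}

def paste_chord(window_class: str) -> tuple:
    """Terminals usually reserve Ctrl+V, so use Ctrl+Shift+V there."""
    if window_class.lower() in TERMINAL_CLASSES:
        return ("ctrl", "shift", "v")
    return ("ctrl", "v")
```

On X11 the focused window's class can be read with, e.g., `xprop -id $(xdotool getactivewindow) WM_CLASS`.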
```bash
git clone https://github.com/lmacan1/talktype.git && cd talktype
sudo apt install xdotool xclip portaudio19-dev
python3 -m venv venv && source venv/bin/activate
pip install -e .
talktype  # Setup wizard launches automatically
```

**Linux:**

```bash
# System dependencies
sudo apt install xdotool xclip portaudio19-dev

# Clone and install
git clone https://github.com/lmacan1/talktype.git
cd talktype
python3 -m venv venv
source venv/bin/activate
pip install -e .  # Installs 'talktype' command
```

**Windows:**

```powershell
git clone https://github.com/lmacan1/talktype.git
cd talktype
python -m venv venv
.\venv\Scripts\activate
pip install -e .
talktype  # Setup wizard launches automatically
```

**macOS:**

```bash
brew install portaudio
git clone https://github.com/lmacan1/talktype.git
cd talktype
python3 -m venv venv && source venv/bin/activate
pip install -e .
talktype  # Setup wizard launches automatically
```

On first launch, TalkType runs an interactive setup wizard:

```bash
talktype
```

The wizard lets you:
- Choose transcription mode (local server, cloud API, or local model)
- Set hotkeys by pressing them (not typing)
- Select Whisper model and language
- Optionally install as a system service (runs on login)
Config is saved to `~/.config/talktype/config.yaml`. Re-run the wizard anytime with `talktype --setup`.
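A typical config might look like this. The field names below are illustrative; check the wizard-generated file for the exact schema.

```yaml
# ~/.config/talktype/config.yaml (illustrative field names)
hotkey: f9
model: base
language: en
mode: local   # local model, local server, or cloud API
```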
```bash
talktype  # Uses saved config
```

- Press F9 to start recording (beep)
- Speak your text
- Press F9 again to stop and paste (beep)
- Your words appear in the focused window
| Key | What it does |
|---|---|
| F9 | Record / Stop & Paste |
| F8 | Re-paste last transcription (if paste failed) |
| F7 | Retry transcription (if API timed out) |
```bash
# Use a different model (tiny, base, small, medium, large-v3)
python talktype.py --model small

# Use a different hotkey
python talktype.py --hotkey f8

# Connect to a Whisper API server (if you have one running)
python talktype.py --api http://localhost:8002/transcribe

# Change language
python talktype.py --language es  # Spanish
```

TalkType supports any OpenAI-compatible transcription API, so you can use different backends like Whisper, Parakeet, or Whisper Turbo:
```bash
# OpenAI API
python talktype.py --api https://api.openai.com/v1/audio/transcriptions --api-model whisper-1

# Groq (super fast)
python talktype.py --api https://api.groq.com/openai/v1/audio/transcriptions --api-model whisper-large-v3

# Local OpenAI-compatible server (e.g., faster-whisper-server, whisper.cpp)
python talktype.py --api http://localhost:8080/v1/audio/transcriptions --api-model whisper-1

# Any custom server
python talktype.py --api http://localhost:8002/transcribe
```

TalkType auto-detects OpenAI-compatible endpoints by URL pattern. For custom servers, it uses a simpler format that works with most Whisper APIs.
| Model | Size | Speed | Accuracy | VRAM |
|---|---|---|---|---|
| tiny | ~75MB | Fastest | Basic | ~1GB |
| base | ~150MB | Fast | Good | ~1GB |
| small | ~500MB | Medium | Better | ~2GB |
| medium | ~1.5GB | Slow | Great | ~5GB |
| large-v3 | ~3GB | Slowest | Best | ~10GB |
For most use cases, `base` or `small` is the sweet spot.
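As a rule of thumb, you can pick the largest model your GPU fits using the VRAM column above. The helper below is illustrative, not part of TalkType.

```python
# Illustrative helper: pick the largest Whisper model that fits the
# approximate VRAM requirements from the table above. Not part of TalkType.

MODEL_VRAM_GB = [
    ("large-v3", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]

def choose_model(vram_gb: float) -> str:
    for name, need in MODEL_VRAM_GB:
        if vram_gb >= need:
            return name
    return "tiny"  # CPU-only fallback
```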
For faster startup and better performance, run the included Whisper API server. The model stays loaded in memory, so TalkType connects instantly.
| Mode | Startup | Memory | Best for |
|---|---|---|---|
| Direct (`talktype.py`) | ~3-5s (loads model) | Uses RAM while running | Occasional use |
| Server (`whisper_server.py`) | Instant | Server keeps model loaded | Heavy use, multiple apps |
Terminal 1 - Start the server (once):

```bash
source venv/bin/activate
python whisper_server.py --model base

# Or with GPU and larger model:
python whisper_server.py --model large-v3 --device cuda
```

Terminal 2 - Run TalkType:

```bash
source venv/bin/activate
python talktype.py --api http://localhost:8002/transcribe
```

See all server options:

```bash
python whisper_server.py --help

# Examples:
python whisper_server.py --model small   # Better accuracy
python whisper_server.py --port 8080     # Different port
python whisper_server.py --device cpu    # Force CPU
python whisper_server.py --device cuda   # Force GPU

# Environment variables also work:
WHISPER_MODEL=large-v3 WHISPER_DEVICE=cuda python whisper_server.py
```

To keep the server running across logins, install it as a systemd user service:

```bash
cat > ~/.config/systemd/user/whisper-server.service << 'EOF'
[Unit]
Description=Whisper API Server
After=network.target

[Service]
Type=simple
WorkingDirectory=/path/to/talktype
ExecStart=/path/to/talktype/venv/bin/python whisper_server.py --model base
Restart=on-failure
RestartSec=5

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable whisper-server
systemctl --user start whisper-server
```

The server exposes:
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Check server status |
| `/transcribe` | POST | Transcribe audio file |
| `/docs` | GET | Interactive web UI — test transcription right in your browser |
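The `/transcribe` endpoint can also be called with only the Python standard library. This is a sketch: the multipart body is hand-rolled because `urllib` has no helper for it, and the response JSON fields depend on the server you run.

```python
# Sketch: call the /transcribe endpoint with only the standard library.
# Response JSON fields depend on the server you run.
import json
import urllib.request
import uuid

def encode_multipart(fields: dict, filename: str, file_bytes: bytes):
    """Hand-rolled multipart/form-data body (urllib has no built-in helper)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="file"; filename="{filename}"\r\n'
        f'Content-Type: audio/wav\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return boundary, b"".join(parts)

def transcribe(path: str, url: str = "http://localhost:8002/transcribe") -> dict:
    with open(path, "rb") as f:
        boundary, body = encode_multipart({"language": "en"}, path, f.read())
    req = urllib.request.Request(
        url, data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```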
Example with curl:
```bash
curl -X POST http://localhost:8002/transcribe \
  -F "file=@audio.wav" \
  -F "language=en"
```

The setup wizard can install TalkType as a systemd service automatically — just select "Run at startup" when prompted.
Or install manually:
```bash
# Create systemd user service
mkdir -p ~/.config/systemd/user

cat > ~/.config/systemd/user/talktype.service << 'EOF'
[Unit]
Description=TalkType Voice Dictation
After=graphical-session.target

[Service]
Type=simple
ExecStart=/path/to/talktype/venv/bin/talktype
Restart=on-failure
RestartSec=5
Environment=DISPLAY=:0

[Install]
WantedBy=default.target
EOF

# Enable and start
systemctl --user daemon-reload
systemctl --user enable talktype
systemctl --user start talktype
```

Manage with:

```bash
systemctl --user status talktype   # Check status
systemctl --user stop talktype     # Stop
systemctl --user restart talktype  # Restart
```

TalkType works seamlessly with Claude Code and similar terminal AI assistants:
- Start TalkType in a separate terminal (or as a service)
- Focus your Claude Code terminal
- Press F9, describe what you want, press F9
- Your detailed voice prompt appears in Claude Code
Voice lets you elaborate naturally without self-editing — often resulting in clearer, more detailed prompts.
TalkType works in any browser text field — it's not just for terminals:
- Focus a text field (Google Docs, ChatGPT, Slack, email composer, etc.)
- Press F9, speak, press F9
- Your words appear in the browser
Since TalkType uses clipboard + standard paste (Ctrl+V / Cmd+V), it works anywhere that accepts pasted text.
Make sure you activated the virtual environment: `source venv/bin/activate`
pynput requires X11. If using Wayland, either:
- Switch to X11 session
- Run with the `GDK_BACKEND=x11` environment variable
macOS requires accessibility permissions for keyboard monitoring:
- Go to System Preferences → Security & Privacy → Privacy → Accessibility
- Add your terminal app (Terminal, iTerm, etc.)
Make sure your microphone is set as the default input device in Windows Sound settings.
- Try a smaller model: `--model tiny` or `--model base`
- If you have an NVIDIA GPU, ensure CUDA is installed for GPU acceleration
- Consider running a separate Whisper API server and using `--api`
- Global hotkey capture (pynput) — works even when other apps are focused
- Audio recording (sounddevice) — captures from your microphone
- Local transcription (faster-whisper) — Whisper running on your machine
- Smart paste (pyperclip + OS-specific) — detects terminal vs other apps
```
[F9 Press] → Start Recording → [Speak] → [F9 Press] → Stop Recording
                                              ↓
                                     Transcribe (Whisper)
                                              ↓
                                    Focus Original Window
                                              ↓
                                          Paste Text
```
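The flow above reduces to a two-state toggle on the hotkey. The sketch below shows just that state machine, with hotkey capture, audio recording, transcription, and pasting stubbed out; the names are illustrative, not TalkType's internals.

```python
# Minimal push-to-talk state machine: each hotkey press toggles between
# idle and recording. Audio capture, transcription, and pasting are
# injected as callables; names are illustrative, not TalkType's internals.

class PushToTalk:
    def __init__(self, transcribe, paste):
        self.recording = False
        self.frames = []          # audio chunks collected while recording
        self.transcribe = transcribe
        self.paste = paste

    def on_hotkey(self):
        if not self.recording:
            self.recording = True   # first press: start capturing
            self.frames = []
            return "recording"
        self.recording = False      # second press: stop, transcribe, paste
        text = self.transcribe(self.frames)
        self.paste(text)
        return "pasted"
```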
Contributions welcome! Some ideas:
- Voice activity detection (auto-stop on silence)
- Wayland support (wtype instead of xdotool)
- Tray icon / visual indicator
- Custom vocabulary/prompts
- Streaming transcription
MIT License — see LICENSE for details.
- faster-whisper — CTranslate2-based Whisper
- OpenAI Whisper — the model itself
- pynput — cross-platform input monitoring
