A multi-platform voice-to-text app with intelligent model routing, allowing you to speak instead of typing.
- 🎙️ System-wide voice recording (Desktop) / One-tap recording (Mobile)
- 🤖 Multiple SOTA STT models with automatic selection:
  - Moonshine (5-15x faster, optimized for edge devices)
  - Distil-Whisper (6x faster, excellent accuracy)
  - Faster-Whisper, Whisper.cpp, Python Whisper
- 🧠 Intelligent model routing - Auto-selects best model for your needs
- 📋 Automatic clipboard copy
- 🪟 Always-on-top overlay (Desktop)
- 📱 Native iOS (Swift + WhisperKit) and Android (Kotlin + TFLite) apps
- 🔒 100% offline - All processing on-device, no cloud services
- ⚡ Ultra-fast transcription
- Install dependencies:

      npm install

- Install an STT model (choose one or more):

  Option A: Parakeet TDT v3 (FASTEST)

      ./install-parakeet.sh  # 600M params, 6.32% WER, 25 languages, ultra-fast inference

  Option B: Moonshine (Mobile-optimized)

      ./install-moonshine.sh  # 40-200M params, 5-15x real-time

  Option C: Distil-Whisper (Best for English)

      ./install-distil-whisper.sh  # 244M params, 6x real-time

  Option D: Faster-Whisper (Good balance)

      pip install faster-whisper  # 74M params, 4x real-time

  Option E: whisper.cpp (C++ implementation)

      ./setup-whisper.sh  # 74M params, 2x real-time

  Option F: Python Whisper (Fallback)

      ./install-python-whisper.sh  # 74M params, baseline

  Option G: Canary Qwen 2.5B (#1 Accuracy)

      ./install-canary.sh  # 2.5B params, 5.63% WER

  Note: The app automatically uses the fastest available model. Install multiple models for automatic fallback. Only models you install will be used.
- Build and run:

      npm run build
      npm start

  Or run in development mode:

      npm run dev

Usage:
- Press Ctrl+Shift+Space to activate the overlay
- Speak your text
- Press Ctrl+Shift+Space again to stop recording
- The transcribed text is automatically copied to the clipboard
- Paste (Ctrl+V) in any application

Keyboard shortcuts:
- Ctrl+Shift+Space - Start/Stop recording
- Esc - Cancel recording and close overlay
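
The shortcut behaviour above can be read as a small state machine. The following is a minimal sketch of that reading, not the app's actual code: the state names, the `onShortcut` function, and the choice to ignore both keys while a transcription is in progress are all assumptions made for illustration.

```typescript
// Hypothetical overlay states; the real app may model this differently.
type OverlayState = "hidden" | "recording" | "transcribing";
type Key = "toggle" | "escape"; // toggle = Ctrl+Shift+Space, escape = Esc

interface Transition {
  next: OverlayState;
  action: "startRecording" | "stopAndTranscribe" | "cancel" | "none";
}

// Pure transition function: given the current state and a key press,
// return the next state and the side effect the caller should perform.
function onShortcut(state: OverlayState, key: Key): Transition {
  if (key === "toggle") {
    if (state === "hidden") {
      return { next: "recording", action: "startRecording" };
    }
    if (state === "recording") {
      return { next: "transcribing", action: "stopAndTranscribe" };
    }
    // Assumption: a second toggle while transcribing is ignored.
    return { next: state, action: "none" };
  }
  // Esc cancels an active recording and closes the overlay.
  if (state === "recording") {
    return { next: "hidden", action: "cancel" };
  }
  // Assumption: Esc does nothing when hidden or already transcribing.
  return { next: state, action: "none" };
}
```

In an Electron main process, a function like this could be called from a callback registered with `globalShortcut.register("Ctrl+Shift+Space", ...)`. Keeping the transition logic pure makes it testable without a running window.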
Listen uses an intelligent routing system that automatically selects the best available model based on your requirements.
Recommended Models:
- Desktop (English): Distil-Whisper Small (6x faster, excellent accuracy)
- Desktop (Multilingual): Moonshine Base (5-15x faster, good accuracy)
- Mobile (iOS/Android): Moonshine Tiny (ultra-fast, only 40MB)
See MODEL_COMPARISON.md for detailed benchmarks and comparisons.
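
To make the routing and fallback idea concrete, here is a minimal sketch of choosing among installed models. It is not the router in `src/models/`. The `SttBackend` interface, the function names, and the priority order are assumptions: the order simply mirrors install options A to F, with Canary placed last because it is the largest model.

```typescript
// Assumed priority, fastest first. The real router may rank models differently.
const PRIORITY = [
  "parakeet",
  "moonshine",
  "distil-whisper",
  "faster-whisper",
  "whisper.cpp",
  "python-whisper",
  "canary",
] as const;

type ModelId = (typeof PRIORITY)[number];

// Hypothetical backend interface: one instance per installed model.
interface SttBackend {
  id: ModelId;
  transcribe(wavPath: string): Promise<string>;
}

// Order the installed backends by the assumed priority (does not mutate the input).
function rankInstalled(installed: SttBackend[]): SttBackend[] {
  return [...installed].sort(
    (a, b) => PRIORITY.indexOf(a.id) - PRIORITY.indexOf(b.id),
  );
}

// Try each installed backend in priority order and fall back to the next on failure.
async function transcribeWithFallback(
  installed: SttBackend[],
  wavPath: string,
): Promise<{ model: ModelId; text: string }> {
  const errors: string[] = [];
  for (const backend of rankInstalled(installed)) {
    try {
      const text = await backend.transcribe(wavPath);
      return { model: backend.id, text };
    } catch (err) {
      errors.push(`${backend.id}: ${String(err)}`);
    }
  }
  throw new Error(
    `No installed model could transcribe ${wavPath}. ${errors.join("; ")}`,
  );
}
```

Only backends passed in as `installed` are ever tried, which matches the note that uninstalled models are never used.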
- ✅ Linux (Desktop - Electron)
- ✅ iOS 16+ (Native Swift app) - See mobile/ios/README.md
- ✅ Android 7+ (Native Kotlin app) - See mobile/android/README.md
- 🔜 macOS (Desktop - Coming soon)
- ✅ Windows (Desktop - Initial support)
listen/
├── src/ # TypeScript source code
│ ├── models/ # STT model implementations
│ ├── assets/ # UI (HTML/CSS)
│ └── main.ts # Electron entry point
├── scripts/ # Python utility scripts
│ └── record_audio_windows.py
├── docs/ # Documentation
└── mobile/ # Native iOS & Android apps
See ARCHITECTURE.md for complete structure.
- Architecture Overview - System design and modular architecture
- Model Comparison - Detailed STT model benchmarks
- Quick Start Guide - Get up and running in 5 minutes
- iOS README - iOS app documentation
- Android README - Android app documentation
- Node.js 18+
- One of: whisper.cpp, Python whisper, or faster-whisper (local models, no API needed)
- Audio recording: arecord (ALSA) or sox on Linux
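
As an illustration of the audio requirement, the sketch below builds a recorder command line for either tool. It is an assumption-laden example rather than the app's recorder: `buildRecordCommand` and `RecordCommand` are hypothetical names, and the 16 kHz mono 16-bit format is chosen because Whisper-family models work on 16 kHz audio, not because this README specifies it.

```typescript
type Recorder = "arecord" | "sox";

// Hypothetical shape: a binary name plus its argument list.
interface RecordCommand {
  bin: string;
  args: string[];
}

// Build the arguments for recording 16 kHz, mono, 16-bit WAV to outPath.
function buildRecordCommand(recorder: Recorder, outPath: string): RecordCommand {
  if (recorder === "arecord") {
    // -f S16_LE: signed 16-bit little-endian samples
    // -r 16000:  sample rate in Hz
    // -c 1:      one channel (mono)
    // -t wav:    write a WAV container
    return {
      bin: "arecord",
      args: ["-f", "S16_LE", "-r", "16000", "-c", "1", "-t", "wav", outPath],
    };
  }
  // sox: -d reads from the default audio device; the options that follow
  // apply to the output file (rate, channels, bits per sample).
  return {
    bin: "sox",
    args: ["-d", "-r", "16000", "-c", "1", "-b", "16", outPath],
  };
}
```

The result can be passed to `spawn(cmd.bin, cmd.args)` from `node:child_process`. How the recording is stopped and finalized is left out of this sketch.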
MIT