divyanshsinghvi/OpenWhisper

Listen - Voice-to-Text App

A multi-platform voice-to-text app with intelligent model routing, allowing you to speak instead of typing.

Features

  • 🎙️ System-wide voice recording (Desktop) / One-tap recording (Mobile)
  • 🤖 Multiple state-of-the-art speech-to-text (STT) models with automatic selection:
    • Moonshine (5-15x faster, optimized for edge devices)
    • Distil-Whisper (6x faster, excellent accuracy)
    • Faster-Whisper, Whisper.cpp, Python Whisper
  • 🧠 Intelligent model routing - Auto-selects best model for your needs
  • 📋 Automatic clipboard copy
  • 🪟 Always-on-top overlay (Desktop)
  • 📱 Native iOS (Swift + WhisperKit) and Android (Kotlin + TFLite) apps
  • 🔒 100% offline - All processing on-device, no cloud services
  • ⚡ Ultra-fast transcription

Setup

  1. Install dependencies:
npm install
  2. Install an STT model (choose one or more):

    🔥 UNDER 1B Parameters (Recommended - Edge-optimized)

    Option A: Parakeet TDT v3 (FASTEST)

    ./install-parakeet.sh  # 600M params, 6.32% WER, 25 languages, ultra-fast inference

    Option B: Moonshine (Mobile-optimized)

    ./install-moonshine.sh  # 40-200M params, 5-15x real-time

    Option C: Distil-Whisper (Best for English)

    ./install-distil-whisper.sh  # 244M params, 6x real-time

    Option D: Faster-Whisper (Good balance)

    pip install faster-whisper  # 74M params, 4x real-time

    Option E: whisper.cpp (C++ implementation)

    ./setup-whisper.sh  # 74M params, 2x real-time

    Option F: Python Whisper (Fallback)

    ./install-python-whisper.sh  # 74M params, baseline

    🎯 OVER 1B Parameters (Optional - Maximum accuracy)

    Option G: Canary Qwen 2.5B (#1 Accuracy)

    ./install-canary.sh  # 2.5B params, 5.63% WER

    Note: The app automatically uses the fastest model among those you have installed. Install several models to enable automatic fallback; models you have not installed are never used.

  3. Build and run:

npm run build
npm start

Or run in development mode:

npm run dev

Usage

  1. Press Ctrl+Shift+Space to open the overlay and start recording
  2. Speak your text
  3. Press Ctrl+Shift+Space again to stop recording
  4. The transcribed text is copied to the clipboard automatically
  5. Paste (Ctrl+V) into any application

Keyboard Shortcuts

  • Ctrl+Shift+Space - Start/Stop recording
  • Esc - Cancel recording and close overlay
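
The shortcut behaviour above can be read as a small state machine. The following is a minimal sketch of that reading, not the app's actual code: the state names, event names, and the `nextState` function are hypothetical, and the "transcribing" state is an assumption about what happens between stopping the recording and the clipboard copy.

```typescript
// Hypothetical overlay states; the README only describes start, stop, and cancel.
type OverlayState = "idle" | "recording" | "transcribing";

// "toggle" stands for Ctrl+Shift+Space, "cancel" for Esc,
// and "transcribed" for the model finishing its work (assumed event).
type OverlayEvent = "toggle" | "cancel" | "transcribed";

function nextState(state: OverlayState, event: OverlayEvent): OverlayState {
  switch (state) {
    case "idle":
      // The first press opens the overlay and starts recording.
      return event === "toggle" ? "recording" : "idle";
    case "recording":
      // The second press stops recording; Esc discards the recording.
      if (event === "toggle") return "transcribing";
      if (event === "cancel") return "idle";
      return "recording";
    case "transcribing":
      // Once the text is ready it is copied and the overlay returns to idle.
      // Ignoring other events here is an assumption of this sketch.
      return event === "transcribed" ? "idle" : "transcribing";
  }
}
```

In an Electron app, the "toggle" event would typically be wired to a global shortcut and "cancel" to a key handler in the overlay window; that wiring is left out so the sketch stays self-contained.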

Model Selection & Routing

Listen uses an intelligent routing system that automatically selects the best available model based on your requirements.
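
As a rough illustration of this routing, the sketch below orders the installed models by preference and falls back to the next one when a model fails. It is a minimal sketch, not the app's actual router: the model identifiers, the `PREFERENCE` order (taken from the order of the options in Setup), and both function names are assumptions made for illustration.

```typescript
// Assumed preference order, copied from the order of options in Setup.
// The identifiers are hypothetical and may not match the app's own names.
const PREFERENCE: readonly string[] = [
  "parakeet",
  "moonshine",
  "distil-whisper",
  "faster-whisper",
  "whisper.cpp",
  "python-whisper",
  "canary",
];

// Keep only installed models, most preferred first.
// The first entry is the primary choice; the rest form the fallback chain.
function routeModels(installed: string[]): string[] {
  return PREFERENCE.filter((name) => installed.indexOf(name) !== -1);
}

// Try each routed model in turn until one returns a transcript.
// `run` is a caller-supplied function that invokes the given model.
async function transcribeWithFallback(
  models: string[],
  run: (model: string) => Promise<string>,
): Promise<string> {
  for (const model of models) {
    try {
      return await run(model);
    } catch {
      // This model failed; continue with the next one in the chain.
    }
  }
  throw new Error("No installed model produced a transcript");
}
```

With only Moonshine and Python Whisper installed, `routeModels(["python-whisper", "moonshine"])` yields Moonshine first and Python Whisper as the fallback.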

Recommended Models:

  • Desktop (English): Distil-Whisper Small (6x faster, excellent accuracy)
  • Desktop (Multilingual): Moonshine Base (5-15x faster, good accuracy)
  • Mobile (iOS/Android): Moonshine Tiny (ultra-fast, only 40MB)

See MODEL_COMPARISON.md for detailed benchmarks and comparisons.

Platform Support

  • Linux (Desktop - Electron)
  • iOS 16+ (Native Swift app) - See mobile/ios/README.md
  • Android 7+ (Native Kotlin app) - See mobile/android/README.md
  • 🔜 macOS (Desktop - Coming soon)
  • ✅ Windows (Desktop - Initial support)

Project Structure

listen/
├── src/                    # TypeScript source code
│   ├── models/            # STT model implementations
│   ├── assets/            # UI (HTML/CSS)
│   └── main.ts            # Electron entry point
├── scripts/               # Python utility scripts
│   └── record_audio_windows.py
├── docs/                  # Documentation
└── mobile/                # Native iOS & Android apps

See ARCHITECTURE.md for complete structure.

Documentation

Requirements

  • Node.js 18+
  • At least one installed STT model, for example whisper.cpp, Python Whisper, or Faster-Whisper (all run locally, no API needed)
  • Audio recording: arecord (ALSA) or sox on Linux
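
For the Linux recording requirement, the sketch below shows how a command line for either tool might be assembled. It is illustrative only: the `buildRecordCommand` helper is hypothetical, and the 16 kHz mono 16-bit WAV settings are an assumption (a common input format for Whisper-family models), not something this README specifies.

```typescript
type Recorder = "arecord" | "sox";

interface RecordCommand {
  command: string;
  args: string[];
}

// Build the argument list for recording to a WAV file.
// Sample rate, channel count, and bit depth are assumed values.
function buildRecordCommand(recorder: Recorder, outputPath: string): RecordCommand {
  if (recorder === "arecord") {
    // -f S16_LE: 16-bit little-endian samples, -r: sample rate in Hz,
    // -c: channel count, -t wav: write a WAV container.
    return {
      command: "arecord",
      args: ["-f", "S16_LE", "-r", "16000", "-c", "1", "-t", "wav", outputPath],
    };
  }
  // sox: -d reads from the default audio device; the options that follow
  // describe the output file (rate, channels, bits per sample).
  return {
    command: "sox",
    args: ["-d", "-r", "16000", "-c", "1", "-b", "16", outputPath],
  };
}
```

The returned `command` and `args` can be passed to Node's `child_process.spawn`; stopping the recording then amounts to terminating that child process.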

License

MIT
