
🎤 Fast, secure push-to-talk voice typing with beautiful system tray integration


julio50/voice-typing-ptt


🎤 Voice Typing - Push-to-Talk Speech Recognition

A fast, secure, and elegant voice typing solution with push-to-talk functionality: hold a hotkey, speak, and release, and your words are transcribed by a whisper.cpp server and typed at the cursor. Ideal for dictating text instead of typing it.


✨ Features

🎯 Push-to-Talk Interface

  • Hotkey: Hold Left Shift + Left Alt to record
  • Instant Transcription: Fast, local processing with whisper.cpp
  • Smart Filtering: Automatic removal of false positives and noise
  • Real-time Feedback: Tray icon colors show whether the tool is ready, recording, or processing
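The README does not spell out how Smart Filtering works. As an illustration only, here is a minimal Python sketch of a filter driven by the MIN_WORD_COUNT and MIN_CHAR_COUNT settings from config.env. The bracket/parenthesis rule (dropping whisper.cpp annotations such as `[BLANK_AUDIO]`) and the name `filter_transcript` are assumptions, not the project's actual code.

```python
import re
from typing import Optional

# Assumed noise markers: whisper.cpp annotations such as "[BLANK_AUDIO]" or "(music)".
NOISE_TAGS = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def filter_transcript(text: str, min_words: int = 2, min_chars: int = 6) -> Optional[str]:
    """Return the cleaned transcript, or None if it looks like a false positive.

    min_words / min_chars play the role of MIN_WORD_COUNT / MIN_CHAR_COUNT in config.env.
    """
    cleaned = NOISE_TAGS.sub(" ", text)
    cleaned = " ".join(cleaned.split())  # collapse whitespace, trim the ends
    if len(cleaned.split()) < min_words or len(cleaned) < min_chars:
        return None
    return cleaned


print(filter_transcript("[BLANK_AUDIO]"))            # None
print(filter_transcript(" Hello world, testing. "))  # Hello world, testing.
```

Returning None (rather than an empty string) lets the caller skip the typing step entirely for rejected clips.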

๐Ÿ–ฅ๏ธ System Tray Integration

  • Beautiful Icons: Microphone icons that change color by status
  • One-Click Control: Simple start/pause toggle
  • Auto-startup: Launches automatically on login
  • KDE Compatible: Optimized for KDE Plasma desktop

๐Ÿ›ก๏ธ Security & Privacy

  • Local Processing: Transcription runs on your own whisper.cpp server (localhost by default), not a cloud service
  • Input Sanitization: Secure text filtering and validation
  • Configurable Endpoints: No hardcoded server addresses
  • Minimal Permissions: Runs with standard user privileges
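Input Sanitization is not described in detail either. One plausible approach, sketched here under assumptions (the function name, the 500-character cap, and the exact character rules are hypothetical), is to normalize whitespace and strip control characters before the text reaches the typing tool, so a transcript cannot smuggle in keystrokes such as Enter, Tab, or Escape.

```python
import unicodedata

MAX_CHARS = 500  # hypothetical cap on how much text is typed in one go


def sanitize_for_typing(text: str, max_chars: int = MAX_CHARS) -> str:
    """Make a transcript safe to hand to a typing tool such as ydotool."""
    # Turn every whitespace run (including "\n" and "\t", which a typing tool
    # could send as Enter or Tab) into a single space.
    text = " ".join(text.split())
    # Drop remaining control and format characters (Unicode categories C*),
    # e.g. ESC (U+001B) or a zero-width space (U+200B).
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("C"))
    # Re-collapse spaces left behind by removed characters, then cap the length.
    return " ".join(text.split())[:max_chars].rstrip()
```

For example, `sanitize_for_typing("hello\nworld")` yields `"hello world"`, so a spoken line break is typed as a space rather than as Enter.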

⚡ Performance

  • Lightning Fast: Types almost immediately after speaking
  • High-Quality Audio: 44.1 kHz WAV recording (set via AUDIO_SAMPLE_RATE; Whisper models themselves work on 16 kHz audio)
  • Resource Efficient: Minimal CPU and memory usage
  • Reliable: Comprehensive error handling and recovery
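The sample rate directly sets how much audio data each clip carries: an uncompressed PCM WAV file is sample rate × channels × bytes per sample × seconds, plus a 44-byte canonical header. The quick calculation below assumes mono, 16-bit recording; the README does not state the channel count or bit depth actually used.

```python
WAV_HEADER_BYTES = 44  # size of a canonical PCM WAV header


def wav_size_bytes(seconds: float, sample_rate: int = 44_100,
                   channels: int = 1, bits_per_sample: int = 16) -> int:
    """Approximate size of an uncompressed PCM WAV recording."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return WAV_HEADER_BYTES + int(seconds * bytes_per_second)


for rate in (44_100, 16_000):
    print(rate, wav_size_bytes(5, sample_rate=rate))
# 44100 441044
# 16000 160044
```

Under these assumptions a five-second clip at 44.1 kHz is about 2.8 times larger than the same clip at 16 kHz, the rate Whisper models consume internally.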

🚀 Quick Start

Prerequisites

Tested on Debian 12 (Bookworm); it should work on most modern Linux distributions.

# Install system dependencies
sudo apt install sox xinput python3-venv

# Install ydotool for typing simulation
sudo apt install ydotool

# Start ydotool daemon
sudo systemctl enable --now ydotoold
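To confirm the command-line tools from the steps above are actually on your PATH, a small standard-library check can help. This is a sketch: the tool list is an assumption based on the packages installed above, so adjust it to your setup.

```python
import shutil

# Tools installed in the steps above (assumed list; adjust to what your setup uses).
REQUIRED_TOOLS = ("sox", "xinput", "ydotool")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("All prerequisites found." if not missing else "Missing: " + ", ".join(missing))
```

This only checks that the binaries exist; it does not verify that the ydotoold service is running.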

Installation

  1. Clone the repository

    git clone <repository-url> voice_typing
    cd voice_typing
  2. Set up Python environment

    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
  3. Configure your setup

    cp config.env.example config.env
    # Edit config.env with your whisper server details
  4. Install the system tray launcher (optional)

    mkdir -p ~/.config/autostart ~/.local/share/applications
    cp voice-typing-tray.desktop ~/.config/autostart/
    cp voice-typing-tray.desktop ~/.local/share/applications/

Whisper Server Setup

You need a running whisper.cpp server. Quick setup:

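# Clone and build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
make server
# (newer whisper.cpp releases build with CMake instead and name the binary build/bin/whisper-server)

# Download a model
./models/download-ggml-model.sh base.en

# Start the server (in whisper.cpp, -p sets the processor count, so use --port)
./server -m models/ggml-base.en.bin --port 8080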
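Once the server is up, a client sends each recording to it over HTTP. The sketch below uses only the Python standard library and assumes the whisper.cpp server's `/inference` endpoint, which accepts a multipart form with a `file` field and answers `{"text": ...}` when `response_format` is `json`; check the whisper.cpp server README for your version. The helper names are hypothetical, and this is not necessarily how `voice_client_ptt` talks to the server.

```python
import json
import urllib.request
import uuid


def build_multipart(wav_bytes: bytes, filename: str = "speech.wav") -> tuple[bytes, str]:
    """Encode response_format=json plus a WAV file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="response_format"\r\n\r\n'
        "json\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + wav_bytes + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(server: str, wav_bytes: bytes, timeout: float = 30.0) -> str:
    """POST audio to <server>/inference and return the transcribed text."""
    body, content_type = build_multipart(wav_bytes)
    request = urllib.request.Request(
        server.rstrip("/") + "/inference",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["text"].strip()


# Usage (needs a running server and a WAV file):
# with open("speech.wav", "rb") as f:
#     print(transcribe("http://localhost:8080", f.read()))
```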

🎮 Usage

Command Line

./voice_client_ptt

System Tray

./voice_tray_qt.py
  • Right-click the tray icon to start/pause
  • Green: Ready for input
  • Red: Recording in progress
  • Blue: Processing transcription
  • Gray: Service stopped

Recording Voice

  1. Hold Left Shift + Left Alt
  2. Speak clearly
  3. Release keys to transcribe
  4. The transcribed text is typed wherever your cursor is
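The press-speak-release cycle above, together with the tray colors listed under System Tray, amounts to a small state machine. The sketch below is hypothetical: the `Status` and `PushToTalk` names are invented for illustration, and it does no recording or typing. It only tracks state and maps each state to the icon files listed under Project Structure.

```python
from enum import Enum


class Status(Enum):
    STOPPED = "tray_icon_stopped.png"        # gray
    READY = "tray_icon_ready.png"            # green
    RECORDING = "tray_icon_recording.png"    # red
    PROCESSING = "tray_icon_processing.png"  # blue


class PushToTalk:
    """Hypothetical model: record while both hotkeys are held, transcribe on release."""

    def __init__(self) -> None:
        self.status = Status.READY

    def on_keys(self, shift_down: bool, alt_down: bool) -> Status:
        both = shift_down and alt_down
        if self.status is Status.READY and both:
            self.status = Status.RECORDING   # step 1: both keys held, start recording
        elif self.status is Status.RECORDING and not both:
            self.status = Status.PROCESSING  # step 3: a key was released, transcribe
        return self.status

    def on_transcribed(self) -> Status:
        # Step 4: the text has been typed; wait for the next press.
        if self.status is Status.PROCESSING:
            self.status = Status.READY
        return self.status

    def toggle(self) -> Status:
        # Tray start/pause: any active state pauses, STOPPED resumes.
        self.status = Status.READY if self.status is Status.STOPPED else Status.STOPPED
        return self.status
```

For example, on a fresh instance `on_keys(True, True)` returns `Status.RECORDING`, and releasing either key moves it to `Status.PROCESSING`, whose `.value` names the blue icon file.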

โš™๏ธ Configuration

Edit config.env to customize:

# Server Configuration
WHISPER_SERVER=http://localhost:8080

# Audio Settings
AUDIO_SAMPLE_RATE=44100
AUDIO_FORMAT=wav

# Keyboard Settings
KEYBOARD_NAME="Dell KB216 Wired Keyboard"
HOTKEY_1=50  # Left Shift
HOTKEY_2=64  # Left Alt

# Filtering
MIN_WORD_COUNT=2
MIN_CHAR_COUNT=6
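config.env is written as shell-style KEY=value lines. If you need to read it from Python (for example in a tray app), a minimal parser could look like the sketch below. It is an assumption about the format, not the project's own loader, and it only handles what appears above: comment lines, trailing `# ...` comments, and double-quoted values.

```python
import shlex


def parse_env(text: str) -> dict[str, str]:
    """Parse shell-style KEY=value lines such as those in config.env."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comment lines and anything that is not an assignment
        key, raw = line.split("=", 1)
        # shlex drops trailing "# ..." comments and unquotes "Dell KB216 Wired Keyboard".
        tokens = shlex.split(raw, comments=True)
        config[key.strip()] = tokens[0] if tokens else ""
    return config


# Usage:
# with open("config.env") as f:
#     cfg = parse_env(f.read())
# min_words = int(cfg.get("MIN_WORD_COUNT", "2"))
```

All values come back as strings; numeric settings such as HOTKEY_1 or MIN_CHAR_COUNT still need an explicit `int(...)`.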

📁 Project Structure

voice_typing/
├── voice_client_ptt          # Main push-to-talk script
├── voice_tray_qt.py          # System tray application
├── voice-typing-tray.desktop # Desktop entry for auto-start
├── config.env.example        # Configuration template
├── icons/                    # System tray icons
│   ├── tray_icon_ready.png
│   ├── tray_icon_recording.png
│   ├── tray_icon_processing.png
│   └── tray_icon_stopped.png
├── utils/                    # Utility scripts
└── Old/                      # Legacy versions

🔧 Troubleshooting

Common Issues

"No keyboard device found"

  • Update KEYBOARD_NAME in config.env
  • List available keyboards: xinput list | grep -i keyboard

"Connection refused"

  • Ensure whisper.cpp server is running
  • Check WHISPER_SERVER URL in config.env
  • Test with: curl http://localhost:8080/health
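If you want to know whether anything is listening at the configured address at all, a plain TCP connection attempt separates "connection refused" from other problems. Minimal sketch: `server_reachable` is a hypothetical helper, and it checks only that the port accepts connections, not that the listener is actually whisper.cpp.

```python
import socket
from urllib.parse import urlsplit


def server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections at the URL's host and port."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host in URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, timeouts and DNS errors are all OSError subclasses
        return False


# Usage: server_reachable("http://localhost:8080") returns False while the server is down.
```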

"ydotool not working"

  • Start the daemon: sudo systemctl start ydotoold
  • Add your user to the input group: sudo usermod -a -G input $USER, then log out and back in so the new group applies
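To confirm that your current session actually has the input group, a check like the following can help. It is a hypothetical helper built on the standard `grp` and `os` modules, and it returns True only once the group is active for the running process.

```python
import grp
import os


def in_group(name: str = "input") -> bool:
    """True if the current process runs with the named group (primary or supplementary)."""
    try:
        gid = grp.getgrnam(name).gr_gid
    except KeyError:  # the group does not exist on this system
        return False
    return gid == os.getegid() or gid in os.getgroups()


# Usage: print(in_group())  # False until you have logged out and back in after usermod
```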

"System tray not showing"

  • KDE: Make sure the System Tray widget is enabled in your panel settings
  • Install PyQt5: sudo apt install python3-pyqt5

🤝 Contributing

Contributions are welcome! The basic workflow:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Submit a pull request

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • whisper.cpp - Fast local speech recognition
  • ydotool - Wayland-compatible input simulation
  • PyQt5 - Cross-platform GUI toolkit
  • sox - Audio processing utilities

Made with ❤️ for the open source community

Fast • Secure • Private • Open Source
