HYP3R00T/voicepad

VoicePad

Voice recording and GPU-accelerated transcription tool. Effortlessly capture and transcribe audio with intelligent defaults and GPU support.

Features

  • πŸŽ™οΈ Simple Recording: Record audio with one command, press Ctrl+C to stop
  • πŸš€ GPU-Accelerated Transcription: Fast transcription using faster-whisper with optional CUDA support
  • βš™οΈ Smart Configuration: Set defaults once, use them everywhere
  • πŸ“ Automatic Output: Saves both audio and markdown transcriptions to configured directories
  • 🎯 Auto Model Selection: System diagnostics recommend optimal model for your hardware
  • πŸ”§ System Inspection: Check GPU availability, supported models, and system capabilities

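This is a two-package architecture: voicepad is the command-line interface (built with Typer), and voicepad-core is the core library for recording, transcription, and system diagnostics (built on Pydantic and faster-whisper).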

Installation

Using uvx (Recommended)

Run VoicePad instantly without manual installation:

# List audio input devices and set a default
uvx voicepad config input

# Check system capabilities and get model recommendations
uvx voicepad config system

# Start recording (output directory created automatically)
uvx voicepad record start

Local Installation

For development or local use:

git clone https://github.com/HYP3R00T/voicepad.git
cd voicepad
uv sync
uv run voicepad config input

System Requirements

  • Python: 3.13+
  • Audio Device: Microphone or audio input device
  • GPU (Optional): NVIDIA GPU for 4-5x faster transcription

Quick Start

Record Audio

# Start recording (press Ctrl+C to stop)
voicepad record start

# Display current recording configuration
voicepad record info

# Record with auto-transcription disabled
voicepad record start --no-transcribe

# Record for a fixed duration (seconds)
voicepad record start --duration 30

# Use a custom filename prefix
voicepad record start --prefix my_recording

Configure Input Device

# List available audio input devices
voicepad config input

# Edit voicepad.yaml to set a default input device:
# input_device_index: 2

Check System Capabilities

# Display RAM, CPU, and GPU information
voicepad config system

# Get model recommendations based on your system
voicepad config recommend

# View current transcription configuration
voicepad config transcription

# List all available Whisper models
voicepad config models

Configuration

Configuration is managed through a voicepad.yaml file, either in the working directory (project config) or at ~/.config/voicepad/voicepad.yaml (global config).

Configuration File Structure

# Paths for saving recordings and transcriptions
recordings_path: data/recordings
markdown_path: data/markdown

# Audio device (null for default system audio input)
input_device_index: null

# Filename prefix for recordings
recording_prefix: recording

# Transcription settings
transcription_model: tiny              # See available models below
transcription_device: auto             # auto, cuda, or cpu
transcription_compute_type: auto       # auto, float16, int8, or float32

Configuration Precedence (highest to lowest)

  1. CLI command arguments
  2. Environment variables (e.g., VOICEPAD_TRANSCRIPTION_MODEL=small)
  3. Project config (./voicepad.yaml)
  4. Global config (~/.config/voicepad/voicepad.yaml)
  5. Built-in defaults
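The precedence order above amounts to merging layers from lowest to highest priority, with later layers overwriting earlier ones. This is a minimal sketch, not VoicePad's loader: it assumes each layer is already a plain dict, that CLI arguments contain only options the user actually passed, and that environment variables follow the VOICEPAD_<KEY> pattern from the example. Both function names are hypothetical.

```python
from collections.abc import Mapping
from typing import Any

ENV_PREFIX = "VOICEPAD_"


def env_overrides(environ: Mapping[str, str], keys: list[str]) -> dict[str, str]:
    """Pick VOICEPAD_<KEY> variables, e.g. VOICEPAD_TRANSCRIPTION_MODEL."""
    overrides: dict[str, str] = {}
    for key in keys:
        value = environ.get(ENV_PREFIX + key.upper())
        if value is not None:
            overrides[key] = value  # note: still a string, not type-converted
    return overrides


def resolve_config(
    defaults: Mapping[str, Any],
    global_config: Mapping[str, Any],
    project_config: Mapping[str, Any],
    environ: Mapping[str, str],
    cli_args: Mapping[str, Any],
) -> dict[str, Any]:
    """Merge layers from lowest to highest precedence; later layers win."""
    merged = dict(defaults)
    for layer in (
        global_config,
        project_config,
        env_overrides(environ, list(defaults)),
        cli_args,
    ):
        merged.update(layer)
    return merged
```

For example, with transcription_model set to medium in ./voicepad.yaml and VOICEPAD_TRANSCRIPTION_MODEL=small in the environment, the resolved model is small.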

GPU Acceleration

Install GPU support for 4-5x faster transcription:

# Install with GPU support
pip install "voicepad-core[gpu]"    # quotes stop shells such as zsh from expanding the brackets

# Verify GPU is available
voicepad config system

See the GPU Acceleration Guide (docs/packages/voicepad-core/gpu-acceleration.md) for detailed setup instructions.

Available Whisper Models

Use voicepad config models to list available models. Smaller models are faster but less accurate; larger models are slower but more accurate.

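| Model            | Size (parameters) | Speed (CPU)          | Accuracy             | VRAM (GPU) | Language     |
|------------------|-------------------|----------------------|----------------------|------------|--------------|
| tiny             | 39M               | ⚡⚡⚡⚡⚡ Very Fast   | ⭐ Low                | <1 GB      | Multi        |
| base             | 74M               | ⚡⚡⚡⚡ Fast          | ⭐⭐                  | <1 GB      | Multi        |
| small            | 244M              | ⚡⚡⚡ Moderate        | ⭐⭐⭐                | 1-2 GB     | Multi        |
| medium           | 769M              | ⚡⚡ Slow             | ⭐⭐⭐⭐              | 2-3 GB     | Multi        |
| large-v2         | 1.5B              | ⚡ Very Slow         | ⭐⭐⭐⭐⭐ Excellent   | ~4.7 GB    | Multi        |
| large-v3         | 1.5B              | ⚡ Very Slow         | ⭐⭐⭐⭐⭐ Excellent   | ~4.7 GB    | Multi        |
| turbo            | 809M              | ⚡⚡⚡ Moderate        | ⭐⭐⭐⭐⭐            | 3-4 GB     | Multi        |
| distil-small.en  | 134M              | ⚡⚡⚡ Moderate        | ⭐⭐⭐                | <1 GB      | English Only |
| distil-medium.en | 394M              | ⚡⚡ Slow             | ⭐⭐⭐⭐              | 1-2 GB     | English Only |
| distil-large-v2  | 756M              | ⚡⚡ Slow             | ⭐⭐⭐⭐⭐            | 3-4 GB     | English Only |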

Tip: Run voicepad config recommend to get a model recommendation based on your system resources.

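Package Documentation

Per-package documentation lives in docs/packages/voicepad/ (CLI) and docs/packages/voicepad-core/ (library).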

Development

Code Standards

Follow the project coding standards:

  • Naming: snake_case for functions/variables, PascalCase for classes
  • Type Hints: Required for all functions and class attributes
  • Formatting: PEP 8
  • Validation: Use Pydantic for data models

See .github/copilot-instructions.md for complete guidelines.

Formatting and Linting

# Format all code
ruff format

# Check for linting issues
ruff check

# Type check Python code
ty check

Project Structure

voicepad (root)
├── packages/
│   ├── voicepad/                      # CLI package (Typer)
│   │   └── src/voicepad/
│   │       ├── cli/
│   │       │   ├── record.py          # record start, record info
│   │       │   └── config.py          # config input, system, recommend, etc.
│   │       └── __main__.py            # CLI entry point
│   │
│   └── voicepad-core/                 # Core library (Pydantic + faster-whisper)
│       └── src/voicepad_core/
│           ├── config/
│           │   └── settings.py        # Config model and loading
│           ├── recorder.py            # AudioRecorder class
│           ├── transcription.py       # transcribe_audio() function
│           └── diagnostics/
│               ├── system.py          # System info (RAM, CPU)
│               ├── gpu.py             # GPU detection and checks
│               ├── models.py          # Recommendation logic
│               └── recommendations.py # get_model_recommendation()
│
└── docs/
    ├── packages/voicepad/             # CLI documentation
    ├── packages/voicepad-core/        # Library documentation
    └── packages/voicepad-core/gpu-acceleration.md

License

See LICENSE file for details.
