Voice recording and GPU-accelerated transcription tool. Effortlessly capture and transcribe audio with intelligent defaults and GPU support.
- Simple Recording: Record audio with one command, press Ctrl+C to stop
- GPU-Accelerated Transcription: Fast transcription using faster-whisper with optional CUDA support
- Smart Configuration: Set defaults once, use them everywhere
- Automatic Output: Saves both audio and markdown transcriptions to configured directories
- Auto Model Selection: System diagnostics recommend the optimal model for your hardware
- System Inspection: Check GPU availability, supported models, and system capabilities
This is a two-package architecture:
- voicepad - Command-line interface for recording and configuration (in packages/voicepad/)
- voicepad-core - Python library with recording, transcription, and diagnostics (in packages/voicepad-core/)
Run VoicePad instantly without manual installation:
# List audio input devices and set a default
uvx voicepad config input
# Check system capabilities and get model recommendations
uvx voicepad config system
# Start recording (output directory created automatically)
uvx voicepad record start
For development or local use:
git clone https://github.com/HYP3R00T/voicepad.git
cd voicepad
uv sync
uv run voicepad config input
- Python: 3.13+
- Audio Device: Microphone or audio input device
- GPU (Optional): NVIDIA GPU for 4-5x faster transcription
# Start recording (press Ctrl+C to stop)
voicepad record start
# Display current recording configuration
voicepad record info
# Record with auto-transcription disabled
voicepad record start --no-transcribe
# Record for a fixed duration (seconds)
voicepad record start --duration 30
# Use a custom filename prefix
voicepad record start --prefix my_recording
# List available audio input devices
voicepad config input
# Edit voicepad.yaml to set a default input device:
# input_device_index: 2
# Display RAM, CPU, and GPU information
voicepad config system
# Get model recommendations based on your system
voicepad config recommend
# View current transcription configuration
voicepad config transcription
# List all available Whisper models
voicepad config models
Configuration is managed through voicepad.yaml in the working directory or ~/.config/voicepad/voicepad.yaml globally.
# Paths for saving recordings and transcriptions
recordings_path: data/recordings
markdown_path: data/markdown
# Audio device (null for default system audio input)
input_device_index: null
# Filename prefix for recordings
recording_prefix: recording
# Transcription settings
transcription_model: tiny # See available models below
transcription_device: auto # auto, cuda, or cpu
transcription_compute_type: auto # auto, float16, int8, or float32
Settings are resolved from the following sources, in order of precedence (highest first):
- CLI command arguments
- Environment variables (e.g., VOICEPAD_TRANSCRIPTION_MODEL=small)
- Project config (./voicepad.yaml)
- Global config (~/.config/voicepad/voicepad.yaml)
- Built-in defaults
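Assuming the sources listed above are consulted in the order given, with earlier entries overriding later ones, the lookup can be sketched with a `ChainMap`. This is only an illustration, not VoicePad's actual loader: `resolve_settings`, `env_overrides`, and the `DEFAULTS` dictionary are hypothetical names, and the default values are copied from the sample voicepad.yaml.

```python
from collections import ChainMap
from typing import Any, Mapping

# Hypothetical built-in defaults, copied from the sample voicepad.yaml.
DEFAULTS: dict[str, Any] = {
    "recordings_path": "data/recordings",
    "markdown_path": "data/markdown",
    "input_device_index": None,
    "recording_prefix": "recording",
    "transcription_model": "tiny",
    "transcription_device": "auto",
    "transcription_compute_type": "auto",
}

ENV_PREFIX = "VOICEPAD_"


def env_overrides(environ: Mapping[str, str]) -> dict[str, str]:
    """Map VOICEPAD_FOO=bar to {"foo": "bar"} for known setting names.

    Values stay as strings; a real loader would also convert types.
    """
    found: dict[str, str] = {}
    for name, value in environ.items():
        if not name.startswith(ENV_PREFIX):
            continue
        key = name[len(ENV_PREFIX):].lower()
        if key in DEFAULTS:
            found[key] = value
    return found


def resolve_settings(
    cli_args: Mapping[str, Any],
    environ: Mapping[str, str],
    project: Mapping[str, Any],
    global_: Mapping[str, Any],
) -> dict[str, Any]:
    """Merge all layers; the first layer that defines a key wins."""
    # CLI options left unset (None) must not shadow lower layers.
    cli = {k: v for k, v in cli_args.items() if v is not None}
    return dict(ChainMap(cli, env_overrides(environ), dict(project), dict(global_), DEFAULTS))
```

In practice `environ` would be `os.environ`, and `project` and `global_` would be the parsed contents of the two YAML files (empty dictionaries when a file is missing).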
Install GPU support for 4-5x faster transcription:
# Install with GPU support
pip install "voicepad-core[gpu]"
# Verify GPU is available
voicepad config system
See the GPU Acceleration Guide for detailed setup instructions.
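The `auto` values accepted by transcription_device and transcription_compute_type imply some fallback logic. The sketch below shows one way such a fallback could work. It is not VoicePad's implementation: the function names are hypothetical, the GPU probe only checks whether `nvidia-smi` is on the PATH, and pairing float16 with CUDA and int8 with CPU is an assumption based on common faster-whisper usage.

```python
import shutil
from typing import Callable, Optional

VALID_DEVICES = {"auto", "cuda", "cpu"}


def cuda_probably_available(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """Rough heuristic: NVIDIA driver installs usually put nvidia-smi on PATH."""
    return which("nvidia-smi") is not None


def resolve_device(requested: str, has_cuda: bool) -> str:
    """Turn 'auto' into a concrete device; pass explicit choices through."""
    if requested not in VALID_DEVICES:
        raise ValueError(f"unknown device: {requested!r}")
    if requested == "auto":
        return "cuda" if has_cuda else "cpu"
    return requested


def resolve_compute_type(requested: str, device: str) -> str:
    """Turn 'auto' into a concrete compute type for the chosen device."""
    if requested != "auto":
        return requested
    # Assumed pairing: half precision on GPU, 8-bit integers on CPU.
    return "float16" if device == "cuda" else "int8"


if __name__ == "__main__":
    device = resolve_device("auto", cuda_probably_available())
    print(device, resolve_compute_type("auto", device))
```

The probe is passed in as a parameter so the fallback can be exercised on machines without a GPU.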
Use voicepad config models to list available models. Smaller models are faster but less accurate; larger models are slower but more accurate.
| Model | Size | Speed (CPU) | Accuracy | VRAM (GPU) | Language |
|---|---|---|---|---|---|
| tiny | 39M | ⚡⚡⚡⚡⚡ Very Fast | ⭐ Low | <1 GB | Multi |
| base | 74M | ⚡⚡⚡⚡ Fast | ⭐⭐ | <1 GB | Multi |
| small | 244M | ⚡⚡⚡ Moderate | ⭐⭐⭐ | 1-2 GB | Multi |
| medium | 769M | ⚡⚡ Slow | ⭐⭐⭐⭐ | 2-3 GB | Multi |
| large-v2 | 1.5B | ⚡ Very Slow | ⭐⭐⭐⭐⭐ Excellent | ~4.7 GB | Multi |
| large-v3 | 1.5B | ⚡ Very Slow | ⭐⭐⭐⭐⭐ Excellent | ~4.7 GB | Multi |
| turbo | 809M | ⚡⚡⚡ Moderate | ⭐⭐⭐⭐⭐ | 3-4 GB | Multi |
| distil-small.en | 134M | ⚡⚡⚡ Moderate | ⭐⭐⭐ | <1 GB | English Only |
| distil-medium.en | 394M | ⚡⚡ Slow | ⭐⭐⭐⭐ | 1-2 GB | English Only |
| distil-large-v2 | 756M | ⚡⚡ Slow | ⭐⭐⭐⭐⭐ | 3-4 GB | English Only |
Tip: Run voicepad config recommend to get a model recommendation based on your system resources.
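To make the trade-off in the table concrete, here is a minimal sketch that picks the most accurate multilingual model whose VRAM estimate fits the available GPU memory. It is not the logic behind voicepad config recommend: `pick_model` and `VRAM_UPPER_BOUND_GB` are hypothetical, the numbers are the upper ends of the VRAM ranges in the table ("<1 GB" is treated as 1.0), and falling back to tiny is an assumption.

```python
from typing import Optional

# (model, upper-bound VRAM in GB), ordered from most to least accurate
# according to the table; multilingual models only.
VRAM_UPPER_BOUND_GB: list[tuple[str, float]] = [
    ("large-v3", 4.7),
    ("turbo", 4.0),
    ("medium", 3.0),
    ("small", 2.0),
    ("base", 1.0),
    ("tiny", 1.0),
]


def pick_model(available_vram_gb: Optional[float]) -> str:
    """Return the first (most accurate) model that fits in the given VRAM.

    With no GPU (None), or when nothing fits, fall back to 'tiny'.
    """
    if available_vram_gb is None:
        return "tiny"
    for model, needed_gb in VRAM_UPPER_BOUND_GB:
        if needed_gb <= available_vram_gb:
            return model
    return "tiny"


if __name__ == "__main__":
    for vram in (None, 0.5, 2.5, 4.0, 8.0):
        print(vram, "->", pick_model(vram))
```

A real recommender would likely also weigh system RAM, CPU speed, and whether English-only models are acceptable.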
- voicepad - CLI commands and usage guide
- voicepad-core - Python library API reference
- GPU Acceleration - GPU setup and optimization
Follow the project coding standards:
- Naming: snake_case for functions/variables, PascalCase for classes
- Type Hints: Required for all functions and class attributes
- Formatting: PEP 8
- Validation: Use Pydantic for data models
See .github/copilot-instructions.md for complete guidelines.
# Format all code
ruff format
# Check for linting issues
ruff check
# Type check Python code
ty check
voicepad (root)
├── packages/
│   ├── voicepad/                        # CLI package (Typer)
│   │   └── src/voicepad/
│   │       ├── cli/
│   │       │   ├── record.py            # record start, record info
│   │       │   └── config.py            # config input, system, recommend, etc.
│   │       └── __main__.py              # CLI entry point
│   │
│   └── voicepad-core/                   # Core library (Pydantic + faster-whisper)
│       └── src/voicepad_core/
│           ├── config/
│           │   └── settings.py          # Config model and loading
│           ├── recorder.py              # AudioRecorder class
│           ├── transcription.py         # transcribe_audio() function
│           └── diagnostics/
│               ├── system.py            # System info (RAM, CPU)
│               ├── gpu.py               # GPU detection and checks
│               ├── models.py            # Recommendation logic
│               └── recommendations.py   # get_model_recommendation()
│
└── docs/
    ├── packages/voicepad/               # CLI documentation
    ├── packages/voicepad-core/          # Library documentation
    └── packages/voicepad-core/gpu-acceleration.md
See LICENSE file for details.