rizaleow/asr-benchmark

ASR Benchmark Tools

A comprehensive benchmarking suite for real-time Automatic Speech Recognition (ASR) models, optimized for edge devices (Raspberry Pi 5).

Features

  • Multi-Model Support:
    • vosk: Lightweight, offline, fast.
    • faster-whisper: Optimized Whisper implementation (CTranslate2).
    • whisper.cpp: High-performance C++ implementation (via pywhispercpp).
  • Modes:
    • Real-time Simulation: Streams audio chunks from file to simulate varying network latency.
    • Live Audio: Real-time transcription using microphone input (requires PortAudio).
    • Automated Suite: Sequential validation across multiple models and languages.
  • Advanced Metrics:
    • RTF (Real-Time Factor): Processing time divided by audio duration; values below 1.0 mean faster than real time.
    • Latency: Average and P90 delay per chunk.
    • TTFT (Time To First Token): Responsiveness metric.
    • WER/CER: Word/Character Error Rates.
    • SemSim (Semantic Similarity): Meaning preservation score (0-1) using sentence-transformers.
    • Resource Monitoring: Peak Memory (MB) and CPU Usage (%).
  • VAD Support: Integration with Voice Activity Detection for optimized processing.
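
As a rough illustration of how the timing and accuracy metrics above are commonly defined, here is a minimal pure-Python sketch. It is not taken from this repository: the function names are hypothetical, the percentile uses the nearest-rank definition (one of several in use), and the suite's own implementation may differ in normalization and tokenization.

```python
import math


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration (below 1.0 is faster than real time)."""
    return processing_seconds / audio_seconds


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=90 for P90 latency (assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words.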

Prerequisites

  • Python 3.10+
  • uv (recommended)
  • PortAudio development headers (for live audio; on Debian/Ubuntu: sudo apt install portaudio19-dev)

Installation

git clone https://github.com/rizalbuilds/asr-benchmark.git
cd asr-benchmark
uv sync  # Install dependencies
uv run src/download_models.py # Pre-download all models

Model Storage Locations

  • Faster-Whisper / Semantic: ~/.cache/huggingface/hub/
  • Vosk: ~/.cache/vosk/
  • Whisper.cpp: ~/.local/share/pywhispercpp/models/
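
To confirm that the pre-download step populated these directories, a small check along the following lines may help. This is a sketch, not part of the repository: `MODEL_DIRS` and `model_cache_status` are hypothetical names, and the paths are copied from the list above.

```python
from pathlib import Path

# Directories as listed in this README; adjust if your caches live elsewhere.
MODEL_DIRS = {
    "faster-whisper / semantic": "~/.cache/huggingface/hub",
    "vosk": "~/.cache/vosk",
    "whisper.cpp": "~/.local/share/pywhispercpp/models",
}


def model_cache_status(dirs: dict[str, str] = MODEL_DIRS) -> dict[str, bool]:
    """Map each label to True if its directory exists on this machine."""
    return {name: Path(path).expanduser().is_dir() for name, path in dirs.items()}


if __name__ == "__main__":
    for name, present in model_cache_status().items():
        print(f"{name}: {'found' if present else 'missing'}")
```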

Usage

1. Automated Benchmark Suite

Run benchmarks on all models against files in audio_files/.

# Organize audio: audio_files/en/*.wav, audio_files/id/*.wav
uv run src/benchmark_suite.py

Results are saved to benchmark_results.json.

2. Manual Benchmark (File Simulation)

uv run src/main.py --runner faster-whisper --model tiny.en --audio test.wav

3. Live Audio Benchmark

uv run src/main.py --runner faster-whisper --model tiny.en --live --vad

CLI Arguments (src/main.py)

Argument     Description                              Default
--runner     vosk, faster-whisper, whisper-cpp        Required
--model      Model name/path                          Required
--audio      Path to audio file (ignored if --live)   Required (unless live)
--live       Use microphone input                     False
--vad        Enable Voice Activity Detection          False
--chunk-ms   Chunk duration in ms                     1000
--reference  Path to ground truth text file           None
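
To make the effect of --chunk-ms concrete, the sketch below splits a sample buffer into fixed-duration chunks and times a per-chunk callback. How src/main.py actually chunks and feeds audio is not shown in this README, so `iter_chunks`, `stream_with_timing`, and the `process` callback are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Iterator, Sequence


def iter_chunks(samples: Sequence[float], sample_rate: int,
                chunk_ms: int = 1000) -> Iterator[Sequence[float]]:
    """Yield consecutive chunks of chunk_ms milliseconds; the last may be shorter."""
    chunk_len = max(1, sample_rate * chunk_ms // 1000)
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


def stream_with_timing(samples: Sequence[float], sample_rate: int,
                       process: Callable[[Sequence[float]], object],
                       chunk_ms: int = 1000) -> list[float]:
    """Feed chunks to process() one by one and return per-chunk latency in seconds."""
    latencies = []
    for chunk in iter_chunks(samples, sample_rate, chunk_ms):
        started = time.perf_counter()
        process(chunk)  # stand-in for a model's transcribe step
        latencies.append(time.perf_counter() - started)
    return latencies
```

With 16 kHz audio and the default of 1000 ms, each chunk holds 16000 samples; a 2.5 second file therefore yields chunks of 16000, 16000, and 8000 samples.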

Raspberry Pi 5 Optimization

  • Quantization: Use --quantization int8 (default).
  • Threads: whisper.cpp and faster-whisper defaults are tuned for 4 threads.
  • VAD: Enable --vad to save compute on silent chunks.
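
The idea behind skipping silent chunks can be sketched with a simple energy gate. This is a deliberately simplified stand-in, not the VAD this project uses (which the README does not specify); the threshold value and function names are assumptions, and samples are taken to be floats in the range -1.0 to 1.0.

```python
import math
from typing import Sequence


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square amplitude of a chunk; 0.0 for an empty chunk."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def is_speech(chunk: Sequence[float], threshold: float = 0.01) -> bool:
    """Treat a chunk as speech if its RMS energy reaches the (assumed) threshold."""
    return rms(chunk) >= threshold


def filter_speech(chunks: list[Sequence[float]],
                  threshold: float = 0.01) -> list[Sequence[float]]:
    """Keep only chunks that pass the energy gate, so silent ones skip inference."""
    return [c for c in chunks if is_speech(c, threshold)]
```

A real VAD is far more robust to background noise than a fixed energy threshold; the sketch only shows where the compute saving comes from.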

License

MIT
