A comprehensive benchmarking suite for real-time Automatic Speech Recognition (ASR) models, optimized for edge devices (Raspberry Pi 5).
## Features

- Multi-Model Support:
  - `vosk`: Lightweight, offline, fast.
  - `faster-whisper`: Optimized Whisper implementation (CTranslate2).
  - `whisper.cpp`: High-performance C++ implementation (via `pywhispercpp`).
- Modes:
  - Real-time Simulation: Streams audio chunks from a file to simulate varying network latency.
  - Live Audio: Real-time transcription using microphone input (requires PortAudio).
  - Automated Suite: Sequential validation across multiple models and languages.
- Advanced Metrics:
  - RTF (Real-Time Factor): Processing speed ratio.
  - Latency: Average and P90 delay per chunk.
  - TTFT (Time To First Token): Responsiveness metric.
  - WER/CER: Word/Character Error Rates.
  - SemSim (Semantic Similarity): Meaning preservation score (0-1) using `sentence-transformers`.
- Resource Monitoring: Peak Memory (MB) and CPU Usage (%).
- VAD Support: Integration with Voice Activity Detection for optimized processing.
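The core metrics reduce to a few formulas. A minimal sketch, assuming per-chunk latencies in milliseconds and whitespace-tokenized transcripts (these helpers are illustrative, not the suite's actual code):

```python
import statistics


def rtf(processing_time_s: float, audio_duration_s: float) -> float:
    """Real-Time Factor: values below 1.0 mean faster than real time."""
    return processing_time_s / audio_duration_s


def latency_stats(chunk_latencies_ms: list[float]) -> tuple[float, float]:
    """Average and P90 latency per chunk."""
    avg = statistics.fmean(chunk_latencies_ms)
    # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile
    p90 = statistics.quantiles(chunk_latencies_ms, n=10)[-1]
    return avg, p90


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / max(len(ref), 1)
```

CER is the same edit-distance ratio computed over characters instead of words.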
## Requirements

- Python 3.10+
- `uv` (recommended)
- `portaudio19-dev` (for live audio; e.g., `sudo apt install portaudio19-dev` on Debian/Ubuntu)
## Installation

```sh
git clone https://github.com/rizalbuilds/asr-benchmark.git
cd asr-benchmark
uv sync                        # Install dependencies
uv run src/download_models.py  # Pre-download all models
```

Models are cached at:

- Faster-Whisper / Semantic: `~/.cache/huggingface/hub/`
- Vosk: `~/.cache/vosk/`
- Whisper.cpp: `~/.local/share/pywhispercpp/models/`
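To confirm the downloads landed, you can check the cache directories listed above. A quick sanity check (hypothetical helper, not part of the project):

```python
from pathlib import Path

# Cache locations as documented above.
CACHE_DIRS = {
    "faster-whisper / semantic": Path.home() / ".cache/huggingface/hub",
    "vosk": Path.home() / ".cache/vosk",
    "whisper.cpp": Path.home() / ".local/share/pywhispercpp/models",
}


def cache_status() -> dict[str, bool]:
    """Return whether each model cache directory exists."""
    return {name: path.is_dir() for name, path in CACHE_DIRS.items()}


if __name__ == "__main__":
    for name, present in cache_status().items():
        print(f"{name:28s} {'OK' if present else 'missing'}")
```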
## Usage

### Automated Suite

Run benchmarks on all models against files in `audio_files/`:

```sh
# Organize audio: audio_files/en/*.wav, audio_files/id/*.wav
uv run src/benchmark_suite.py
```

Results are saved to `benchmark_results.json`.
### Single Run

```sh
# File-based transcription
uv run src/main.py --runner faster-whisper --model tiny.en --audio test.wav

# Live microphone input with VAD
uv run src/main.py --runner faster-whisper --model tiny.en --live --vad
```

| Argument | Description | Default |
|---|---|---|
| `--runner` | `vosk`, `faster-whisper`, or `whisper-cpp` | Required |
| `--model` | Model name/path | Required |
| `--audio` | Path to audio file (ignored if `--live`) | Required (unless live) |
| `--live` | Use microphone input | False |
| `--vad` | Enable Voice Activity Detection | False |
| `--chunk-ms` | Chunk duration in ms | 1000 |
| `--reference` | Path to ground-truth text file | None |
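In real-time simulation mode, the file is split into `--chunk-ms` sized pieces that are released at wall-clock rate, so the recognizer sees the same pacing it would from a live stream. A minimal sketch of that idea for a 16-bit PCM WAV (illustrative only, not the suite's implementation):

```python
import time
import wave
from collections.abc import Iterator


def stream_chunks(path: str, chunk_ms: int = 1000,
                  realtime: bool = True) -> Iterator[bytes]:
    """Yield raw PCM chunks of chunk_ms duration from a WAV file.

    With realtime=True, sleeps between chunks to mimic live input pacing.
    """
    with wave.open(path, "rb") as wf:
        frames_per_chunk = wf.getframerate() * chunk_ms // 1000
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            if realtime:
                time.sleep(chunk_ms / 1000)  # pace delivery at real-time speed
            yield data
```

Each yielded chunk would be fed to the selected runner; per-chunk latency is then measured from chunk arrival to transcript emission.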
## Performance Tips (Raspberry Pi 5)

- Quantization: Use `--quantization int8` (default).
- Threads: `whisper.cpp` and `faster-whisper` defaults are tuned for 4 threads.
- VAD: Enable `--vad` to save compute on silent chunks.
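The compute saving from VAD comes from skipping inference on chunks with no speech. As a stand-in for whatever VAD backend the suite uses (not specified here), a crude energy gate illustrates the idea:

```python
import math
import struct


def is_speech(pcm16: bytes, threshold: float = 500.0) -> bool:
    """Crude energy gate: True if the RMS of 16-bit mono PCM exceeds threshold.

    Real VAD models (e.g. trained neural detectors) are far more robust;
    this is only to show where the check slots into the pipeline.
    """
    if not pcm16:
        return False
    samples = struct.unpack(f"<{len(pcm16) // 2}h", pcm16)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms >= threshold
```

In a streaming loop, chunks where `is_speech()` returns False would bypass the recognizer entirely, which is where the savings on mostly-silent audio come from.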
## License

MIT