This guide explains how to run benchmarks to evaluate model performance on your hardware.
Install the dependencies:

```bash
pip install openvino-genai soundfile numpy
```

Or using uv:

```bash
uv pip install openvino-genai soundfile numpy
```

Compare Parakeet V2, V3, and Whisper on your hardware:

```bash
uv run python benchmarks/benchmark_whisper_ov.py
```

Test on specific languages with the FLEURS dataset:
```bash
# English only, 10 samples, NPU device
uv run python benchmarks/benchmark_fleurs.py --languages en_us --samples 10 --device NPU

# Multiple languages, 25 samples each
uv run python benchmarks/benchmark_fleurs.py --languages en_us es_419 fr_fr --samples 25 --device CPU

# All available languages
uv run python benchmarks/benchmark_fleurs.py --all-languages --samples 5 --device NPU
```

FLEURS options:

- `--languages`: Specific language codes (e.g., `en_us`, `es_419`, `fr_fr`)
- `--all-languages`: Test all 24 supported languages
- `--samples`: Number of audio samples per language (default: 10)
- `--device`: Target device: `NPU`, `CPU`, or `GPU`
For detailed accuracy testing on LibriSpeech test-clean:
```bash
# Build the benchmark
cmake --build build --config Release --target benchmark_librispeech

# Run on 25 files
build/examples/cpp/Release/benchmark_librispeech.exe --max-files 25

# Run on all files (2,620 total)
build/examples/cpp/Release/benchmark_librispeech.exe
```

**RTFx (Real-Time Factor)** measures processing speed relative to audio duration:
- RTFx = 1.0: Processes at real-time speed (1 min audio = 1 min processing)
- RTFx > 1.0: Faster than real-time (RTFx = 10 means 1 min audio in 6 seconds)
- RTFx < 1.0: Slower than real-time
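For illustration, RTFx is simply the ratio of audio duration to processing time. This is a standalone sketch, not eddy's implementation:

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Real-time factor: seconds of audio processed per second of compute."""
    return audio_seconds / processing_seconds

# 60 s of audio transcribed in 6 s -> RTFx = 10 (faster than real time)
print(rtfx(60.0, 6.0))   # 10.0
# 60 s of audio transcribed in 60 s -> RTFx = 1 (exactly real time)
print(rtfx(60.0, 60.0))  # 1.0
```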
**WER (Word Error Rate)** measures transcription accuracy:
- Lower is better
- Calculated as: (Substitutions + Deletions + Insertions) / Total Words × 100
- Industry-standard metric for ASR evaluation
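The formula above — word error rate (WER) — can be computed with word-level edit distance. This standalone sketch is illustrative only, not eddy's implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (S + D + I) / reference word count * 100."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i reference words and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref) * 100

# One deletion out of six reference words -> WER = 1/6 * 100 ≈ 16.7%
print(wer("the cat sat on the mat", "the cat sat on mat"))
```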
**Token confidence** is the per-token confidence reported by the model:
- Range: 0.0 to 1.0 (higher is better)
- Useful for filtering uncertain predictions
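As a sketch of how a confidence threshold could be used for such filtering — the `(text, confidence)` pair structure below is a hypothetical stand-in, and eddy's actual per-token output shape may differ:

```python
def filter_tokens(tokens, threshold=0.8):
    """Keep tokens whose confidence meets the threshold.

    `tokens` is assumed to be a list of (text, confidence) pairs;
    this shape is illustrative, not eddy's documented output format.
    """
    return [text for text, conf in tokens if conf >= threshold]

tokens = [("hello", 0.97), ("wrold", 0.42), ("world", 0.91)]
print(filter_tokens(tokens))  # ['hello', 'world']
```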
See BENCHMARK_RESULTS.md for detailed performance data on Intel Core Ultra 7 155H.
**LibriSpeech**

- Source: OpenSLR
- License: CC-BY-4.0
- Language: English only
- Test-clean subset: 2,620 samples, ~5.4 hours
- Use case: High-quality English ASR evaluation
**FLEURS**

- Source: Google Research
- License: CC-BY-4.0
- Languages: 102 languages (eddy supports 24)
- Use case: Multilingual ASR evaluation
Supported languages: English, Spanish, Italian, French, German, Dutch, Russian, Polish, Ukrainian, Slovak, Bulgarian, Finnish, Romanian, Croatian, Czech, Swedish, Estonian, Hungarian, Lithuanian, Danish, Maltese, Slovenian, Latvian, Greek
Language codes for FLEURS:

- `en_us` - English
- `es_419` - Spanish
- `it_it` - Italian
- `fr_fr` - French
- `de_de` - German
- `nl_nl` - Dutch
- `ru_ru` - Russian
- `pl_pl` - Polish
- `uk_ua` - Ukrainian
- `sk_sk` - Slovak
- `bg_bg` - Bulgarian
- `fi_fi` - Finnish
- `ro_ro` - Romanian
- `hr_hr` - Croatian
- `cs_cz` - Czech
- `sv_se` - Swedish
- `et_ee` - Estonian
- `hu_hu` - Hungarian
- `lt_lt` - Lithuanian
- `da_dk` - Danish
- `mt_mt` - Maltese
- `sl_si` - Slovenian
- `lv_lv` - Latvian
- `el_gr` - Greek
```python
from eddy import ParakeetASR
import time

# Initialize model
asr = ParakeetASR("parakeet-v3", device="NPU")

# Transcribe and measure performance
audio_file = "test.wav"
start_time = time.time()
result = asr.transcribe(audio_file)
elapsed = time.time() - start_time

print(f"Text: {result['text']}")
print(f"Time: {elapsed:.2f}s")
print(f"RTFx: {result['rtfx']:.2f}×")
```

See docs/CPP_API.md for C++ integration examples.
**NPU**

- Devices: Intel Core Ultra (Meteor Lake or newer)
- Expected RTFx: 30-40× for Parakeet, 15-20× for Whisper
- Power efficiency: Best for battery-powered devices
**CPU**

- Expected RTFx: 5-10× for Parakeet, 0.4-0.5× for Whisper
- Works on: Any modern x86-64 CPU
- Use when: NPU not available
**GPU**

- Expected RTFx: Varies by GPU (integrated vs. discrete)
- Note: Best results with discrete GPUs
- Verify OpenVINO 2025.x is installed
- Check device availability: `parakeet_cli.exe --list-devices`
- Use `--device NPU` for Intel Core Ultra processors
- Ensure a Release build (Debug is ~10× slower)
- Reduce batch size in benchmark scripts
- Use smaller model (V2 instead of V3, or Whisper base instead of large)
- Close other applications
LibriSpeech and FLEURS datasets auto-download on first run. If download fails:
```bash
# Manual download
wget https://www.openslr.org/resources/12/test-clean.tar.gz
tar -xzf test-clean.tar.gz

# Or use the HuggingFace datasets library
pip install datasets
python -c "from datasets import load_dataset; load_dataset('google/fleurs', 'en_us')"
```

Share your results with the community:
- Run benchmarks on your hardware
- Note your CPU/GPU model and OS
- Submit results via GitHub Issues or Discord
- Help us understand performance across different platforms
- GitHub Issues: github.com/FluidInference/eddy/issues
- Discord: discord.gg/WNsvaCtmDe