This repository contains research code for word-level error analysis, developed as part of the following publication:
Jingya Huang, Aashish N. Patel, Sowmya Manojna Narasimha, Gal Mishne, and Vikash Gilja (2025). "Word-Level Error Analysis in Decoding Systems: From Speech Recognition to Brain-Computer Interfaces." Interspeech 2025.
This package implements word-level error metrics to provide fine-grained evaluation of sequence-to-sequence decoding models, specifically for Automatic Speech Recognition (ASR) and Brain-to-Text Brain-Computer Interfaces (BTT). Standard sentence-level metrics often do not capture nuanced error patterns at the word level, particularly for infrequent or semantically critical words.
To address this, we introduce a refined alignment algorithm that attributes edit operations to specific words. The framework supports multiple word-level metrics that quantify both literal correctness and semantic similarity between decoded and reference words. These metrics enable detailed analysis of generalization gaps associated with word frequency, which are particularly relevant for assessing model performance on out-of-vocabulary (OOV) and low-frequency words.
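The idea of attributing edit operations to specific words can be illustrated with a standard dynamic-programming (Levenshtein) alignment over word sequences. This is a hedged sketch only: the function name `align_words` and the operation labels are illustrative and do not reflect the package's actual API or the refined algorithm described in the paper.

```python
# Illustrative sketch: word-level edit attribution via standard
# Levenshtein alignment. Not the package's refined algorithm.

def align_words(ref, hyp):
    """Return a list of (operation, ref_word, hyp_word) tuples,
    where operation is 'correct', 'substitution', 'deletion', or
    'insertion'."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match/substitution
    # Backtrace to attribute each edit operation to a word position.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else 1)):
            label = "correct" if ref[i - 1] == hyp[j - 1] else "substitution"
            ops.append((label, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("deletion", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, hyp[j - 1]))
            j -= 1
    return list(reversed(ops))
```

Once every edit is tied to a word, per-word statistics (e.g. substitution rates binned by word frequency) follow directly from the operation list.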
While our primary experiments focus on character-level decoding outputs, the framework is generalizable to other output units (e.g., phonemes or subword tokens), provided that word delimiter symbols are available.
Consider the example sentence:
"but you will still have to have an orthodontist to straighten out your teeth"
A system may correctly decode the majority of words while consistently misrecognizing infrequent but semantically important words such as "orthodontist," "straighten," and "teeth." Although sentence-level WER may remain low, the semantic fidelity of the transcription is substantially degraded. The metrics provided in this package are designed to expose such discrepancies by attributing errors at the word level and capturing both exact correctness and semantic distance.
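To make the example concrete, the following sketch compares a hypothetical decoded output against the reference and tallies per-word correctness. The decoded hypothesis shown here is invented for illustration; equal sentence lengths are assumed so a simple positional comparison suffices.

```python
# Hypothetical example: most words are correct, but the infrequent,
# semantically important words are misrecognized.

ref = ("but you will still have to have an orthodontist "
       "to straighten out your teeth").split()
hyp = ("but you will still have to have an orthodentist "
       "to straiten out your teth").split()  # invented decoder output

# Positional comparison (both sentences have the same word count here).
errors = [r for r, h in zip(ref, hyp) if r != h]
wer = len(errors) / len(ref)

print(errors)            # the semantically critical misrecognitions
print(f"WER = {wer:.2f}")
```

Even though most of the sentence is decoded correctly, the errors fall precisely on the content words that carry the sentence's meaning, which is the discrepancy these metrics are designed to expose.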
- alignment/ — Core refined alignment algorithm for edit attribution
- metrics/ — Implementations of word-level correctness and semantic similarity metrics
- examples/ — Usage examples and demonstration scripts
- requirements.txt — Package dependencies
Clone the repository:

```shell
git clone https://github.com/TNEL-UCSD/word-metrics.git
cd word-metrics
```

Install the package and dependencies:

```shell
pip install .
pip install -r requirements.txt
```

Note: If the repository is updated, please reinstall to ensure compatibility.
- Python >= 3.9
- NumPy
- SciPy
- scikit-learn
- datasets
- transformers
- soundfile
- librosa
- flair
- spacy
- seaborn
- SpeechBrain
- PyTorch (CUDA-compatible version)
Full dependency versions are specified in requirements.txt.
Example scripts are provided in the examples/ directory to demonstrate:
- Alignment and edit attribution
- Word-level error metric computation
- Evaluation on ASR model outputs (e.g., SpeechBrain wav2vec2 models)
The code is designed to operate on decoded text sequences where word boundaries are explicitly marked.
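As a minimal sketch of what "explicitly marked word boundaries" means in practice, character-level decoder output can be regrouped into words at a delimiter symbol. The delimiter `"|"` and the helper name `chars_to_words` are assumptions for illustration (the pipe is a common word delimiter in CTC character vocabularies, e.g. wav2vec2), not the package's required convention.

```python
# Illustrative helper: regroup a character-level decoded sequence into
# words at an assumed delimiter symbol "|".

def chars_to_words(char_seq, delimiter="|"):
    """Split a decoded character sequence into words at delimiter symbols."""
    words = "".join(char_seq).split(delimiter)
    return [w for w in words if w]  # drop empties from repeated delimiters

decoded = list("but|you|will|still|have")
print(chars_to_words(decoded))  # ['but', 'you', 'will', 'still', 'have']
```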
If you use this code in your work, please cite:
@inproceedings{huang2025word,
title={Word-Level Error Analysis in Decoding Systems: From Speech Recognition to Brain-Computer Interfaces},
author={Huang, Jingya and Patel, Aashish N. and Narasimha, Sowmya Manojna and Mishne, Gal and Gilja, Vikash},
booktitle={Interspeech},
year={2025}
}
This repository is released under the MIT License.
For questions, bug reports, or feature requests, please open an issue on this repository.
