This repository contains the codebase and resources developed for the paper:
Open Source State-Of-the-Art Solution for Romanian Speech Recognition
Presented at SpeD 2025 – International Conference on Speech Technology and Human–Computer Dialogue
This project is built on top of the NVIDIA NeMo framework (version 2.3.1) and focuses on developing a Romanian Offline ASR model based on the FastConformer Hybrid TDT-CTC architecture. We further enhance decoding with an external KenLM N-gram language model for improved accuracy.
Our system achieves state-of-the-art (SOTA) performance across 7 different Romanian evaluation datasets.
- 🏆 Model: FastConformer Hybrid TDT-CTC
- 🇷🇴 Language: Romanian
- 📊 Benchmark: 7 public evaluation datasets
- 🪄 Features:
- Offline ASR for Romanian
- N-gram language model trained on Romanian text
- Fully reproducible evaluation setup
This repository provides:
- The model training and inference pipeline (adapted from NeMo)
- Evaluation scripts and annotations used for benchmarking
- Ready-to-use pretrained models hosted on Hugging Face

The `manifests/` folder provides the annotation manifests used to evaluate our ASR model on the 7 datasets. This allows other researchers to benchmark their own systems under identical conditions, ensuring fair comparison in future studies.
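Each manifest follows NeMo's JSON-lines convention: one object per line with `audio_filepath`, `duration`, and `text` fields. As a minimal sketch (the file paths and transcripts below are invented for illustration), reading and writing such a manifest needs only the standard library:

```python
import io
import json

# Hypothetical entries; field names follow the NeMo manifest convention.
entries = [
    {"audio_filepath": "audio/clip_0001.wav", "duration": 3.42,
     "text": "exemplu de transcriere în limba română"},
    {"audio_filepath": "audio/clip_0002.wav", "duration": 5.10,
     "text": "a doua propoziție de test"},
]

def write_manifest(entries, fp):
    """Serialize entries as JSON lines, one object per line."""
    for entry in entries:
        fp.write(json.dumps(entry, ensure_ascii=False) + "\n")

def read_manifest(fp):
    """Parse a JSON-lines manifest back into a list of dicts."""
    return [json.loads(line) for line in fp if line.strip()]

# Round-trip through an in-memory buffer instead of a real file.
buf = io.StringIO()
write_manifest(entries, buf)
buf.seek(0)
parsed = read_manifest(buf)
print(len(parsed), parsed[0]["audio_filepath"])
```

In practice you would write to a real `*.json` file and pass its path as `dataset_manifest` to the evaluation script.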
- ASR Model: SpeD_ParakeetRo_110M_TDT-CTC
- N-gram Language Model: SpeD-Ro_6gram-tokens-prune0135
These models can be directly integrated with the pipeline in this repository.
This project builds on top of NVIDIA NeMo.
We recommend installing it via Conda + Pip for most use cases.
```bash
# Create and activate a fresh environment
conda create --name nemo python==3.10.12
conda activate nemo

# Install NeMo Toolkit
pip install "nemo_toolkit[all]"
```

If you want a specific NeMo version:

```bash
git clone https://github.com/NVIDIA/NeMo
cd NeMo
git checkout ${REF:-'main'}
pip install '.[all]'
```

This project primarily uses the ASR domain, but you may install other domains if needed.

```bash
pip install "nemo_toolkit[asr]"        # ASR domain (required)
pip install "nemo_toolkit[nlp]"        # Optional
pip install "nemo_toolkit[tts]"        # Optional
pip install "nemo_toolkit[vision]"     # Optional
pip install "nemo_toolkit[multimodal]" # Optional
```

This model was evaluated on 7 Romanian datasets, covering various domains and accents.
The manifests/ folder includes the exact annotation files used for these benchmarks, ensuring reproducible research.
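The benchmark scores reported below are word error rates (WER). As a self-contained illustration of the metric, here is a minimal WER computation using the standard Levenshtein formulation (the Romanian sentence pair is invented; in practice the evaluation script reports this for you):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two word sequences."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution or match
    return d[-1]

def wer(reference, hypothesis):
    """Word error rate: edit distance normalized by reference length."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)

# One deleted word out of four reference words -> WER of 0.25.
score = wer("salut ce mai faci", "salut ce faci")
print(score)  # → 0.25
```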
E.g., to evaluate your own model with an N-gram model on the SSC-eval1 dataset:

```bash
cd examples/asr
python3 speech_to_text_eval.py \
    dataset_manifest=../../manifests/SSC-eval1_manifest.json \
    model_path=... \
    output_filename=... \
    decoder_type=ctc \
    ctc_decoding.strategy=beam \
    ctc_decoding.beam.kenlm_path=... \
    ctc_decoding.beam.beam_alpha=... \
    ctc_decoding.beam.beam_beta=... \
    ctc_decoding.beam.beam_size=...
```

| Architecture | Decoding | RSC-eval | SSC-eval1 | SSC-eval2 | CDEP-eval | CV-21 | Fleurs-RO | USPDATRO | RTFx |
|---|---|---|---|---|---|---|---|---|---|
| Parakeet Ro 110M TDT (ours) | Greedy | 2.16 | 9.08 | 10.85 | 4.20 | 3.57 | 10.61 | 24.08 | 126.15 |
| | ALSD | 2.05 | 8.64 | 10.88 | 4.17 | 3.38 | 10.16 | 24.30 | 66.63 |
| Parakeet Ro 110M CTC (ours) | Greedy | 2.57 | 10.10 | 12.65 | 4.80 | 4.20 | 11.85 | 27.80 | 130.55 |
| | Beam Token N-gram | 1.73 | 8.12 | 10.75 | 3.92 | 3.29 | 8.85 | 23.40 | 109.46 |
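The `beam_alpha` and `beam_beta` hyperparameters in the evaluation command implement standard shallow fusion: each beam hypothesis is rescored as its acoustic log-probability plus `alpha` times the LM log-probability plus `beta` times a word-count bonus. A toy sketch of that rescoring (the candidate texts and scores are invented; real decoding applies this incrementally over the full beam):

```python
def fused_score(log_p_am, log_p_lm, n_words, alpha, beta):
    """Shallow-fusion score: acoustic + alpha * LM + beta * word count."""
    return log_p_am + alpha * log_p_lm + beta * n_words

# Hypothetical candidates: (text, acoustic log-prob, LM log-prob).
# The LM heavily penalizes the malformed second hypothesis.
candidates = [
    ("bună ziua tuturor", -4.1, -7.2),
    ("buna zi ua tuturor", -3.9, -15.8),
]

alpha, beta = 0.7, 1.0  # example values; tune both on a dev set
best = max(candidates,
           key=lambda c: fused_score(c[1], c[2], len(c[0].split()), alpha, beta))
print(best[0])
```

With these weights the LM overrides the slightly better acoustic score of the malformed hypothesis, which is exactly the effect behind the CTC beam-search gains in the table above.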
If you use this repository, the pretrained models, or the provided manifests in your work, please cite:
@misc{pirlogeanu2025opensourcestateoftheartsolution,
title={Open Source State-Of-the-Art Solution for Romanian Speech Recognition},
author={Gabriel Pirlogeanu and Alexandru-Lucian Georgescu and Horia Cucu},
year={2025},
eprint={2511.03361},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2511.03361},
}
Also consider citing the original NVIDIA NeMo framework and KenLM:
@article{kuchaiev2019nemo,
title={NeMo: a toolkit for building AI applications using Neural Modules},
author={Kuchaiev, Oleksii and Ginsburg, Boris and others},
journal={arXiv preprint arXiv:1909.09577},
year={2019}
}
@inproceedings{heafield-2011-kenlm,
title = "{K}en{LM}: Faster and Smaller Language Model Queries",
author = "Heafield, Kenneth",
editor = "Callison-Burch, Chris and
Koehn, Philipp and
Monz, Christof and
Zaidan, Omar F.",
booktitle = "Proceedings of the Sixth Workshop on Statistical Machine Translation",
month = jul,
year = "2011",
address = "Edinburgh, Scotland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/W11-2123/",
pages = "187--197"
}
- Portions of this code are derived from NVIDIA NeMo under the Apache License 2.0.
- Additional modifications and contributions © 2025 Gabriel Pirlogeanu.
- Evaluation manifests are released for research use.
For questions or collaborations: gabriel.pirlogeanu@gmail.com