The ASR module is designed to work with Conformer family models.
We provide a Jupyter notebook for ONNX export with the required signatures here: examples/conformer_export_onnx.ipynb.
General ASR model signature: `(1, T_frames, V)`, where:
- `T_frames` is the number of temporal frames after Conformer downsampling
- `V` is the token vocabulary size (`V = 129` for NeMo Conformer-large)
We reference an open-source version of Conformer fine-tuned on the TORGO dataset of dysarthric speech: https://huggingface.co/miosipov/conformer-ctc-DASR-en-TORGO.
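Conformer-CTC models with this output shape emit per-frame log-probabilities, which can be collapsed into a token sequence with a standard CTC greedy decode. A minimal sketch, assuming the blank token sits at the last vocabulary index (the NeMo convention); the function name is ours:

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank_id=None):
    """Collapse a (1, T_frames, V) CTC log-probability matrix to token IDs."""
    probs = log_probs[0]                 # (T_frames, V)
    if blank_id is None:
        blank_id = probs.shape[-1] - 1   # assumption: blank is the last index
    ids = np.argmax(probs, axis=-1)      # best token per frame
    out, prev = [], None
    for i in ids:
        # CTC rule: drop repeats, then drop blanks
        if i != prev and i != blank_id:
            out.append(int(i))
        prev = i
    return out
```

The resulting IDs can then be detokenized with the checkpoint's own tokenizer.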
Phonemic ASR model signature: `(1, T_frames, P)`, where:
- `T_frames` is the number of temporal frames after Conformer downsampling
- `P` is the phoneme vocabulary size (`P = 50` in our experiments)
We reference an open-source version of Conformer fine-tuned on the TEDLIUM dataset for phonemic ASR: https://huggingface.co/miosipov/conformer-ctc-phonemic-en-tedlium.
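The phonemic model's output decodes the same way as the general model's, except the resulting IDs index a phoneme inventory rather than a subword vocabulary. A small sketch of that final ID-to-symbol step; the inventory below is a made-up fragment, and the real `P = 50` inventory ships with the checkpoint:

```python
# Hypothetical subset of a phoneme inventory; the actual ordering is
# defined by the fine-tuned checkpoint, not by this sketch.
PHONEMES = ["AA", "AE", "AH", "B", "D", "IY", "K", "T"]

def ids_to_phonemes(ids, inventory=PHONEMES):
    """Map CTC-decoded phoneme IDs to phoneme symbols."""
    return [inventory[i] for i in ids]
```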
The dysfluency detection module expects an ONNX model with the signature `(1, C, T)`, where:
- `T` is the (variable) number of time frames after log-Mel spectrogram calculation
- `C = 4` is the number of classes, in the following order: `fluent`, `clonic`, `tonic`, `silence`
The fluency module combines the silence class predicted by the detector with an on-the-fly, energy-based silence estimate.
We do not provide pre-trained weights for this module.
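One plausible way to combine the two silence sources above is to take the detector's per-frame argmax over the four classes and override low-energy frames as silence. A minimal sketch, not the module's actual implementation; the dB threshold, frame layout, and function names are assumptions:

```python
import numpy as np

# Class order expected by the detector, as documented above.
CLASSES = ["fluent", "clonic", "tonic", "silence"]

def frame_energy_db(frames):
    """Per-frame log energy in dB. frames: (T, samples_per_frame)."""
    energy = np.sum(frames ** 2, axis=1)
    return 10.0 * np.log10(energy + 1e-10)  # epsilon avoids log(0)

def decode_dysfluency(logits, frames, silence_db=-40.0):
    """logits: (1, 4, T) detector output; frames: (T, n) raw audio frames."""
    labels = np.argmax(logits[0], axis=0)            # per-frame class index
    low_energy = frame_energy_db(frames) < silence_db
    labels[low_energy] = CLASSES.index("silence")    # energy-based override
    return [CLASSES[i] for i in labels]
```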
Our framework supports phrase completion triggered by dysfluency detection. In our experiments, we used DistilGPT2: https://huggingface.co/distilbert/distilgpt2.
We provide a Jupyter notebook with instructions for converting this model to ONNX format and adding the auxiliary dysfluency tokens `<CLONIC>` and `<TONIC>`.
We do not provide pre-trained weights for this module.
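As an illustration of how the auxiliary tokens might enter the LM prompt, the sketch below splices `<CLONIC>`/`<TONIC>` markers into the recognized word stream before requesting a completion. The function and event format are our assumptions, not the framework's API:

```python
# Hypothetical prompt builder: insert a dysfluency marker token before the
# word at which the detector fired, so the LM can condition on the event.
TOKEN_MAP = {"clonic": "<CLONIC>", "tonic": "<TONIC>"}

def build_prompt(words, events):
    """words: recognized words; events: list of (word_index, label) pairs."""
    out = []
    for i, word in enumerate(words):
        for idx, label in events:
            if idx == i:
                out.append(TOKEN_MAP[label])
        out.append(word)
    return " ".join(out)
```

The resulting string would then be tokenized (with the two added tokens) and fed to the exported DistilGPT2 for phrase completion.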