Model signatures and weights

General ASR

The ASR module is designed to work with Conformer-family models.

We provide a Jupyter notebook that exports a model to ONNX with the required signature: examples/conformer_export_onnx.ipynb.

General ASR model signature: (1, T_frames, V), where:

  • T_frames is the number of temporal frames after Conformer downsampling
  • V is the token vocabulary size (V=129 for NeMo Conformer-large).
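As an illustration of how a tensor with this signature can be consumed, here is a minimal sketch of greedy CTC decoding in plain Python. It assumes the model output holds one score per token per frame and that the CTC blank token has the last index (V - 1), which is the usual NeMo convention; both points should be checked against the exported model. The function name `ctc_greedy_decode` is hypothetical and is not part of this repository.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    logits: Sequence[Sequence[Sequence[float]]], blank_id: int
) -> List[int]:
    """Collapse a (1, T_frames, V) score tensor into a list of token ids."""
    if len(logits) != 1:
        raise ValueError(f"expected batch size 1, got {len(logits)}")
    frames = logits[0]  # T_frames rows, each holding V scores

    token_ids: List[int] = []
    previous = blank_id
    for scores in frames:
        # Pick the highest-scoring token for this frame.
        best = max(range(len(scores)), key=lambda i: scores[i])
        # CTC rule: drop blanks and merge consecutive repeats.
        if best != blank_id and best != previous:
            token_ids.append(best)
        previous = best
    return token_ids


# Toy example with V=4 and the blank assumed at index 3 (V - 1).
example = [[
    [0.1, 0.7, 0.1, 0.1],  # token 1
    [0.1, 0.7, 0.1, 0.1],  # token 1 again (merged with the previous frame)
    [0.0, 0.0, 0.0, 1.0],  # blank
    [0.1, 0.7, 0.1, 0.1],  # token 1 (kept, because a blank separates it)
    [0.9, 0.0, 0.1, 0.0],  # token 0
]]
decoded = ctc_greedy_decode(example, blank_id=3)
```

The function works on nested lists, so a NumPy output from an inference runtime can be passed after calling `.tolist()`. Mapping the resulting ids back to text requires the tokenizer that ships with the model.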

We reference an open-source Conformer model fine-tuned on the TORGO dataset of dysarthric speech: https://huggingface.co/miosipov/conformer-ctc-DASR-en-TORGO.

Phonemic ASR

Phonemic ASR model signature: (1, T_frames, P), where:

  • T_frames is the number of temporal frames after Conformer downsampling
  • P is the phoneme vocabulary size (P=50 in our experiments).

We reference an open-source Conformer model fine-tuned on the TEDLIUM dataset for phonemic ASR: https://huggingface.co/miosipov/conformer-ctc-phonemic-en-tedlium.

Dysfluency Detection

The dysfluency detection module expects an ONNX model with the signature (1, C, T), where:

  • T is the (variable) number of time frames after log-Mel spectrogram calculation
  • C=4 is the number of classes. Classes should be in the following order: fluent, clonic, tonic, silence.
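The class order matters when turning the detector output into labels. The sketch below maps a (1, C, T) tensor to one class name per frame by taking the highest-scoring class at each time step. It assumes the output holds per-class scores (logits or probabilities); the names `CLASSES` and `frame_labels` are hypothetical and not taken from this repository.

```python
from typing import List, Sequence

# Class order as stated above: index 0..3 along the C axis.
CLASSES = ("fluent", "clonic", "tonic", "silence")


def frame_labels(output: Sequence[Sequence[Sequence[float]]]) -> List[str]:
    """Map a (1, C, T) score tensor to one class name per time frame."""
    if len(output) != 1:
        raise ValueError(f"expected batch size 1, got {len(output)}")
    per_class = output[0]  # C rows, each holding T scores
    if len(per_class) != len(CLASSES):
        raise ValueError(f"expected C={len(CLASSES)}, got {len(per_class)}")

    n_frames = len(per_class[0])
    labels: List[str] = []
    for t in range(n_frames):
        # Highest-scoring class for frame t.
        best = max(range(len(CLASSES)), key=lambda c: per_class[c][t])
        labels.append(CLASSES[best])
    return labels


# Toy example with T=3 frames.
scores = [[
    [0.90, 0.10, 0.00],  # fluent
    [0.05, 0.80, 0.10],  # clonic
    [0.03, 0.05, 0.10],  # tonic
    [0.02, 0.05, 0.80],  # silence
]]
labels = frame_labels(scores)
```

Because T is variable, the helper reads the frame count from the tensor itself instead of assuming a fixed length.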

The fluency module uses both the silence predicted by the detector and an on-the-fly, energy-based silence estimate.
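To make the idea of two silence sources concrete, here is a minimal sketch of one possible energy-based estimate and one possible way to merge it with the detector output. Everything specific in it is an assumption for illustration: the RMS measure, the -40 dB threshold relative to the loudest frame, the logical-OR merge rule, and the function names `energy_silence` and `combine_silence`. The actual rule used by the fluency module may differ.

```python
import math
from typing import List, Sequence


def energy_silence(
    frames: Sequence[Sequence[float]], threshold_db: float = -40.0
) -> List[bool]:
    """Flag frames whose RMS level is below threshold_db relative to the loudest frame."""
    rms = [
        math.sqrt(sum(x * x for x in frame) / len(frame)) if len(frame) else 0.0
        for frame in frames
    ]
    peak = max(rms, default=0.0)
    if peak == 0.0:
        return [True] * len(rms)  # all-zero input: every frame is silent

    flags: List[bool] = []
    for value in rms:
        # Guard against log10(0) for all-zero frames.
        level_db = 20.0 * math.log10(value / peak) if value > 0.0 else float("-inf")
        flags.append(level_db < threshold_db)
    return flags


def combine_silence(
    detector_silence: Sequence[bool], energy_flags: Sequence[bool]
) -> List[bool]:
    """Assumed merge rule: a frame is silent if either source marks it silent."""
    if len(detector_silence) != len(energy_flags):
        raise ValueError("both silence tracks must cover the same frames")
    return [d or e for d, e in zip(detector_silence, energy_flags)]


# Toy example: three frames of four samples each.
audio_frames = [
    [0.5, -0.5, 0.5, -0.5],        # loud
    [0.0, 0.0, 0.0, 0.0],          # digital silence
    [0.001, -0.001, 0.001, -0.001],  # about 54 dB below the loud frame
]
energy_flags = energy_silence(audio_frames)
merged = combine_silence([False, True, False], energy_flags)
```

The sketch assumes the energy frames are aligned one-to-one with the detector's T output frames; in practice the two tracks would need the same hop size or a resampling step.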

We do not provide pre-trained weights for this module.

GPT

Our framework supports phrase completion triggered by dysfluency detection. In our experiments, we used DistilGPT2: https://huggingface.co/distilbert/distilgpt2.

We provide a Jupyter notebook with instructions for converting this model to ONNX format and adding the auxiliary dysfluency tokens <CLONIC> and <TONIC>.

We do not provide pre-trained weights for this module.