santoshdahal2016/OnDevice-ASR

This project aims to apply knowledge distillation to develop a lightweight and efficient on-device ASR model.


Evaluation of Teacher Model (600M)

LibriSpeech Train (100 hours): WER: 0.0138, CER: 0.0041

LibriSpeech Test Clean:
WER: 0.0182, CER: 0.0052

Evaluation of Student Model (110M), trained on LibriSpeech Train

LibriSpeech Test Clean:

With knowledge distillation: WER: 0.0914, CER: 0.0389

Student only (no distillation): WER: 0.1275, CER: 0.0594
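
For reference, the WER/CER figures above can be reproduced with a standard scoring library. The snippet below is a minimal sketch using jiwer (an illustrative dependency, not necessarily what this project uses); the transcripts are made-up examples.

```python
# Minimal sketch: corpus-level WER/CER with jiwer.
# The transcripts below are placeholders, not data from this repo.
import jiwer

references = [
    "he hoped there would be stew for dinner",
    "turnips and carrots and bruised potatoes",
]
hypotheses = [
    "he hoped there would be stew for dinner",
    "turnips and carrots and bruised potatoe",
]

# jiwer aggregates the edit distances over all pairs at once.
wer = jiwer.wer(references, hypotheses)
cer = jiwer.cer(references, hypotheses)
print(f"WER: {wer:.4f}  CER: {cer:.4f}")
```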

KD Loss NaN Problem

Teacher joint logit range: -52.36 to 117.31 (very wide, about 170 units)
Student joint logit range: -1.47 to 1.56 (very narrow, about 3 units)

After temperature scaling (T = 10):

Teacher: -5.24 to 11.73 (still wide)
Student: -0.15 to 0.16 (very narrow)

This creates vastly different probability distributions:

Teacher: sharp, confident predictions (low entropy)
Student: near-uniform, uncertain predictions (high entropy)
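
The snippet below is a minimal sketch of this scale mismatch, not the repository's training code: it builds fake joint logits with the ranges reported above, applies the usual temperature-scaled KL distillation term, and then shows one possible mitigation (standardizing logits before the softmax, which is an assumption rather than this repo's actual fix). Such a mismatch, especially under mixed precision, can destabilize the KD term and contribute to the NaN loss described here.

```python
import torch
import torch.nn.functional as F

T = 10.0       # distillation temperature implied by the scaled ranges above
vocab = 1024   # assumed joint output size, for illustration only

# Fake joint logits reproducing the observed dynamic ranges.
teacher_logits = torch.empty(vocab).uniform_(-52.36, 117.31)
student_logits = torch.empty(vocab).uniform_(-1.47, 1.56)

# Standard temperature-scaled KD term: KL(teacher || student).
kd = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.log_softmax(teacher_logits / T, dim=-1),
    log_target=True,
    reduction="sum",
) * (T * T)
print(kd)  # teacher is nearly one-hot, student nearly uniform -> large, unstable term

# Possible mitigation (an assumption, not necessarily what this repo does):
# standardize each logit vector so teacher and student live on a comparable scale.
def standardize(x, eps=1e-6):
    return (x - x.mean(dim=-1, keepdim=True)) / (x.std(dim=-1, keepdim=True) + eps)

kd_std = F.kl_div(
    F.log_softmax(standardize(student_logits) / T, dim=-1),
    F.log_softmax(standardize(teacher_logits) / T, dim=-1),
    log_target=True,
    reduction="sum",
) * (T * T)
print(kd_std)
```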

Commands

python train.py --config-path=./ --config-name=fast-conformer_transducer_bpe.yaml
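
This is a standard Hydra-style invocation: --config-path and --config-name select the YAML file that defines the model and training setup. Below is a minimal sketch of the kind of entry point such a command assumes; the actual train.py in this repository may be structured differently (for example, it may use NeMo's hydra_runner).

```python
# Hypothetical Hydra entry point matching the command above.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path=".", config_name="fast-conformer_transducer_bpe", version_base=None)
def main(cfg: DictConfig) -> None:
    # --config-path / --config-name on the command line override these defaults,
    # and the YAML arrives here as a structured config object.
    print(OmegaConf.to_yaml(cfg))
    # ... build the teacher and student models and run distillation training ...


if __name__ == "__main__":
    main()
```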
