Hey guys, awesome work! Thank you for that.
I'm new to ML, and I just want to understand how inference actually happens.
Correct me if I'm wrong, but after training, the model has built a mapping between the wave features (frequency, amplitude, etc.) and the words. So when I say "High", the model extracts features from the audio after all the encoding and preprocessing, then finds the word that best matches those features.
We also added another "weight" based on the predictions dictionary to improve this: if I say "the mountain is high", the preceding "is" makes the model suggest "high" instead of "Hi."
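Just to make that conditioning idea concrete, here is a toy sketch (not Whisper's actual decoder, and the probabilities are made up): the previous token shifts the scores, so "is" favors "high" while a sentence start favors "Hi".

```python
# Toy conditional word model: score each candidate word given the
# previous token. All probabilities below are invented for illustration.
probs = {
    ("is", "high"): 0.8,
    ("is", "Hi"): 0.05,
    ("<start>", "Hi"): 0.6,
    ("<start>", "high"): 0.1,
}

def best_word(prev, candidates):
    # Pick the candidate with the highest probability given the previous token.
    return max(candidates, key=lambda w: probs.get((prev, w), 0.0))

print(best_word("is", ["high", "Hi"]))       # context "is" -> "high"
print(best_word("<start>", ["high", "Hi"]))  # sentence start -> "Hi"
```

Whisper's decoder does something far richer (a Transformer attending over all previous tokens and the audio), but the principle is the same: the probability of the next word depends on what came before.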
If the above is true, does that mean the more training you do, the bigger your model gets? Can a small model still be built after training on a vast dataset?
My second question is about the large model. It performs better on non-English languages than tiny, small, or medium. Can I extract a model for one specific language from the large model, so that it's not as big as large but performs better than medium?
Third question: does the model size impact the response time?
Thank you again; looking forward to hearing from you.
You're correct that the model learns the statistical correlations between the wave features and the words, but training does not add new weights to the model; rather, a fixed number of existing weights are adjusted to better represent the relationship between the audio and the transcript. For Whisper (except the new large-v2), we started with 5 different model sizes (i.e. the number of weights in each model) and trained each on the same amount of data. Typically, larger models are more flexible and end up performing better than the smaller ones.
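A minimal sketch of that point, using a two-parameter linear model instead of a Transformer: gradient descent adjusts the values of the weights, but the number of weights is fixed before training and never grows, no matter how many steps you run.

```python
# Fit y = w*x + b to toy data generated from y = 2x + 1.
# The model has exactly 2 parameters before, during, and after training;
# only their values change.
data = [(x, 2.0 * x + 1.0) for x in range(10)]

w, b = 0.0, 0.0   # 2 parameters, chosen before training starts
lr = 0.01
for _ in range(1000):        # more training steps != more parameters
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x    # adjust existing weight
        b -= lr * err        # adjust existing bias

print(round(w, 2), round(b, 2))  # converges near 2.0 and 1.0
n_params = 2                     # unchanged by training
```

The same logic applies to Whisper: "tiny" through "large" differ in how many weights they start with, and training a tiny model on a vast dataset still yields a tiny model.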