Thank you to the authors for the great work.
I have a small question. I have been examining the Whisper Large v3 model, and some of its weight values seem to fall outside the range that FP16 can represent accurately.
In addition, a rough check of the final logits suggests they reach a scale that could not have been produced with purely FP16 weights and losses.
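For concreteness, this is roughly the kind of check I mean (a minimal sketch, not my exact script, assuming the Hugging Face transformers checkpoint `openai/whisper-large-v3`; the official `.pt` checkpoints could be inspected the same way):

```python
import torch
from transformers import WhisperForConditionalGeneration

# Largest/smallest positive normal values representable in FP16.
fp16_max = torch.finfo(torch.float16).max    # ~65504
fp16_tiny = torch.finfo(torch.float16).tiny  # ~6.1e-5

# Load in FP32 so nothing is clipped or rounded before inspection.
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3", torch_dtype=torch.float32
)

for name, p in model.named_parameters():
    w = p.detach().float()
    overflow = (w.abs() > fp16_max).sum().item()                  # outside FP16 range
    underflow = ((w != 0) & (w.abs() < fp16_tiny)).sum().item()   # flushed toward zero
    # Worst-case relative rounding error from a round trip through FP16.
    rel_err = ((w - w.half().float()).abs() / w.abs().clamp_min(1e-12)).max().item()
    if overflow or underflow or rel_err > 1e-3:
        print(f"{name}: overflow={overflow}, underflow={underflow}, "
              f"max relative FP16 rounding error={rel_err:.2e}")
```

Loading in FP32 first matters here: casting to FP16 up front would silently clip or round any out-of-range values before the check could see them.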
The paper briefly mentions training with FP16, yet the source code does not load the model as FP16, so there is some ambiguity; parts of the code suggest the model could be loaded as FP16, which might lead to unintended behavior.
It would be helpful to know clearly whether Whisper was trained with mixed precision or purely in FP16.
This clarification could assist future research.
Thank you.