There was a fix committed for padding the input audio to 30s multiples: 919a713

If I understand correctly, after the fix, there is no need to ever pad the mel spectrogram, and yet `pad_or_trim` is still called on mel spectrograms: whisper/whisper/transcribe.py, line 231 in e8622f9.

I don't think this causes any bugs right now (the `pad_or_trim` call is always a no-op), but it was quite misleading when reading the code and trying to understand how the padding works.
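
To illustrate the no-op claim, here is a minimal sketch, not the actual Whisper code. It assumes the fix appends one full 30 s window of silence after the real content (my reading of the change, not confirmed here) and that a 30 s window is 3000 mel frames. The `pad_or_trim` below is a simplified stand-in that works on a plain list of frames, and `content_frames` and the seek step are made-up values.

```python
N_FRAMES = 3000  # assumed: mel frames in one 30 s window (10 ms per frame)


def pad_or_trim(frames, length=N_FRAMES):
    """Simplified stand-in: trim or zero-pad a list of frames to `length`."""
    if len(frames) > length:
        return frames[:length]
    if len(frames) < length:
        return frames + [0.0] * (length - len(frames))
    return frames  # already the right length: nothing to do


content_frames = 4321  # hypothetical amount of real audio, in frames

# Assumed effect of the fix: the spectrogram carries N_FRAMES of silence
# after the real content.
mel = [1.0] * content_frames + [0.0] * N_FRAMES

# Any window that starts inside the content is already N_FRAMES long,
# so pad_or_trim has nothing left to pad.
changed = []
for seek in range(0, content_frames, 700):  # arbitrary seek positions
    segment = mel[seek : seek + N_FRAMES]
    if pad_or_trim(segment) != segment:
        changed.append(seek)
noop_everywhere = not changed

# Without the up-front padding, the last window would be short and
# pad_or_trim would actually have to extend it.
unpadded = mel[:content_frames]
tail = unpadded[4200 : 4200 + N_FRAMES]
tail_padded = pad_or_trim(tail)
```

If that assumption holds, every segment sliced in the transcription loop is already full length, which is why the remaining call reads as dead code rather than as part of the padding logic.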