The `transcribe` function still runs `pad_or_trim` on the mel spectrograms #1607

jpc · 2023-08-17T10:56:55Z

A fix was committed that pads the input audio to multiples of 30 s: 919a713

If I understand correctly, after that fix there is never any need to pad the mel spectrogram, and yet `pad_or_trim` is still called on each mel segment:

whisper/whisper/transcribe.py, line 231 in e8622f9:

`mel_segment = pad_or_trim(mel_segment, N_FRAMES).to(model.device).to(dtype)`

I don't think this causes any bugs right now (the `pad_or_trim` call is always a no-op), but it was quite misleading when I was reading the code and trying to understand how the padding works.
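To illustrate why the call would be a no-op, here is a minimal sketch in plain Python. It rests on assumptions that are not spelled out in the post: that the full mel spectrogram carries one extra 30 s window (`N_FRAMES` frames) of trailing padding, and that segments are sliced as `mel[:, seek : seek + N_FRAMES]` with `seek < content_frames`. The `pad_or_trim` below is a simplified stand-in operating on lists, not the library function, and `segments_are_noop` is a hypothetical helper written only for this demonstration.

```python
N_FRAMES = 3000  # frames in one 30 s window (480000 samples / hop length 160)


def pad_or_trim(rows, length):
    """Simplified stand-in: zero-pad or trim every row to `length` frames."""
    return [row[:length] + [0.0] * max(0, length - len(row)) for row in rows]


def segments_are_noop(content_frames, n_mels=2, step=N_FRAMES):
    """Slide over a padded toy spectrogram and report, per segment,
    whether pad_or_trim leaves the slice unchanged.

    Assumption: the spectrogram has N_FRAMES extra trailing frames,
    so its total width is content_frames + N_FRAMES.
    """
    total = content_frames + N_FRAMES
    mel = [[float(i + 1) for i in range(total)] for _ in range(n_mels)]
    results = []
    seek = 0
    while seek < content_frames:
        # Every slice is full width because seek + N_FRAMES <= total.
        segment = [row[seek:seek + N_FRAMES] for row in mel]
        results.append(pad_or_trim(segment, N_FRAMES) == segment)
        seek += step
    return results


if __name__ == "__main__":
    # 4500 content frames (45 s of audio) yields two segments.
    print(segments_are_noop(4500))
    # An irregular step mimics seeking by less than a full window.
    print(segments_are_noop(4500, step=1234))
```

Under those assumptions every slice already has exactly `N_FRAMES` columns, so padding or trimming cannot change it. If the slicing logic were ever changed to produce shorter segments, the call would start doing real work again, which is one argument for either keeping it with a comment or replacing it with an assertion on the segment width.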

Replies: 0 comments
