How did Whisper manage previous context during training? #2476
Unanswered
joseluis-recog
asked this question in Q&A
I was wondering whether Whisper's developers made use of a padding token during batched training. Specifically, I've been experimenting with feeding the context (the tokens situated between <|startofprev|> and <|startoftranscript|>) during training. To facilitate batching, I need to pad these contexts so they match the length of the longest one in the batch.
However, I couldn’t find any documentation or references regarding the use of padding tokens during Whisper’s training process. I’ve tried various padding approaches, such as padding on the left, padding on the right, using -100, and even using token ID 50256 (which corresponds to the empty string, ""). In all cases, Whisper seems to output random, nonsensical tokens in response to the padded inputs.
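For reference, here is a minimal sketch of one of the approaches I tried: left-padding each context to the longest one in the batch and masking the prompt and padding positions out of the loss with -100. This is my own batching code, not Whisper's actual training setup; in particular, reusing <|endoftext|> as the pad token is my assumption, and I've omitted the language/task/timestamp tokens and the trailing <|endoftext|> for brevity.

```python
import torch
from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=True)
SOP = tokenizer.sot_prev  # <|startofprev|>
SOT = tokenizer.sot       # <|startoftranscript|>
PAD = tokenizer.eot       # reusing <|endoftext|> as a pad token (my assumption)


def build_batch(contexts, targets):
    """contexts / targets: lists of token-ID lists, without special tokens."""
    max_ctx = max(len(c) for c in contexts)
    input_ids, labels = [], []
    for ctx, tgt in zip(contexts, targets):
        pad_len = max_ctx - len(ctx)
        # decoder input: [PAD ...] <|startofprev|> ctx <|startoftranscript|> tgt
        ids = [PAD] * pad_len + [SOP] + ctx + [SOT] + tgt
        # loss is computed only on the target tokens; everything else is -100
        lab = [-100] * (pad_len + 1 + len(ctx) + 1) + tgt
        input_ids.append(ids)
        labels.append(lab)
    # right-pad to the longest full sequence so the batch is rectangular
    max_len = max(len(x) for x in input_ids)
    input_ids = [x + [PAD] * (max_len - len(x)) for x in input_ids]
    labels = [x + [-100] * (max_len - len(x)) for x in labels]
    return torch.tensor(input_ids), torch.tensor(labels)
```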
This behavior leads me to suspect that instead of padding, Whisper's developers might have truncated the contexts to the shortest length in the batch during training. If that’s the case, it would explain why the model doesn’t recognize any specific padding pattern.
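If truncation was indeed the strategy, I imagine it could have looked something like the sketch below (again, just my guess, keeping the most recent tokens of each context so that every prompt in the batch shares one length and no padding token is ever needed):

```python
def truncate_to_shortest(contexts):
    # Truncate every context in the batch to the length of the shortest one,
    # keeping the most recent tokens, so no padding token is required.
    min_len = min(len(c) for c in contexts)
    return [c[len(c) - min_len:] for c in contexts]
```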
If anyone has insights into how this was handled during training or knows the correct approach, it would be incredibly helpful!