Possible issue with segment timestamps in prompt #1723
Unanswered · funboarder13920 asked this question in Q&A
Replies: 1 comment
-
From the answer provided here (#838 (comment)), I guess the timestamps in the prompt during inference do not correspond to the process applied during training. During inference:
-
Hello,
I am wondering if the way the prompt is built during the inference is aligned with the prompt from the training.
During inference, segment timestamps from the decoding are propagated into the prompt: the decoded tokens, including timestamp tokens, are appended to the all_tokens history. Once appended, those timestamps no longer mean much; for example, they can end up out of order, which the model may never have seen during training and which could be an issue. It is also possible that the prompts in the training data contain no timestamps at all, since they are not really necessary there.
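To make the concern concrete, here is a minimal sketch (not the actual openai/whisper code; the token format and the `all_tokens` name are only modeled on it) showing how timestamps that restart at 0 in every 30 s window become non-monotonic once decoded tokens are appended to the running history used as the next prompt:

```python
def timestamp_token(seconds: float) -> str:
    # Whisper encodes timestamps as special tokens like <|4.20|>,
    # relative to the start of the current 30 s window.
    return f"<|{seconds:.2f}|>"

all_tokens = []  # running history, analogous to `all_tokens` in transcribe()

# Two consecutive 30 s windows; timestamps restart at 0.00 in each window.
for window in range(2):
    decoded = [timestamp_token(0.0), "hello", "world", timestamp_token(4.2)]
    all_tokens.extend(decoded)

# The prompt for the next window is a suffix of this history, so it can
# contain out-of-order timestamps (<|4.20|> followed by <|0.00|>).
print(all_tokens)
```

The printed history contains `<|4.20|>` immediately followed by `<|0.00|>`, which is the out-of-order pattern described above.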
I didn't find anything in the code that would strip the timestamps from the prompt.
The OpenAI Whisper paper does not go into details about the training data.
Do you have any insights into the format of the training data, especially regarding the prompt?
Best,