Replies: 1 comment
-
You'll have to retrain the decoder with a larger maximum output length, or use shorter audio files. By the way, the maximum is 448 tokens, including the prompt.
-
Hey, for my use case I modified the vocabulary so that a space is treated as a separate token, and I fine-tuned a new model. Since my new tokenizer emits spaces as separate tokens, more tokens are predicted per transcript, so the maximum prediction budget of 224 tokens is exhausted before the audio is fully transcribed, and the end of the output gets truncated. Is there something we can do to solve this?
@jongwook
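A toy calculation shows why a space-as-token vocabulary can roughly double token usage and exhaust the 224-token budget. The tokenizer model and word counts below are illustrative assumptions, not Whisper's actual BPE:

```python
# Toy illustration (not Whisper's real tokenizer): compare the budget
# consumed when spaces are merged into word tokens vs. emitted as
# separate tokens of their own.

SAMPLE_LEN = 224  # the per-pass prediction budget mentioned above

def toy_token_count(text: str, space_is_token: bool) -> int:
    """Count tokens under a toy model: one token per word, plus
    optionally one extra token per inter-word space."""
    words = text.split()
    n = len(words)
    if space_is_token:
        n += max(len(words) - 1, 0)
    return n

transcript = " ".join(["hello"] * 150)  # hypothetical 150-word segment
merged = toy_token_count(transcript, space_is_token=False)   # 150 tokens
separate = toy_token_count(transcript, space_is_token=True)  # 299 tokens

print(merged <= SAMPLE_LEN)    # fits within the budget
print(separate <= SAMPLE_LEN)  # exceeds it, so the tail is truncated
```

Under this toy model the same 150-word segment nearly doubles in token count once spaces become standalone tokens, which is consistent with the truncation described: the budget runs out midway through the audio rather than at its end.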