Replies: 1 comment
-
You'll have to retrain the decoder with a larger maximum output length, or use shorter audio files. By the way, the maximum is 448 tokens, including the prompt.
-
Hey, for my use case I modified the vocabulary so that a space is treated as a separate token, and I fine-tuned a new model. Since my new tokenizer emits spaces as separate tokens, more tokens are predicted per transcript, so the maximum prediction budget of 224 tokens is exhausted before the audio is fully transcribed, and the end of the output gets truncated. Is there something we can do to solve this?
@jongwook
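A toy calculation shows why a space-as-token vocabulary can roughly double token usage and exhaust the 224-token budget. The tokenizer model and word counts below are illustrative assumptions, not Whisper's actual BPE:

```python
# Toy illustration (not Whisper's real tokenizer): compare the budget
# consumed when spaces are merged into word tokens vs. emitted as
# separate tokens of their own.

SAMPLE_LEN = 224  # the per-pass prediction budget mentioned above

def toy_token_count(text: str, space_is_token: bool) -> int:
    """Count tokens under a toy model: one token per word, plus
    optionally one extra token per inter-word space."""
    words = text.split()
    n = len(words)
    if space_is_token:
        n += max(len(words) - 1, 0)
    return n

transcript = " ".join(["hello"] * 150)  # hypothetical 150-word segment
merged = toy_token_count(transcript, space_is_token=False)   # 150 tokens
separate = toy_token_count(transcript, space_is_token=True)  # 299 tokens

print(merged <= SAMPLE_LEN)    # fits within the budget
print(separate <= SAMPLE_LEN)  # exceeds it, so the tail is truncated
```

Under this toy model the same 150-word segment nearly doubles in token count once spaces become standalone tokens, which is consistent with the truncation described: the budget runs out midway through the audio rather than at its end.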