Replies: 2 comments
-
Not really; that limit is built into the model. The 448-token context is split in half: 224 tokens for the prompt and 224 for the output. The input window is also fixed at 30 seconds of audio, so if you're talking about long sequences of audio, as opposed to long sequences of tokens, that is likewise limited to 30 seconds. How to deal with these limits:
  - If your 30 seconds of audio contains speech rapid enough to generate more than 224 tokens of output (or you're dealing with a language that uses more tokens for the same duration), try cutting the audio into smaller pieces so each fits within the token limit.
  - If you're trying to squeeze more than 224 tokens into the prompt, crop it to 224.
  - If you want to handle audio longer than 30 seconds, split it into 30-second segments and stitch the results together (which is what the Whisper code does).
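A minimal sketch of the two workarounds above, assuming 16 kHz mono audio as a NumPy array; the function and parameter names here are illustrative, not part of Whisper's API:

```python
import numpy as np

def split_into_segments(audio: np.ndarray, sample_rate: int = 16000,
                        segment_seconds: int = 30) -> list:
    """Split a 1-D audio array into fixed-length segments.

    Whisper's encoder consumes 30 seconds of audio per window, so longer
    recordings must be transcribed segment by segment and the resulting
    text stitched back together.
    """
    segment_len = segment_seconds * sample_rate
    return [audio[i:i + segment_len] for i in range(0, len(audio), segment_len)]

def trim_prompt(prompt_tokens: list, max_prompt_tokens: int = 224) -> list:
    """Crop a prompt to Whisper's 224-token prompt budget (half of the
    448-token decoder context), keeping the most recent tokens."""
    return prompt_tokens[-max_prompt_tokens:]
```

For example, a 75-second recording yields three segments (30 s, 30 s, and 15 s), each transcribed independently.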
-
For the Whisper model from Hugging Face, you can set `model.config.max_target_positions = 512` or any other number.
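A sketch of that config change, assuming the Hugging Face `transformers` library. Note the caveat: the pretrained checkpoint only contains learned positional embeddings for 448 decoder positions, so a larger `max_target_positions` mainly makes sense when instantiating or retraining a model, not as a free extension of a pretrained one:

```python
from transformers import WhisperConfig

# Build a Whisper config with a longer decoder context than the
# default 448 positions. Positions beyond 448 would be randomly
# initialized in a model created from this config.
config = WhisperConfig(max_target_positions=512)
print(config.max_target_positions)  # 512
```

To apply this to a pretrained checkpoint you would pass the override to `from_pretrained`, but be aware the extra positional embeddings start untrained.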
-
Hi. I am dealing with long audio sequences that produce between 400 and 700 decoder ids. Whisper by default only supports 448, but is there any way to change that?