How to Support Longer Audio in the vLLM Implementation?

Hello,

I noticed that vLLM throws an error when processing audio longer than 30 seconds.

The code itself also indicates this limitation:
`self.max_chunk_size = 29  # from audio encoder position embedding length equals 1500, means 29.98s audio`

I see that the non-vLLM version already supports this by chunking the audio with a loop, like `for i in range(0, audio.shape[0], 16000 * 25)`.
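For reference, here is a minimal sketch of that chunking idea on its own. It assumes 16 kHz mono samples (implied by `16000 * 25`) and 25-second chunks; `split_audio` is a hypothetical helper name, not a function from the repository or from vLLM.

```python
from typing import List, Sequence

SAMPLE_RATE = 16000   # assumption: 16 kHz input, as implied by `16000 * 25`
CHUNK_SECONDS = 25    # chunk length taken from the loop quoted above


def split_audio(audio: Sequence, sample_rate: int = SAMPLE_RATE,
                chunk_seconds: int = CHUNK_SECONDS) -> List[Sequence]:
    """Split a 1-D sample sequence into consecutive chunks of at most
    `chunk_seconds` seconds. Works on lists and on 1-D NumPy arrays,
    since it only relies on len() and slicing."""
    step = sample_rate * chunk_seconds
    return [audio[i:i + step] for i in range(0, len(audio), step)]


if __name__ == "__main__":
    # 70 seconds of silence as stand-in data
    samples = [0.0] * (SAMPLE_RATE * 70)
    chunks = split_audio(samples)
    print([len(c) / SAMPLE_RATE for c in chunks])  # [25.0, 25.0, 20.0]
```

Each chunk stays under the 30-second limit, so in principle every chunk could be submitted as its own request and the outputs joined afterwards. Whether that matches what the non-vLLM path does with the per-chunk results is something I have not verified.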

Is there a way for the vLLM version to support longer audio?
