Hello,
I noticed that vLLM throws an error when processing audio longer than 30 seconds.
The code itself also indicates this limitation:
self.max_chunk_size = 29 # from audio encoder position embedding length equals 1500, means 29.98s audio
The non-vLLM version already handles this by splitting the audio into chunks in a loop, e.g. for i in range(0, audio.shape[0], 16000 * 25), which works out to 25 seconds per chunk if the audio is sampled at 16 kHz.
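For reference, here is a minimal sketch of the kind of chunking I mean. It is not the actual code from either version: the helper name chunk_audio, the 16 kHz sample rate, and the use of a plain Python sequence (instead of an array with .shape) are my own assumptions for illustration. Only the 25-second step comes from the loop quoted above.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000   # assumption: 16 kHz mono audio
CHUNK_SECONDS = 25     # step size taken from the loop quoted above


def chunk_audio(audio: Sequence[float],
                sample_rate: int = SAMPLE_RATE,
                chunk_seconds: int = CHUNK_SECONDS) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping chunks of at most chunk_seconds.

    Hypothetical helper; works on anything that supports len() and slicing.
    """
    step = sample_rate * chunk_seconds
    for start in range(0, len(audio), step):
        # The final chunk may be shorter than `step`.
        yield audio[start:start + step]


if __name__ == "__main__":
    audio = [0.0] * (SAMPLE_RATE * 60)  # 60 s of silence as a stand-in
    durations = [len(c) / SAMPLE_RATE for c in chunk_audio(audio)]
    print(durations)  # [25.0, 25.0, 10.0]
```

Each chunk stays under the 29-second limit mentioned above, so in principle each one could be sent as a separate request and the outputs concatenated, though I don't know whether that is the recommended approach with vLLM.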
Is there a way for the vLLM version to support longer audio?