
Commit fbb0c3b

LucasWilkinson authored and Yuqi Zhang committed
[BugFix] Fix cascade attention - RuntimeError: scheduler_metadata must have shape (metadata_size) (vllm-project#17283)
Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Yuqi Zhang <[email protected]>
1 parent 62f6e50 commit fbb0c3b


1 file changed (+1, -1 lines)


vllm/v1/attention/backends/flash_attn.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -372,7 +372,7 @@ def schedule(batch_size, cu_query_lens, max_query_len, seqlens,
             suffix_kv_lens = torch.from_numpy(suffix_kv_lens).to(
                 self.runner.device)
             prefix_scheduler_metadata = schedule(
-                batch_size=num_reqs,
+                batch_size=1,
                 cu_query_lens=cu_prefix_query_lens,
                 max_query_len=num_actual_tokens,
                 seqlens=prefix_kv_lens,
```
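Why `batch_size=1` is the consistent value can be sketched in plain Python. This is a minimal illustration under stated assumptions, not vLLM code. It assumes the prefix pass of cascade attention packs every query token into one sequence, with `cu_prefix_query_lens = [0, num_actual_tokens]` and a single-entry `prefix_kv_lens`. It also assumes the suffix pass keeps one sequence per request. Neither assumption is visible in the hunk above. `batch_size_from_cu_lens` and `check_schedule_args` are hypothetical helpers and not FlashAttention or vLLM APIs. In the real code, a mismatch surfaces as the `scheduler_metadata must have shape (metadata_size)` error named in the commit title.

```python
from itertools import accumulate
from typing import List, Sequence


def batch_size_from_cu_lens(cu_query_lens: Sequence[int]) -> int:
    # A cumulative-lengths array [0, q0, q0+q1, ...] has one more entry
    # than the number of sequences it describes.
    return len(cu_query_lens) - 1


def check_schedule_args(batch_size: int,
                        cu_query_lens: Sequence[int],
                        seqlens: Sequence[int]) -> None:
    # Hypothetical check: batch_size must agree with both per-sequence inputs,
    # otherwise metadata sized from batch_size will not match the real batch.
    expected = batch_size_from_cu_lens(cu_query_lens)
    if batch_size != expected or len(seqlens) != expected:
        raise ValueError(
            f"batch_size={batch_size}, but inputs describe {expected} sequence(s)")


# Toy cascade-attention batch (assumed values): 4 requests share a 16-token prefix.
query_lens: List[int] = [3, 1, 5, 2]
seq_lens: List[int] = [20, 17, 25, 19]
common_prefix_len = 16
num_reqs = len(query_lens)
num_actual_tokens = sum(query_lens)

# Prefix pass (assumed layout): all query tokens attend to the shared prefix
# as a single packed "sequence".
cu_prefix_query_lens = [0, num_actual_tokens]
prefix_kv_lens = [common_prefix_len]

# Suffix pass: each request attends only to its own non-shared KV.
cu_suffix_query_lens = [0, *accumulate(query_lens)]
suffix_kv_lens = [s - common_prefix_len for s in seq_lens]

check_schedule_args(1, cu_prefix_query_lens, prefix_kv_lens)          # value after the fix
check_schedule_args(num_reqs, cu_suffix_query_lens, suffix_kv_lens)   # suffix uses num_reqs
try:
    check_schedule_args(num_reqs, cu_prefix_query_lens, prefix_kv_lens)  # value before the fix
except ValueError as err:
    print(err)
```

Under these assumptions, passing `num_reqs` for the prefix pass describes four sequences while the inputs describe one. That mismatch is the kind of inconsistency a shape check on the scheduler metadata would reject.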
