Commit 89da8d9

sighingnow authored and simon-mo committed

[Qwen3Next] Fixes the cuda graph capture conditions under large batch sizes (#24660) (#24667)

Signed-off-by: Tao He <[email protected]>

1 parent 01085b1 commit 89da8d9

File tree

1 file changed: +2 −1 lines changed

vllm/v1/attention/backends/gdn_attn.py

Lines changed: 2 additions & 1 deletion

```diff
@@ -209,7 +209,8 @@ def build(  # type: ignore[override]

         # prepare tensors for cudagraph
         if (self.use_full_cuda_graph and num_prefills == 0 and num_decodes == 0
-                and num_spec_decodes <= self.decode_cudagraph_max_bs):
+                and num_spec_decodes <= self.decode_cudagraph_max_bs
+                and m.num_actual_tokens <= self.decode_cudagraph_max_bs):
             num_total_tokens = self.vllm_config.pad_for_cudagraph(
                 m.num_actual_tokens)
             batch_size = num_total_tokens // (self.num_spec + 1)
```
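The reasoning behind the added guard can be sketched as follows. With speculative decoding, each decode request contributes `num_spec + 1` tokens, so `m.num_actual_tokens` can exceed `decode_cudagraph_max_bs` even when the request count `num_spec_decodes` is within bounds; padding such a token count for a captured CUDA graph would overrun the capture limit. This is a minimal standalone illustration, not the vLLM implementation: the helper function and the concrete numbers below are assumptions chosen for the example, while the variable names mirror those in the diff.

```python
# Illustrative sketch (hypothetical helper, assumed values) of why the
# patch also checks the token count, not just the request count.

def should_use_cuda_graph(num_spec_decodes: int,
                          num_actual_tokens: int,
                          decode_cudagraph_max_bs: int) -> bool:
    """Mirror the patched condition: both the decode-request count and
    the total token count must fit within the captured graph sizes."""
    return (num_spec_decodes <= decode_cudagraph_max_bs
            and num_actual_tokens <= decode_cudagraph_max_bs)

num_spec = 3                   # speculative tokens per request (assumed)
decode_cudagraph_max_bs = 256  # capture limit (assumed value)
num_spec_decodes = 128         # passes the request-count check: 128 <= 256
num_actual_tokens = num_spec_decodes * (num_spec + 1)  # 128 * 4 = 512 tokens

# Old condition (request count only) would wrongly take the cudagraph path:
old_ok = num_spec_decodes <= decode_cudagraph_max_bs
# Patched condition also rejects when the token count exceeds the limit:
new_ok = should_use_cuda_graph(num_spec_decodes, num_actual_tokens,
                               decode_cudagraph_max_bs)
print(old_ok, new_ok)  # True False
```

At large batch sizes the old check therefore let a 512-token batch through a graph captured for at most 256 tokens; the extra conjunct falls back to the non-cudagraph path in that case.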
