Fix incompatibility between fc1 and non-sp-padding (vllm-project#7643)

Wangbei25 · Wangbei25 · web-flow · commit dd55736ee4c4 · 2026-03-25T23:23:37.000+08:00
Cherry-pick of vllm-project#7614.

### What this PR does / why we need it?

Fixes an incompatibility between flashcomm1 (fc1) and non-sp-padding. After the [non-sp-padding](vllm-project#7297) PR, running kimi2.5 with flashcomm1 enabled raises an error: "The expanded size of the tensor do not match the existing size at non-singleton dimension 0."

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.18.0
- vLLM-Ascend main: 9976e68

Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Co-authored-by: Wangbei25 <wangbei41@huawie.com>
diff --git a/vllm_ascend/worker/model_runner_v1.py b/vllm_ascend/worker/model_runner_v1.py
@@ -1976,7 +1976,7 @@ def dispatch_cudagraph(num_tokens, disable_full=False, valid_modes=None):
             _, num_tokens_across_dp, synced_cudagraph_mode = self._sync_batch_across_dp(
                 num_tokens_padded=num_tokens_padded,
                 cudagraph_mode=cudagraph_mode.value,
-                allow_dp_padding=cudagraph_mode != CUDAGraphMode.NONE,
+                allow_dp_padding=(cudagraph_mode != CUDAGraphMode.NONE) or enable_sp(self.vllm_config),
             )
 
             # Extract DP padding if there is any
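
The one-line change above widens the condition under which DP padding is allowed. A minimal sketch of that predicate follows, assuming a stand-in `CUDAGraphMode` enum and a plain boolean `sp_enabled` in place of `enable_sp(self.vllm_config)`. Both are hypothetical simplifications for illustration, not the real vLLM or vLLM-Ascend definitions.

```python
from enum import Enum


class CUDAGraphMode(Enum):
    """Hypothetical stand-in for the real enum; only NONE matters here."""
    NONE = 0
    PIECEWISE = 1
    FULL = 2


def allow_dp_padding_before(cudagraph_mode: CUDAGraphMode) -> bool:
    # Old condition: padding across DP ranks only when a graph mode is active.
    return cudagraph_mode != CUDAGraphMode.NONE


def allow_dp_padding_after(cudagraph_mode: CUDAGraphMode, sp_enabled: bool) -> bool:
    # New condition: also allow padding when sp_enabled is true.
    # sp_enabled stands in for enable_sp(self.vllm_config) from the diff.
    return (cudagraph_mode != CUDAGraphMode.NONE) or sp_enabled


if __name__ == "__main__":
    # Print the combinations so the single changed case is easy to spot.
    for mode in CUDAGraphMode:
        for sp in (False, True):
            before = allow_dp_padding_before(mode)
            after = allow_dp_padding_after(mode, sp)
            print(f"{mode.name:<9} sp={sp!s:<5} before={before!s:<5} after={after}")
```

Under these assumptions, the only combination whose result changes is `CUDAGraphMode.NONE` with `sp_enabled=True`, which presumably corresponds to the failing flashcomm1 case described in the commit message.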
