Commit 2bb7e55

[Bugfix][PD]fix non-working disaggregated prefill (#2374)
### What this PR does / why we need it?
Mainline vLLM fixed its disaggregated prefill in vllm-project/vllm#22598, but it still does not work in vllm-ascend. Concretely, on Ascend devices decoder instances crash before vLLM's fix and hang after it. This patch allows disaggregated prefill to work.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Qwen3-0.6B 1P1D tp=1 dp=1

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@0fe8508

---------

Signed-off-by: CaveNightingale <[email protected]>
1 parent 1b40665 commit 2bb7e55

1 file changed: +4 -3 lines changed

vllm_ascend/worker/model_runner_v1.py

Lines changed: 4 additions & 3 deletions
@@ -1636,7 +1636,7 @@ def execute_model(
          finished_recving) = (self._process_reqs(scheduler_output,
                                                   intermediate_tensors))
         kv_connector_output = None
-        if finished_sending is not None and finished_recving is not None:
+        if finished_sending is not None or finished_recving is not None:
             kv_connector_output = KVConnectorOutput(
                 finished_sending=finished_sending,
                 finished_recving=finished_recving)
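
The effect of switching `and` to `or` in this hunk can be illustrated in isolation. The sketch below is not the vllm-ascend code: `KVConnectorOutput` is a reduced stand-in dataclass, and `build_output_old` / `build_output_new` are hypothetical helper names that mirror only the condition before and after the patch. It assumes a worker may report just one of the two sets (for example, a decode instance that only ever finishes receiving), which is the case the old condition silently dropped.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class KVConnectorOutput:
    """Stand-in with only the two fields used in the diff (assumption)."""
    finished_sending: Optional[Set[str]] = None
    finished_recving: Optional[Set[str]] = None


def build_output_old(finished_sending, finished_recving):
    # Pre-patch condition: both sets must be present.
    if finished_sending is not None and finished_recving is not None:
        return KVConnectorOutput(finished_sending, finished_recving)
    return None


def build_output_new(finished_sending, finished_recving):
    # Patched condition: either set is enough.
    if finished_sending is not None or finished_recving is not None:
        return KVConnectorOutput(finished_sending, finished_recving)
    return None


# A worker that has only finished receiving KV blocks for one request.
recv_only = (None, {"req-0"})
print(build_output_old(*recv_only))  # the receive notification is lost
print(build_output_new(*recv_only))  # the receive notification is kept
```

With the old condition the one-sided report never reaches the caller, which is consistent with, though not proof of, the hang described in the commit message.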
@@ -1838,8 +1838,9 @@ def kv_connector_no_forward(
             return EMPTY_MODEL_RUNNER_OUTPUT
 
         output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
-        output.finished_sending = finished_sending
-        output.finished_recving = finished_recving
+        output.kv_connector_output = KVConnectorOutput(
+            finished_sending=finished_sending,
+            finished_recving=finished_recving)
         return output
 
     @staticmethod
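
The second hunk moves the two sets from top-level attributes into a nested `kv_connector_output`. A minimal sketch of why that could matter follows, assuming the consumer of the runner output reads only `kv_connector_output`. `ModelRunnerOutput` and `KVConnectorOutput` here are heavily reduced stand-ins, and `no_forward_old`, `no_forward_new`, and `finished_recving_seen_by_consumer` are hypothetical names, not functions from vLLM or vllm-ascend.

```python
import copy
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class KVConnectorOutput:
    """Stand-in with only the two fields used in the diff (assumption)."""
    finished_sending: Optional[Set[str]] = None
    finished_recving: Optional[Set[str]] = None


@dataclass
class ModelRunnerOutput:
    """Reduced stand-in: only the field relevant to this patch."""
    kv_connector_output: Optional[KVConnectorOutput] = None


EMPTY_MODEL_RUNNER_OUTPUT = ModelRunnerOutput()


def no_forward_old(finished_sending, finished_recving):
    output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
    # Pre-patch: plain attributes that the assumed consumer never reads.
    output.finished_sending = finished_sending
    output.finished_recving = finished_recving
    return output


def no_forward_new(finished_sending, finished_recving):
    output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
    # Patched: the sets travel inside kv_connector_output.
    output.kv_connector_output = KVConnectorOutput(
        finished_sending=finished_sending,
        finished_recving=finished_recving)
    return output


def finished_recving_seen_by_consumer(output):
    # Hypothetical consumer that looks only at kv_connector_output.
    kv = output.kv_connector_output
    return kv.finished_recving if kv is not None else None


print(finished_recving_seen_by_consumer(no_forward_old(None, {"req-1"})))
print(finished_recving_seen_by_consumer(no_forward_new(None, {"req-1"})))
```

Because `copy.copy` returns a new object, neither variant mutates the shared `EMPTY_MODEL_RUNNER_OUTPUT`; the difference is only in where the finished-request sets end up.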