support cp&dcp #3260
base: main
Conversation
Signed-off-by: LookAround <[email protected]>
Signed-off-by: chenjie <[email protected]>
model runner support cp: input ids, position ids and slot mapping
Signed-off-by: chenjie <[email protected]>
Signed-off-by: LookAround <[email protected]>
model runner support cp: metadata, logits indices
Signed-off-by: LookAround <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces support for context parallelism (CP) and decode context parallelism (DCP) for Ascend NPUs, which is a significant feature addition. The changes are extensive, touching attention mechanisms, worker logic, and distributed state management. While the core implementation for CP/DCP seems thorough, I've identified several critical issues. These include a potential performance regression due to the removal of a workaround for tensor.tolist(), bugs in the new example script that lead to incorrect performance measurements, and the removal of important configuration logic for non-MLA models that could cause issues. Additionally, there are opportunities for performance improvements in newly added helper functions and some leftover debugging code that should be removed.
if max_gen_len == 1:
    # No spec decode tokens.
-   valid_sampled_token_ids = self._to_list(sampled_token_ids)
+   valid_sampled_token_ids = sampled_token_ids.tolist()
The custom _to_list method, which was a workaround for a performance issue with tensor.tolist() causing an implicit device-wide synchronization, has been removed. The call site now uses sampled_token_ids.tolist() directly. This likely reintroduces the performance problem that the workaround was meant to solve. Unless the underlying issue in torch_npu has been resolved, the original workaround should be restored to avoid a performance regression.
def _build_drafter_prepare_inputs_torchair_param(self):
    return False

-def _to_list(self, sampled_token_ids: torch.Tensor) -> list[list[int]]:
-    # This is a short term mitigation for issue mentioned in
-    # https://github.com/vllm-project/vllm/issues/22754.
-    # `tolist` would trigger a npu wise stream sync, which
-    # would block other copy ops from other npu streams.
-    # A npu event sync would avoid such a situation. Since
-    # this is in the critical path of every single model
-    # forward loop, this has caused perf issue for a disagg
-    # setup.
-    pinned = self.sampled_token_ids_pinned_cpu[:sampled_token_ids.shape[0]]
-    pinned.copy_(sampled_token_ids, non_blocking=True)
-    self.transfer_event.record()
-    self.transfer_event.synchronize()
-    return pinned.tolist()
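For context, here is a minimal, self-contained sketch of the same event-synchronized device-to-host copy pattern, in case the workaround needs to be restored. The class name, constructor parameters, and the use of torch.cuda.Event as a stand-in for the Ascend/torch_npu event API are illustrative assumptions, not the project's actual code:

import torch


class SamplerOutputCopier:
    """Minimal sketch of the event-synchronized device-to-host copy pattern."""

    def __init__(self, max_tokens: int, width: int) -> None:
        # Pinned host memory enables a truly asynchronous (non_blocking) D2H copy.
        self.sampled_token_ids_pinned_cpu = torch.empty(
            (max_tokens, width), dtype=torch.int64, pin_memory=True)
        # torch.cuda.Event is used as a stand-in here; on Ascend the
        # corresponding torch_npu event type would be used instead.
        self.transfer_event = torch.cuda.Event()

    def to_list(self, sampled_token_ids: torch.Tensor) -> list[list[int]]:
        # Copy only the rows we need into the pinned buffer.
        pinned = self.sampled_token_ids_pinned_cpu[:sampled_token_ids.shape[0]]
        pinned.copy_(sampled_token_ids, non_blocking=True)
        # Wait only for this copy via an event, instead of letting
        # .tolist() trigger a device-wide stream synchronization.
        self.transfer_event.record()
        self.transfer_event.synchronize()
        return pinned.tolist()

The key design point is that the event synchronizes only the copy that was just recorded, so other streams (for example KV-cache transfer streams in a disaggregated setup) are not blocked.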
ensure_model_parallel_initialized(
    self.parallel_config.tensor_parallel_size,
    self.parallel_config.pipeline_parallel_size)
print(f"context_parallel_enable:{context_parallel_enable}")
Signed-off-by: LookAround <[email protected]>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
tokens = [scheduler_output.num_scheduled_tokens[i] for i in req_ids]
original_num_scheduled_tokens = np.array(tokens, dtype=np.int32)
original_total_num_scheduled_tokens = total_num_scheduled_tokens
tokens = self._update_tokens_for_cp(tokens, scheduler_output) |
Will this modification lead to a performance degradation when CP is not enabled?
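To illustrate the concern, here is a hedged sketch of an early-exit fast path that would leave the non-CP case untouched. The function name mirrors the PR's _update_tokens_for_cp, but the cp_size parameter and the even-split logic are assumptions for illustration only:

def update_tokens_for_cp(tokens: list[int], cp_size: int) -> list[int]:
    """Split each request's scheduled token count across CP ranks (sketch)."""
    # Fast path: with context parallelism disabled there is nothing to
    # redistribute, so the non-CP path pays no extra per-step cost.
    if cp_size <= 1:
        return tokens
    # Illustrative even split with ceiling division so no token is dropped.
    return [(t + cp_size - 1) // cp_size for t in tokens]


# Example: two CP ranks, two requests with 17 and 9 scheduled tokens.
print(update_tokens_for_cp([17, 9], cp_size=2))  # -> [9, 5]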
Signed-off-by: Delphine-Nic <[email protected]>
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?