[v0.9.1] MTP supports V1 scheduler #2371
Conversation
Code Review

This pull request refactors the attention metadata handling by introducing a centralized `AscendCommonAttentionMetadata` dataclass. This is a good architectural improvement that centralizes logic and reduces code duplication. However, the review identified several critical issues related to this refactoring, including incorrect tensor slicing and initialization that could lead to runtime errors or incorrect behavior. Specifically, there are bugs in `build_dummy_metadata` in `attention_v1.py` and in the `build` method of `mla_v1.py`'s metadata builder. There are also some redundant code assignments that should be cleaned up.
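For reference, a minimal sketch of what such a centralized metadata dataclass might look like, inferred only from the fields referenced later in this review (`query_start_loc`, `query_start_loc_cpu`, `attn_mask`, `attn_state`); the actual `AscendCommonAttentionMetadata` in the PR may define more fields and different defaults:

```python
# Hypothetical sketch, not the PR's actual definition.
from dataclasses import dataclass
from typing import Any, Optional

import torch


@dataclass
class AscendCommonAttentionMetadata:
    # Cumulative query offsets per request, shape [num_reqs + 1].
    query_start_loc: Optional[torch.Tensor] = None
    # Host-side copy, sliced before transfer to the device.
    query_start_loc_cpu: Optional[torch.Tensor] = None
    # Attention mask shared across attention backends.
    attn_mask: Optional[torch.Tensor] = None
    # Backend-specific attention state (e.g. decode-only vs. prefill).
    attn_state: Optional[Any] = None
```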
block_table[:num_reqs, :self.runner.max_num_blocks_per_req] = (
    block_table[:num_reqs])
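The summary's "incorrect tensor slicing" point can be seen in this assignment: the left-hand side selects only `max_num_blocks_per_req` columns while the right-hand side keeps the full column width, so the statement either raises a shape mismatch when the widths differ or is a redundant self-assignment when they agree. A standalone illustration with made-up shapes (the real builder's sizes may differ):

```python
import torch

# Made-up shapes for illustration only.
num_reqs, table_cols, max_num_blocks_per_req = 4, 8, 6
block_table = torch.zeros(16, table_cols, dtype=torch.int32)

# LHS view is (num_reqs, max_num_blocks_per_req); RHS is (num_reqs, table_cols).
try:
    block_table[:num_reqs, :max_num_blocks_per_req] = block_table[:num_reqs]
except RuntimeError as exc:
    print(exc)  # shape mismatch whenever table_cols != max_num_blocks_per_req
```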
query_start_loc = common_attn_metadata.query_start_loc
The `common_attn_metadata` used here is initialized without a `query_start_loc`, so `common_attn_metadata.query_start_loc` is `None`. This `None` value is then passed to the `AscendMetadata` constructor, which will lead to a runtime error because `query_start_loc` is a required tensor. To fix this, a dummy `query_start_loc` tensor should be created for the decode-only state.
Suggested change:

- query_start_loc = common_attn_metadata.query_start_loc
+ query_start_loc = torch.arange(0, num_reqs + 1, dtype=torch.int32, device=block_table.device)
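A quick sanity check on why `torch.arange` is a valid dummy here (standalone snippet, arbitrary `num_reqs`): in a decode-only batch every request contributes exactly one query token, so the cumulative offsets are simply 0 through `num_reqs`.

```python
import torch

# Decode-only dummy batch: one query token per request, so the
# cumulative offsets are 0, 1, ..., num_reqs.
num_reqs = 4
query_start_loc = torch.arange(0, num_reqs + 1, dtype=torch.int32)
print(query_start_loc.tolist())  # [0, 1, 2, 3, 4]
# Request i's query tokens span query_start_loc[i]:query_start_loc[i + 1].
```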
vllm_ascend/attention/mla_v1.py (outdated)
actual_seq_lengths_q = query_start_loc[1:num_decodes+1].tolist()
max_seq_lens = seq_lens[:num_decodes].max().item()
seq_lens = seq_lens[:num_decodes]
input_positions = input_positions[:num_decodes]
The `input_positions` tensor is incorrectly sliced with `num_decodes` (the number of decode requests). It should be sliced with `num_decode_tokens` (the number of tokens in decode requests) to select the correct positions for the decode phase. This is a critical bug that will lead to incorrect behavior.
Suggested change:

- input_positions = input_positions[:num_decodes]
+ input_positions = input_positions[:num_decode_tokens]
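To make the distinction concrete (illustrative numbers only; real batches come from the scheduler): with multi-token prediction a decode request can carry several query tokens per step, so the request count and the token count diverge.

```python
import torch

# Illustrative: 2 decode requests, each carrying e.g. 1 target token
# plus 2 MTP draft tokens, so 3 tokens per request.
num_decodes = 2
tokens_per_req = 3
num_decode_tokens = num_decodes * tokens_per_req  # 6

input_positions = torch.arange(10)
print(input_positions[:num_decode_tokens].tolist())  # 6 positions, correct
print(input_positions[:num_decodes].tolist())        # only 2, drops 4 tokens
```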
    self.device, non_blocking=True)
attn_mask = common_attn_metadata.attn_mask
attn_state = common_attn_metadata.attn_state
query_start_loc_cpu = common_attn_metadata.query_start_loc_cpu[:num_reqs + 1]
input_positions = self.runner.positions_cpu[:num_tokens].to(
    device, non_blocking=True).long()
This pull request has conflicts; please resolve those before we can evaluate the pull request.
):
    self.vllm_config = vllm_config
    self.model_config = vllm_config.model_config
    self.device = device
    self.runner = runner
remove runner
Force-pushed from 619fff4 to 96a8bfb
Signed-off-by: xuyexiong <[email protected]>
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?