Refactor MLA forward_prefill & forward_decode #2342
base: main
Conversation
Signed-off-by: lwq <[email protected]>
Code Review
This pull request refactors the _forward_prefill and _forward_decode methods in AscendMLAImpl to simplify the logic by removing several conditional paths based on attention states. While the intent to simplify is good, the refactoring has introduced critical issues in the function signatures of both methods: incorrect type hints and mismatched arguments will lead to runtime TypeErrors. These need to be fixed to ensure the code runs correctly.
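To make the failure mode concrete, here is a minimal, standalone Python illustration (not the vllm-ascend code; the Impl class, the trimmed-down parameter list, and the argument values are placeholders) of how a call site that is not updated after a signature change raises a TypeError at runtime:

# Toy reproduction: the method now takes block_size, but a caller still passes
# the old cache argument, so Python rejects the call with a TypeError.
class Impl:
    def _forward_decode(self, q_pe, block_size: int):
        return block_size

impl = Impl()
kv_cache = (object(), object())  # stand-in for the (kv_c, k_pe) cache tuple
try:
    impl._forward_decode(q_pe=None, kv_c_and_k_pe_cache=kv_cache)
except TypeError as exc:
    print(exc)  # ... got an unexpected keyword argument 'kv_c_and_k_pe_cache'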
self,
q_nope: torch.Tensor,
ql_nope: torch.Tensor,
q_pe: torch.Tensor,
k_nope: torch.Tensor,
k_pe: torch.Tensor,
kv_c_and_k_pe_cache: Tuple[torch.Tensor],
block_size: int,
attn_metadata: AscendMLAMetadata,
enable_multistream_mla: bool = False,
) -> torch.Tensor:
The signature of _forward_decode was changed to take block_size: int instead of kv_c_and_k_pe_cache. However, the call sites in the forward method are not updated and still pass kv_cache (a tuple of tensors), which will cause a TypeError. To fix this, you should revert the signature to accept kv_c_and_k_pe_cache and derive block_size inside this function, as it was done previously (block_size = kv_c_and_k_pe_cache[0].shape[1]).
Suggested change:

Current signature:
self,
q_nope: torch.Tensor,
ql_nope: torch.Tensor,
q_pe: torch.Tensor,
k_nope: torch.Tensor,
k_pe: torch.Tensor,
kv_c_and_k_pe_cache: Tuple[torch.Tensor],
block_size: int,
attn_metadata: AscendMLAMetadata,
enable_multistream_mla: bool = False,
) -> torch.Tensor:

Suggested signature:
self,
ql_nope: torch.Tensor,
q_pe: torch.Tensor,
k_nope: torch.Tensor,
k_pe: torch.Tensor,
kv_c_and_k_pe_cache: Tuple[torch.Tensor, ...],
attn_metadata: AscendMLAMetadata,
enable_multistream_mla: bool = False,
) -> torch.Tensor:
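For reference, a minimal sketch of what the suggested fix implies, assuming the suggested signature above and deriving block_size from the cache inside the method. The class stub, the untyped attn_metadata parameter, and the elided body are assumptions; the real AscendMLAImpl._forward_decode in vllm-ascend contains the Ascend-specific attention kernels, which are not reproduced here.

import torch
from typing import Tuple

class AscendMLAImpl:  # sketch only; the real class lives in vllm-ascend
    def _forward_decode(
        self,
        ql_nope: torch.Tensor,
        q_pe: torch.Tensor,
        k_nope: torch.Tensor,
        k_pe: torch.Tensor,
        kv_c_and_k_pe_cache: Tuple[torch.Tensor, ...],
        attn_metadata,  # AscendMLAMetadata in the real code
        enable_multistream_mla: bool = False,
    ) -> torch.Tensor:
        # Recover block_size from the cache layout rather than taking it as a
        # parameter, so call sites in forward() that pass kv_cache keep working.
        block_size = kv_c_and_k_pe_cache[0].shape[1]
        # ... Ascend-specific decode attention computation elided ...
        raise NotImplementedError("sketch: attention computation elided")

The key point is that block_size is recomputed locally from the cache, so the existing call sites in forward do not need to change.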
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?