
Torchair graph812 cov #2337


Status: Open
NicholasTao wants to merge 12 commits into main from torchair_graph812_cov

Conversation

@NicholasTao commented on Aug 12, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

p00465316 and others added 11 commits August 8, 2025 17:39
Signed-off-by: p00465316 <[email protected]>
Signed-off-by: taoyuxiang <[email protected]>
Signed-off-by: taoyuxiang <[email protected]>
Signed-off-by: taoyuxiang <[email protected]>
Signed-off-by: taoyuxiang <[email protected]>
Signed-off-by: taoyuxiang <[email protected]>
Signed-off-by: taoyuxiang <[email protected]>
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds support for Qwen models with torchair, including new model files, tests, and configuration changes. My review focuses on several critical issues. I've identified a typo in a test patch target that would cause it to fail. More importantly, I found two critical bugs in the rotary_embedding implementation: one causing a tensor shape mismatch due to incorrect unsqueeze operations, and another related to an incorrect monkey-patch target that would prevent the new rotary embedding logic from being used. These issues need to be addressed to ensure the correctness of the new functionality.

```python
def stubbed_get_state(ep_size, with_prefill, is_deepseek_v3_r1):
    return _get_fused_moe_state(16, with_prefill, is_deepseek_v3_r1)

with patch('mockif._get_fused_moe_state', side_effect=stubbed_get_state):
```

critical

The target string for patch seems incorrect. 'mockif._get_fused_moe_state' will likely cause a ModuleNotFoundError because mockif is not a valid module. Based on the imports, you probably intended to patch _get_fused_moe_state from vllm_ascend.ascend_forward_context.

Suggested change:

```diff
-with patch('mockif._get_fused_moe_state', side_effect=stubbed_get_state):
+with patch('vllm_ascend.ascend_forward_context._get_fused_moe_state', side_effect=stubbed_get_state):
```
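For illustration, a minimal self-contained sketch of the pattern the suggestion produces, assuming `vllm_ascend.ascend_forward_context` is the module the test imports from (as the comment above indicates); the code under test is elided:

```python
from unittest.mock import patch

# Assumed import, mirroring the review comment. Binding the real function
# here, before patching, lets the stub call it without hitting the mock.
from vllm_ascend.ascend_forward_context import _get_fused_moe_state


def stubbed_get_state(ep_size, with_prefill, is_deepseek_v3_r1):
    # Pin ep_size to 16 and delegate everything else to the original.
    return _get_fused_moe_state(16, with_prefill, is_deepseek_v3_r1)


# patch() imports everything left of the last dot, so the target prefix must
# be a real module path; 'mockif' is not one, hence ModuleNotFoundError.
with patch('vllm_ascend.ascend_forward_context._get_fused_moe_state',
           side_effect=stubbed_get_state):
    ...  # exercise code that looks up _get_fused_moe_state via the module
```

Note the stub calls the reference captured at import time, not the patched module attribute; calling the latter inside the stub would recurse into the mock.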

Comment on lines +334 to +340
```python
cos = cos.unsqueeze(-2).unsqueeze(-2)
sin = sin.unsqueeze(-2).unsqueeze(-2)

query = query.unsqueeze(1)
key = key.unsqueeze(1)

q_embed, k_embed = torch_npu.npu_apply_rotary_pos_emb(query, key, cos, sin)
```

critical

The unsqueeze(1) calls on query and key introduce an extra dimension, which leads to an incorrect output shape from rope_forward. The attention mechanism expects a tensor of shape (num_tokens, num_heads * head_size), but this implementation will produce a tensor with an extra dimension of size 1, e.g., (num_tokens, 1, num_heads * head_size). This will cause a shape mismatch error in the subsequent attention computation. The unsqueeze calls on cos and sin should also be adjusted to a single unsqueeze to ensure correct broadcasting.

Suggested change:

```diff
-cos = cos.unsqueeze(-2).unsqueeze(-2)
-sin = sin.unsqueeze(-2).unsqueeze(-2)
-query = query.unsqueeze(1)
-key = key.unsqueeze(1)
-q_embed, k_embed = torch_npu.npu_apply_rotary_pos_emb(query, key, cos, sin)
+cos = cos.unsqueeze(-2)
+sin = sin.unsqueeze(-2)
+q_embed, k_embed = torch_npu.npu_apply_rotary_pos_emb(query, key, cos, sin)
```
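To see the shape problem concretely, here is a small broadcasting check in plain torch (torch_npu requires NPU hardware, so this sketch only reproduces the shapes, with assumed sizes):

```python
import torch

num_tokens, num_heads, head_size = 4, 8, 128  # assumed sizes for illustration

# query as the attention layer supplies it, viewed per-head.
query = torch.randn(num_tokens, num_heads * head_size).view(
    num_tokens, num_heads, head_size)
cos = torch.randn(num_tokens, head_size)

# Single unsqueeze: cos becomes (num_tokens, 1, head_size) and broadcasts
# cleanly against (num_tokens, num_heads, head_size) -- the rank stays 3.
out = query * cos.unsqueeze(-2)
assert out.shape == (num_tokens, num_heads, head_size)

# Double unsqueeze on cos plus unsqueeze(1) on query yields a 4-D result
# with a stray size-1 dim, which survives into the flattened output that
# the attention computation expects to be (num_tokens, num_heads * head_size).
q4 = query.unsqueeze(1)               # (num_tokens, 1, num_heads, head_size)
c4 = cos.unsqueeze(-2).unsqueeze(-2)  # (num_tokens, 1, 1, head_size)
assert (q4 * c4).shape == (num_tokens, 1, num_heads, head_size)
```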

```diff
@@ -283,7 +378,8 @@ def deepseek_rope_init_func(
         device="npu")


-RotaryEmbedding.forward_oot = rope_forward_oot
+RotaryEmbedding.__init__ = qwen_rope_init_func
+RotaryEmbedding.forward_oot = rope_forward
```

critical

You are monkey-patching RotaryEmbedding.forward_oot, but RotaryEmbedding instances are called via their forward method. To correctly override the rotary embedding logic, you should patch RotaryEmbedding.forward. Using forward_oot will not have the desired effect as it's not the standard method called.

Suggested change:

```diff
-RotaryEmbedding.forward_oot = rope_forward
+RotaryEmbedding.forward = rope_forward
```
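As a sketch of the class-level monkey-patch the suggestion implies, assuming vLLM's usual import path for RotaryEmbedding; rope_forward's body is elided and its signature here is an assumption modeled on RotaryEmbedding.forward:

```python
# Assumed import path from vLLM.
from vllm.model_executor.layers.rotary_embedding import RotaryEmbedding


def rope_forward(self, positions, query, key, offsets=None):
    ...  # NPU rotary-embedding logic from this PR (elided)


# Rebinding the class attribute affects every instance, but only through the
# name the call site actually dispatches to: module(...) -> __call__ ->
# forward. If nothing ever consults forward_oot, patching it is a no-op.
RotaryEmbedding.forward = rope_forward
```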

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description, to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@NicholasTao NicholasTao force-pushed the torchair_graph812_cov branch 2 times, most recently from d2462c9 to 0739169 Compare August 12, 2025 15:52
codecov bot commented Aug 12, 2025

Codecov Report

❌ Patch coverage is 77.67442% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.77%. Comparing base (c611291) to head (f759f5e).
⚠️ Report is 27 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vllm_ascend/models/qwen3_moe.py | 31.74% | 43 Missing ⚠️ |
| vllm_ascend/ops/rotary_embedding.py | 89.58% | 5 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (77.67%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2337      +/-   ##
==========================================
- Coverage   76.09%   75.77%   -0.32%     
==========================================
  Files         114      118       +4     
  Lines       13103    13731     +628     
==========================================
+ Hits         9971    10405     +434     
- Misses       3132     3326     +194     
| Flag | Coverage Δ |
|---|---|
| unittests | 75.77% <77.67%> (-0.32%) ⬇️ |

Flags with carried forward coverage won't be shown.


@NicholasTao NicholasTao force-pushed the torchair_graph812_cov branch 3 times, most recently from 59a4e7c to f759f5e Compare August 13, 2025 03:10
2. add rope_forward ut

Signed-off-by: taoyuxiang <[email protected]>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
