[Bugfix]Support Qwen3-MOE on aclgraph mode in sizes capture and add new ut #2352
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request addresses an issue with sizes capture for the Qwen3-MOE model in aclgraph mode and adds new unit tests to cover this scenario. The changes in vllm_ascend/utils.py adjust the calculation for maximum batch sizes based on the HCCL_OP_EXPANSION_MODE environment variable. My review focuses on improving the new tests for better isolation and enhancing the readability and maintainability of the calculation logic by addressing magic numbers and style issues. The proposed changes will make the tests more robust and the code easier to understand.
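As a rough illustration of the kind of calculation under review, here is a minimal sketch. Only the `(num_hidden_layers + 1) / parallel_factor` term and the 'AIV' check on HCCL_OP_EXPANSION_MODE come from the diff; the function names, the budget parameters, and the numbers in the usage lines are assumptions for illustration, not the values used in vllm_ascend/utils.py.

```python
import math


def pick_budget(expansion_mode: str, aiv_budget: int, default_budget: int) -> int:
    """Hypothetical helper: choose a capture budget from the HCCL mode string."""
    return aiv_budget if expansion_mode == "AIV" else default_budget


def max_num_batch_sizes(budget: int, num_hidden_layers: int,
                        parallel_factor: int) -> int:
    """Split a budget across (layers + 1) and the parallel factor, rounding down."""
    return math.floor(budget / (num_hidden_layers + 1) / parallel_factor)


# Made-up numbers, chosen only to exercise both branches.
budget = pick_budget("AIV", aiv_budget=1920, default_budget=1800)
print(max_num_batch_sizes(budget, num_hidden_layers=48, parallel_factor=2))
```

Passing the budgets in as named parameters, rather than writing them inline, is one way to address the magic-number concern raised in the review.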
4414d64 to a84e9a6
vllm_ascend/utils.py (outdated)
    (num_hidden_layers + 1) / parallel_factor)
    logger.info("Calculated maximum supported batch sizes for ACL graph: %s",
                max_num_batch_sizes)
    if envs.HCCL_OP_EXPANSION_MODE == 'AIV':
Please rebase onto main; there the module is envs_ascend, not envs:
https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/utils.py#L34
I think this also fixes #2229.
Signed-off-by: lilinsiman <[email protected]>
@@ -55,6 +55,9 @@
    # Please make sure that the version is correct.
    "SOC_VERSION":
    lambda: os.getenv("SOC_VERSION", "ASCEND910B1"),
    # location for orchestrated deployment of communication algorithms.
This is an environment variable from HCCL, so we should not add it to vllm-ascend. We can set it in the Dockerfile and mention it in the docs.
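A minimal sketch of what this comment suggests, assuming the code reads the HCCL variable straight from the process environment instead of registering it in vllm-ascend's own env table. The helper name is hypothetical; only the variable name and the 'AIV' value come from the discussion.

```python
import os


def hccl_aiv_enabled() -> bool:
    """Hypothetical helper: True when HCCL_OP_EXPANSION_MODE is set to 'AIV'."""
    # An unset variable falls back to the empty string, which never matches.
    return os.getenv("HCCL_OP_EXPANSION_MODE", "") == "AIV"
```

With this approach the variable is exported by the deployment itself, for example in the Dockerfile, as the comment proposes.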
Codecov Report
❌ Your patch status has failed because the patch coverage (62.50%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2352      +/-   ##
==========================================
- Coverage   78.33%   78.32%   -0.02%
==========================================
  Files         132      132
  Lines       17778    17783       +5
==========================================
+ Hits        13926    13928       +2
- Misses       3852     3855       +3
[Bugfix]Support Qwen3-MOE on aclgraph mode in sizes capture and add new ut
What this PR does / why we need it?
This PR fixes the sizes-capture and stream errors that occur when using ACL graph mode with the Qwen3-30B MoE model.
It also adds new unit tests (UT).
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit tests (UT).