[Qwen-moe] Remove the minor operation arange #2373
Code Review
This pull request refactors the creation of `row_idx` for MoE operations by centralizing the `torch.arange` call within the `select_experts` function. This improves code structure by removing redundant code from several expert fusion functions. The changes affect both unquantized and quantized MoE paths, and the necessary function signatures have been updated accordingly. I've found a critical syntax error in the w4a8 dynamic quantization path that needs to be fixed.
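The refactor described above can be illustrated with a minimal sketch. It is not the code from this PR: `build_row_idx` and `select_experts_sketch` are hypothetical names, the softmax-then-top-k routing is an assumed scoring scheme, and the `row_idx` layout (token-major view of a `top_k x num_tokens` range) is an assumption about what the downstream fused-expert kernels expect. The point is only that the `torch.arange` call happens once, next to expert selection, and the result is passed along instead of being rebuilt in every fusion function.

```python
import torch


def build_row_idx(num_tokens: int, top_k: int, device=None) -> torch.Tensor:
    # Hypothetical helper: one arange over num_tokens * top_k entries,
    # reshaped so that row i holds the top_k row indices for token i.
    row_idx_len = num_tokens * top_k
    return (
        torch.arange(0, row_idx_len, dtype=torch.int32, device=device)
        .view(top_k, -1)
        .permute(1, 0)
        .contiguous()
    )


def select_experts_sketch(router_logits: torch.Tensor, top_k: int):
    # Assumed routing: softmax over experts, then keep the top_k per token.
    weights = torch.softmax(router_logits, dim=-1)
    topk_weights, topk_ids = torch.topk(weights, k=top_k, dim=-1)

    # row_idx is created here, once, rather than inside each expert
    # fusion function that consumes the routing result.
    row_idx = build_row_idx(router_logits.shape[0], top_k, router_logits.device)
    return topk_weights, topk_ids.to(torch.int32), row_idx


def fused_experts_sketch(hidden_states, topk_weights, topk_ids, row_idx):
    # Hypothetical consumer: it receives row_idx as an argument and
    # therefore no longer needs its own torch.arange call.
    assert row_idx.shape == topk_ids.shape
    return hidden_states
```

With this shape, any function signature that previously computed `row_idx` internally gains a `row_idx` parameter, which matches the signature updates mentioned in the review summary.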
eb1bd29 to 0c6ac41
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
I submitted this PR much earlier; it has been verified locally and no errors were reported. The PR was resubmitted to resolve conflicts with the previous integration.
a23e811 to c466592
Signed-off-by: s30076806 <[email protected]>
c466592 to 5f4adf5
0cd1812 to 7da13fc
Codecov Report
❌ Patch coverage is …
@@            Coverage Diff             @@
##             main    #2373      +/-   ##
==========================================
+ Coverage   77.93%   77.96%   +0.02%
==========================================
  Files         134      134
  Lines       18504    18499       -5
==========================================
+ Hits        14422    14423       +1
+ Misses       4082     4076       -6
Signed-off-by: s30076806 <[email protected]>
7da13fc to 6088f52
What this PR does / why we need it?
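Consolidate the torch.arange call that builds row_idx into select_experts, so it is no longer repeated in each expert fusion function. This reduces the time spent on the operator and improves performance.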
Does this PR introduce any user-facing change?
No
How was this patch tested?