[Refactor] refactor spec decode #2361
Conversation
Code Review
This pull request refactors the speculative decoding logic by moving it into a dedicated `spec_decode` module with separate proposer classes. This is a good architectural improvement that enhances modularity and code clarity. However, the refactoring has introduced some critical issues in the new proposer classes (`EagleProposer` and `MtpProposer`). Specifically, they attempt to access attributes like `input_batch` and `requests` directly, which are properties of the `NPUModelRunner`. These should be accessed through the `runner` instance. Additionally, `EagleProposer` is missing the `runner` instance itself. I've left detailed comments on how to fix these issues.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
force-pushed from 84377fe to 2eb627d
/gemini review
Code Review
This pull request refactors the speculative decoding logic into a dedicated `spec_decode` module, which is a great step towards better code organization. The introduction of a `Proposer` interface and moving different speculative decoding methods into their own classes cleans up the `NPUModelRunner` significantly. However, the refactoring has introduced several critical issues, primarily related to incorrect attribute access and mismatched function signatures after moving code around. These issues will likely cause runtime errors and need to be addressed.
force-pushed from 32f82ca to 3e7ee1f
/gemini review
Code Review
This pull request refactors the speculative decoding logic into a new `spec_decode` module, which is a great improvement for code organization and maintainability. It introduces a `Proposer` interface and moves the logic for different speculative decoding methods into their respective classes. The changes significantly clean up the `NPUModelRunner`. However, I've found a critical issue where the `NgramProposer` does not correctly implement the `Proposer` interface, which will lead to a runtime error. Please see the detailed comment for the fix.
```python
from vllm.v1.spec_decode.ngram_proposer import NgramProposer

from vllm_ascend.spec_decode.utils import SpecDcodeType


class NgramProposer(NgramProposer):

    def __init__(self, vllm_config, device, runner):
        super().__init__(vllm_config)
        self.name = SpecDcodeType.NGRAM
        self.device = device
        self.runner = runner
```
The `NgramProposer` does not correctly implement the `Proposer` interface from `vllm_ascend/spec_decode/interface.py`. It's missing the `load_model` method and doesn't inherit from `Proposer`. This will cause a runtime `AttributeError` in `model_runner_v1.py` when `self.drafter.load_model(self.model)` is called for the `ngram` method. This is a critical issue that will cause a crash.

To fix this, `NgramProposer` should inherit from `Proposer` and implement the `load_model` method. Since the ngram proposer doesn't need to load a model, the method can simply `pass`.
```python
from vllm.v1.spec_decode.ngram_proposer import NgramProposer as VllmNgramProposer

from vllm_ascend.spec_decode.interface import Proposer
from vllm_ascend.spec_decode.utils import SpecDcodeType


class NgramProposer(VllmNgramProposer, Proposer):

    def __init__(self, vllm_config, device, runner):
        super().__init__(vllm_config)
        self.name = SpecDcodeType.NGRAM
        self.device = device
        self.runner = runner

    def load_model(self, model):
        pass
```
It has already been implemented by `NgramProposer` from vllm.
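To make the contract concrete, the `Proposer` interface discussed above can be pictured roughly as an abstract base class whose `load_model` is a no-op for the ngram case. This is a sketch only: the method name `load_model` comes from the review thread, while everything else here is an assumption and may differ from the real `vllm_ascend/spec_decode/interface.py`.

```python
from abc import ABC, abstractmethod


# Hypothetical sketch of a Proposer interface; only load_model is taken
# from the review discussion, the rest is assumed for illustration.
class Proposer(ABC):
    @abstractmethod
    def load_model(self, model):
        """Load the draft model used for speculation."""


class NgramProposerSketch(Proposer):
    def load_model(self, model):
        # The ngram proposer needs no draft model, so this is a no-op.
        pass


drafter = NgramProposerSketch()
drafter.load_model(None)  # no AttributeError: load_model is implemented
print("ok")
```

With this shape, `self.drafter.load_model(self.model)` in the runner can be called uniformly for every speculative method, which is the crash the original comment was guarding against.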
force-pushed from 7b68b19 to ded94a4
This pull request has conflicts, please resolve those before we can evaluate the pull request.
force-pushed from ded94a4 to d108404
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main    #2361   +/-   ##
=======================================
  Coverage   77.93%   77.93%
=======================================
  Files         134      134
  Lines       18504    18504
=======================================
  Hits        14422    14422
  Misses       4082     4082
```

Flags with carried forward coverage won't be shown.
force-pushed from d108404 to 33f4315
force-pushed from 33f4315 to 8783e51
force-pushed from 8783e51 to 00dc9dd
force-pushed from 00dc9dd to 3a5ca94
Signed-off-by: wangxiyuan <[email protected]>
force-pushed from 3a5ca94 to c5b00ce
Create a spec decode module and move all related code there to make the code cleaner.
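One way the runner can take advantage of such a module is to pick the proposer class by speculative method name at construction time. The class names below come from the review comments; the dispatch table and builder function are assumptions sketched for illustration, not the PR's actual code.

```python
# Hypothetical dispatch sketch: select a proposer by speculative method name.
# EagleProposer/MtpProposer/NgramProposer names come from the review comments;
# the selection logic itself is an illustrative assumption.
class EagleProposer: ...
class MtpProposer: ...
class NgramProposer: ...

PROPOSERS = {
    "eagle": EagleProposer,
    "mtp": MtpProposer,
    "ngram": NgramProposer,
}


def build_drafter(method: str):
    # Fail loudly on an unknown method instead of crashing later in the runner.
    try:
        return PROPOSERS[method]()
    except KeyError:
        raise ValueError(f"unknown speculative method: {method}")


print(type(build_drafter("ngram")).__name__)  # NgramProposer
```

Keeping the table in the `spec_decode` module means `NPUModelRunner` no longer needs method-specific branches, which is the cleanup this PR is after.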