
wangxiyuan
Collaborator

@wangxiyuan wangxiyuan commented Aug 14, 2025

Create a spec_decode module and move all speculative-decoding-related code into it to make the code cleaner.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the speculative decoding logic by moving it into a dedicated spec_decode module with separate proposer classes. This is a good architectural improvement that enhances modularity and code clarity. However, the refactoring has introduced some critical issues in the new proposer classes (EagleProposer and MtpProposer). Specifically, they attempt to access attributes like input_batch and requests directly, which are properties of the NPUModelRunner. These should be accessed through the runner instance. Additionally, EagleProposer is missing the runner instance itself. I've left detailed comments on how to fix these issues.
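A minimal sketch of the fix described above, assuming a proposer that stores the runner and reads per-step state through it. `FakeInputBatch`, `FakeRunner`, `EagleProposerSketch` and `active_request_ids` are hypothetical stand-ins, not the real `NPUModelRunner` or `EagleProposer` API; only the `self.runner.input_batch` / `self.runner.requests` access pattern is the point.

```python
from dataclasses import dataclass, field


@dataclass
class FakeInputBatch:
    # Hypothetical stand-in for the runner's input batch.
    req_ids: list[str] = field(default_factory=list)


@dataclass
class FakeRunner:
    # Hypothetical stand-in for NPUModelRunner: it owns the per-step state.
    input_batch: FakeInputBatch
    requests: dict[str, object]


class EagleProposerSketch:
    """Hypothetical proposer that reads runner state through self.runner."""

    def __init__(self, vllm_config, device, runner):
        self.vllm_config = vllm_config
        self.device = device
        # Keep the runner: input_batch and requests live on it, not on the proposer.
        self.runner = runner

    def active_request_ids(self):
        # Correct: go through the runner instance.
        # The flagged bug would be `self.input_batch.req_ids`, which raises AttributeError.
        return [rid for rid in self.runner.input_batch.req_ids
                if rid in self.runner.requests]
```

For example, `EagleProposerSketch(None, "cpu", FakeRunner(FakeInputBatch(["a", "b"]), {"a": object()})).active_request_ids()` returns `["a"]`. Passing the runner into the constructor keeps a single source of truth for batch state instead of copying it into every proposer.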

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will help speed up the merge of your PR:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally as described in Contributing and Testing.

@wangxiyuan wangxiyuan force-pushed the refactor_spec_decode branch 5 times, most recently from 84377fe to 2eb627d on August 14, 2025 03:13
@wangxiyuan
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the speculative decoding logic into a dedicated spec_decode module, which is a great step towards better code organization. The introduction of a Proposer interface and moving different speculative decoding methods into their own classes cleans up the NPUModelRunner significantly. However, the refactoring has introduced several critical issues, primarily related to incorrect attribute access and mismatched function signatures after moving code around. These issues will likely cause runtime errors and need to be addressed.
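To illustrate the Proposer-interface idea, here is a sketch assuming an abstract `Proposer` base class with a `load_model` hook and a factory that picks a proposer by method name. `NgramSketch`, `EagleSketch`, `make_drafter` and the method strings are hypothetical, not taken from the PR. Giving every proposer the same constructor signature is what lets the runner build any of them through one call, which is exactly what breaks when signatures drift apart during a move.

```python
from abc import ABC, abstractmethod


class Proposer(ABC):
    """Hypothetical interface every drafter implements."""

    def __init__(self, vllm_config, device, runner):
        self.vllm_config = vllm_config
        self.device = device
        self.runner = runner

    @abstractmethod
    def load_model(self, model):
        """Load or share the draft model; may be a no-op."""


class NgramSketch(Proposer):
    name = "ngram"

    def load_model(self, model):
        # N-gram drafting needs no model weights.
        pass


class EagleSketch(Proposer):
    name = "eagle"

    def load_model(self, model):
        # Hypothetical: keep a reference to the target model.
        self.target_model = model


_PROPOSERS = {cls.name: cls for cls in (NgramSketch, EagleSketch)}


def make_drafter(method, vllm_config, device, runner):
    # One shared constructor signature: the runner never special-cases a method.
    try:
        cls = _PROPOSERS[method]
    except KeyError:
        raise ValueError(f"unknown speculative method: {method!r}") from None
    return cls(vllm_config, device, runner)
```

The runner side then reduces to `self.drafter = make_drafter(method, config, device, self)` followed by `self.drafter.load_model(self.model)`, with no per-method branching.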

@wangxiyuan wangxiyuan force-pushed the refactor_spec_decode branch 2 times, most recently from 32f82ca to 3e7ee1f on August 14, 2025 03:30
@wangxiyuan
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the speculative decoding logic into a new spec_decode module, which is a great improvement for code organization and maintainability. It introduces a Proposer interface and moves the logic for different speculative decoding methods into their respective classes. The changes significantly clean up the NPUModelRunner. However, I've found a critical issue where the NgramProposer does not correctly implement the Proposer interface, which will lead to a runtime error. Please see the detailed comment for the fix.

Comment on lines 2 to 14
from vllm.v1.spec_decode.ngram_proposer import NgramProposer

from vllm_ascend.spec_decode.utils import SpecDcodeType


class NgramProposer(NgramProposer):

    def __init__(self, vllm_config, device, runner):
        super().__init__(vllm_config)
        self.name = SpecDcodeType.NGRAM
        self.device = device
        self.runner = runner

Contributor

critical

The NgramProposer does not correctly implement the Proposer interface from vllm_ascend/spec_decode/interface.py. It's missing the load_model method and doesn't inherit from Proposer. This will cause a runtime AttributeError in model_runner_v1.py when self.drafter.load_model(self.model) is called for the ngram method. This is a critical issue that will cause a crash.

To fix this, NgramProposer should inherit from Proposer and implement the load_model method. Since the ngram proposer doesn't need to load a model, the method can simply pass.

from vllm.v1.spec_decode.ngram_proposer import NgramProposer as VllmNgramProposer

from vllm_ascend.spec_decode.interface import Proposer
from vllm_ascend.spec_decode.utils import SpecDcodeType


class NgramProposer(VllmNgramProposer, Proposer):

    def __init__(self, vllm_config, device, runner):
        super().__init__(vllm_config)
        self.name = SpecDcodeType.NGRAM
        self.device = device
        self.runner = runner

    def load_model(self, model):
        pass

Collaborator Author

It has already been implemented by the NgramProposer from vLLM.
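The author's point can be checked with plain Python: a subclass inherits every method its base already defines, so `self.drafter.load_model(...)` resolves without the subclass declaring it. The sketch uses hypothetical stand-ins (`UpstreamNgram` for vLLM's `NgramProposer`, `Proposer` for the vllm_ascend interface, `AscendNgram` for the subclass) and assumes, per the reply, that the upstream class already provides `load_model`. It also shows that if the interface were an `ABC`, listing the upstream class first in the bases lets its concrete `load_model` satisfy the abstract method.

```python
from abc import ABC, abstractmethod


class UpstreamNgram:
    # Hypothetical stand-in for vLLM's NgramProposer; assumed to define load_model.
    def __init__(self, vllm_config):
        self.vllm_config = vllm_config

    def load_model(self, *args, **kwargs):
        # No model to load for n-gram drafting.
        pass


class Proposer(ABC):
    # Hypothetical interface with an abstract load_model.
    @abstractmethod
    def load_model(self, model):
        ...


class AscendNgram(UpstreamNgram, Proposer):
    # No load_model here: attribute lookup follows the MRO
    # (AscendNgram -> UpstreamNgram -> Proposer), finds UpstreamNgram.load_model
    # first, and that concrete method also clears Proposer's abstract one.
    def __init__(self, vllm_config, device, runner):
        super().__init__(vllm_config)
        self.device = device
        self.runner = runner
```

Reversing the bases to `(Proposer, UpstreamNgram)` would put the abstract `load_model` first in the MRO, and instantiating the class would raise `TypeError`.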

This pull request has conflicts, please resolve those before we can evaluate the pull request.

codecov bot commented Aug 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.93%. Comparing base (7e494e9) to head (c5b00ce).
⚠️ Report is 41 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2361   +/-   ##
=======================================
  Coverage   77.93%   77.93%           
=======================================
  Files         134      134           
  Lines       18504    18504           
=======================================
  Hits        14422    14422           
  Misses       4082     4082           
Flag Coverage Δ
unittests 77.93% <ø> (ø)

Flags with carried forward coverage won't be shown.
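The headline figure follows from the coverage diff: coverage is hits divided by lines, and misses are lines minus hits. The sketch below reproduces it with integer arithmetic. It assumes Codecov's default reporting of two decimal places rounded down, which is why 14422 / 18504 (about 77.9399%) is shown as 77.93% rather than 77.94%; `coverage_percent` is a hypothetical helper, not a Codecov API.

```python
def coverage_percent(hits: int, lines: int, precision: int = 2) -> str:
    """Return hits/lines as a percentage string, truncated (rounded down)."""
    scale = 10 ** precision
    # Integer math avoids float surprises: percentage * scale, floored.
    scaled = hits * 100 * scale // lines
    whole, frac = divmod(scaled, scale)
    return f"{whole}.{frac:0{precision}d}%"


lines, hits = 18504, 14422
misses = lines - hits
print(misses)                         # 4082
print(coverage_percent(hits, lines))  # 77.93%
```

Since both base and head report identical hits and lines, the Δ is zero, matching the "(ø)" in the flag row.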

@wangxiyuan wangxiyuan force-pushed the refactor_spec_decode branch from 3a5ca94 to c5b00ce on August 26, 2025 01:10
