[Refactor] refactor spec decode #2361
Conversation
Code Review
This pull request refactors the speculative decoding logic by moving it into a dedicated `spec_decode` module with separate proposer classes. This is a good architectural improvement that enhances modularity and code clarity. However, the refactoring has introduced some critical issues in the new proposer classes (`EagleProposer` and `MtpProposer`). Specifically, they attempt to access attributes like `input_batch` and `requests` directly, which are properties of the `NPUModelRunner`. These should be accessed through the `runner` instance. Additionally, `EagleProposer` is missing the `runner` instance itself. I've left detailed comments on how to fix these issues.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
force-pushed from 84377fe to 2eb627d
/gemini review
Code Review
This pull request refactors the speculative decoding logic into a dedicated `spec_decode` module, which is a great step towards better code organization. The introduction of a `Proposer` interface and moving different speculative decoding methods into their own classes cleans up the `NPUModelRunner` significantly. However, the refactoring has introduced several critical issues, primarily related to incorrect attribute access and mismatched function signatures after moving code around. These issues will likely cause runtime errors and need to be addressed.
force-pushed from 32f82ca to 3e7ee1f
/gemini review
Code Review
This pull request refactors the speculative decoding logic into a new `spec_decode` module, which is a great improvement for code organization and maintainability. It introduces a `Proposer` interface and moves the logic for different speculative decoding methods into their respective classes. The changes significantly clean up the `NPUModelRunner`. However, I've found a critical issue where the `NgramProposer` does not correctly implement the `Proposer` interface, which will lead to a runtime error. Please see the detailed comment for the fix.
```python
from vllm.v1.spec_decode.ngram_proposer import NgramProposer

from vllm_ascend.spec_decode.utils import SpecDcodeType


class NgramProposer(NgramProposer):

    def __init__(self, vllm_config, device, runner):
        super().__init__(vllm_config)
        self.name = SpecDcodeType.NGRAM
        self.device = device
        self.runner = runner
```
The `NgramProposer` does not correctly implement the `Proposer` interface from `vllm_ascend/spec_decode/interface.py`. It's missing the `load_model` method and doesn't inherit from `Proposer`. This will cause a runtime `AttributeError` in `model_runner_v1.py` when `self.drafter.load_model(self.model)` is called for the `ngram` method. This is a critical issue that will cause a crash.

To fix this, `NgramProposer` should inherit from `Proposer` and implement the `load_model` method. Since the ngram proposer doesn't need to load a model, the method can simply `pass`.
```python
from vllm.v1.spec_decode.ngram_proposer import NgramProposer as VllmNgramProposer

from vllm_ascend.spec_decode.interface import Proposer
from vllm_ascend.spec_decode.utils import SpecDcodeType


class NgramProposer(VllmNgramProposer, Proposer):

    def __init__(self, vllm_config, device, runner):
        super().__init__(vllm_config)
        self.name = SpecDcodeType.NGRAM
        self.device = device
        self.runner = runner

    def load_model(self, model):
        pass
```
It has already been implemented by `NgramProposer` from vllm.
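To make the contract concrete, the `Proposer` interface discussed above can be pictured roughly as an abstract base class whose `load_model` is a no-op for the ngram case. This is a sketch only: the method name `load_model` comes from the review thread, while everything else here is an assumption and may differ from the real `vllm_ascend/spec_decode/interface.py`.

```python
from abc import ABC, abstractmethod


# Hypothetical sketch of a Proposer interface; only load_model is taken
# from the review discussion, the rest is assumed for illustration.
class Proposer(ABC):
    @abstractmethod
    def load_model(self, model):
        """Load the draft model used for speculation."""


class NgramProposerSketch(Proposer):
    def load_model(self, model):
        # The ngram proposer needs no draft model, so this is a no-op.
        pass


drafter = NgramProposerSketch()
drafter.load_model(None)  # no AttributeError: load_model is implemented
print("ok")
```

With this shape, `self.drafter.load_model(self.model)` in the runner can be called uniformly for every speculative method, which is the crash the original comment was guarding against.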
force-pushed from 7b68b19 to ded94a4
This pull request has conflicts, please resolve those before we can evaluate the pull request.
force-pushed from ded94a4 to d108404
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main    #2361   +/-   ##
=======================================
  Coverage   77.93%   77.93%
=======================================
  Files         134      134
  Lines       18504    18504
=======================================
  Hits        14422    14422
  Misses       4082     4082
```

Flags with carried forward coverage won't be shown.
force-pushed from d108404 to 33f4315
force-pushed from 33f4315 to 8783e51
force-pushed from 8783e51 to 00dc9dd
force-pushed from 00dc9dd to 3a5ca94
Signed-off-by: wangxiyuan <[email protected]>
force-pushed from 3a5ca94 to c5b00ce
Create a spec decode module and move all related code there to make the code cleaner.
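One way the runner can take advantage of such a module is to pick the proposer class by speculative method name at construction time. The class names below come from the review comments; the dispatch table and builder function are assumptions sketched for illustration, not the PR's actual code.

```python
# Hypothetical dispatch sketch: select a proposer by speculative method name.
# EagleProposer/MtpProposer/NgramProposer names come from the review comments;
# the selection logic itself is an illustrative assumption.
class EagleProposer: ...
class MtpProposer: ...
class NgramProposer: ...

PROPOSERS = {
    "eagle": EagleProposer,
    "mtp": MtpProposer,
    "ngram": NgramProposer,
}


def build_drafter(method: str):
    # Fail loudly on an unknown method instead of crashing later in the runner.
    try:
        return PROPOSERS[method]()
    except KeyError:
        raise ValueError(f"unknown speculative method: {method}")


print(type(build_drafter("ngram")).__name__)  # NgramProposer
```

Keeping the table in the `spec_decode` module means `NPUModelRunner` no longer needs method-specific branches, which is the cleanup this PR is after.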