[Bugfix] Fix hidden_states shape mismatch in AscendDraftModelProposer#7602

Open
Potabk wants to merge 6 commits into vllm-project:main from Potabk:fix_draft

Conversation


@Potabk Potabk commented Mar 24, 2026

What this PR does / why we need it?

Running the following script:

from vllm import LLM, SamplingParams

prompts = ["The future of AI is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="Qwen/Qwen3-8B",
    tensor_parallel_size=1,
    speculative_config={
        "model": "Qwen/Qwen3-0.6B",
        "num_speculative_tokens": 5,
        "method": "draft_model",
    },
)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

This triggers a bug:
AscendDraftModelProposer crashes with a shape-mismatch error when assigning the target model's hidden states to the draft model's hidden_states buffer:

 RuntimeError: The expanded size of the tensor (1024) must match the existing
 size (4096) at non-singleton dimension 1.
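The failure mode can be illustrated with a minimal sketch (plain Python standing in for the real tensors; `copy_into_buffer` is a hypothetical helper that mimics a strict in-place copy, not a vLLM API): copying a target hidden state of width 4096 into a buffer allocated for width 1024 fails the shape check.

```python
def copy_into_buffer(buffer_row, values):
    """Mimic a strict in-place tensor copy: shapes must match exactly."""
    if len(buffer_row) != len(values):
        raise RuntimeError(
            f"The expanded size of the tensor ({len(values)}) must match "
            f"the existing size ({len(buffer_row)}) at non-singleton dimension 1."
        )
    buffer_row[:] = values

draft_hidden = [0.0] * 1024    # buffer sized for the draft model (hidden_size=1024)
target_hidden = [0.0] * 4096   # hidden states from the target model (hidden_size=4096)

try:
    copy_into_buffer(draft_hidden, target_hidden)
except RuntimeError as e:
    print(e)  # the shape-mismatch error reported above
```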

Root cause:
Ascend's SpecDecodeBaseProposer incorrectly inherited from vllm's EagleProposer instead of vllm's SpecDecodeBaseProposer. EagleProposer.__init__ hardcodes pass_hidden_states_to_model=True when calling super().__init__, which caused needs_extra_input_slots to be computed as False for all subclasses, regardless of the actual value passed. As a result, AscendDraftModelProposer (which passes pass_hidden_states_to_model=False) was forced onto the EAGLE code path in set_inputs_first_pass, where it attempted to copy target hidden states (hidden_size=4096) into a buffer allocated for the draft model (hidden_size=1024).
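The broken constructor chain can be sketched as follows (simplified class bodies and a made-up `needs_extra_input_slots` rule mirroring the description above; these are not the actual vLLM signatures):

```python
class VllmSpecDecodeBaseProposer:
    """Stand-in for vllm's SpecDecodeBaseProposer."""
    def __init__(self, pass_hidden_states_to_model: bool):
        self.pass_hidden_states_to_model = pass_hidden_states_to_model
        # Draft-model proposers need extra input slots precisely because
        # they do NOT receive the target's hidden states.
        self.needs_extra_input_slots = not pass_hidden_states_to_model

class EagleProposer(VllmSpecDecodeBaseProposer):
    def __init__(self, pass_hidden_states_to_model: bool = True):
        # Bug: the incoming value is ignored and True is hardcoded.
        super().__init__(pass_hidden_states_to_model=True)

class AscendDraftModelProposer(EagleProposer):
    def __init__(self):
        # False is passed, but EagleProposer.__init__ discards it.
        super().__init__(pass_hidden_states_to_model=False)

p = AscendDraftModelProposer()
print(p.pass_hidden_states_to_model)  # True  -> forced onto the EAGLE path
print(p.needs_extra_input_slots)      # False -> despite passing False above
```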

Does this PR introduce any user-facing change?

How was this patch tested?

@Potabk Potabk requested a review from wangxiyuan as a code owner March 24, 2026 11:10
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug where the hidden_states in the draft model were mismatched with the target hidden states. The fix involves refining the inheritance structure and constructor parameter passing within the speculative decoding proposer, ensuring that the pass_hidden_states_to_model flag is correctly handled by the base class. This change prevents incorrect state management during the speculative decoding process.

Highlights

  • Base Class Renaming and Inheritance: The EagleProposer import was aliased as VllmSpecDecodeBaseProposer, and the local SpecDecodeBaseProposer class was updated to inherit from this new alias, ensuring consistency with the upstream base class.
  • Constructor Argument Propagation: The pass_hidden_states_to_model argument is now correctly passed to the super().__init__ call in SpecDecodeBaseProposer, resolving an issue where this crucial parameter was not being propagated to the base class constructor.
  • Redundant Assignment Removal: The redundant assignment of self.pass_hidden_states_to_model was removed, as the base class now handles this property, streamlining the initialization process.
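The highlights above can be sketched as a corrected constructor chain (simplified signatures only, not the actual vllm-ascend diff; the `needs_extra_input_slots` rule is an illustrative assumption carried over from the root-cause description):

```python
class VllmSpecDecodeBaseProposer:
    """Stand-in for the aliased import of vllm's SpecDecodeBaseProposer."""
    def __init__(self, pass_hidden_states_to_model: bool):
        self.pass_hidden_states_to_model = pass_hidden_states_to_model
        self.needs_extra_input_slots = not pass_hidden_states_to_model

class SpecDecodeBaseProposer(VllmSpecDecodeBaseProposer):
    def __init__(self, pass_hidden_states_to_model: bool):
        # Fix: forward the flag instead of hardcoding it; the redundant
        # local self.pass_hidden_states_to_model assignment is dropped.
        super().__init__(pass_hidden_states_to_model=pass_hidden_states_to_model)

class AscendDraftModelProposer(SpecDecodeBaseProposer):
    def __init__(self):
        super().__init__(pass_hidden_states_to_model=False)

p = AscendDraftModelProposer()
print(p.pass_hidden_states_to_model)  # False, as intended
print(p.needs_extra_input_slots)      # True
```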


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a bug in SpecDecodeBaseProposer by ensuring the pass_hidden_states_to_model parameter is correctly propagated to its superclass, VllmSpecDecodeBaseProposer. The change involves updating the __init__ call to include this parameter and removing the now-redundant local assignment. This resolves a potential hidden state mismatch issue during speculative decoding.

In line with the repository's contribution guidelines, here is a suggested title and summary for this pull request:

Suggested PR Title:

[SpecDecode][BugFix] Correctly initialize speculator to fix hidden_states mismatch

Suggested PR Summary:

### What this PR does / why we need it?

This PR fixes a bug where the `pass_hidden_states_to_model` parameter was not passed to the superclass constructor in `SpecDecodeBaseProposer`. This could cause a mismatch between the draft model's hidden states and the target model's hidden states during speculative decoding, as the superclass was not correctly initialized with this configuration.

The changes are as follows:
- The imported `SpecDecodeBaseProposer` from `vllm` is aliased to `VllmSpecDecodeBaseProposer` to resolve a naming conflict.
- The local `SpecDecodeBaseProposer` now inherits from `VllmSpecDecodeBaseProposer`.
- The `super().__init__` call is updated to correctly pass the `pass_hidden_states_to_model` argument.
- The redundant local assignment of `self.pass_hidden_states_to_model` is removed.

### Does this PR introduce _any_ user-facing change?

No, this is an internal bug fix and does not introduce any user-facing changes.

### How was this patch tested?

CI is expected to pass. The fix ensures correct initialization and behavior for speculative decoding when `pass_hidden_states_to_model` is utilized.

@Potabk Potabk changed the title from "[Bugfix]: Fix draft model hidden_states mismatch target hidden_states" to "[Bugfix] Fix hidden_states shape mismatch in AscendDraftModelProposer" Mar 24, 2026
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@Potabk Potabk force-pushed the fix_draft branch 2 times, most recently from ad7469d to f3ead10 Compare March 25, 2026 03:55
Potabk added 3 commits March 25, 2026 11:57
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Potabk added 3 commits March 25, 2026 14:36
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
@Potabk Potabk added the ready (read for review) and ready-for-test (start test by label for PR) labels Mar 25, 2026