[Bugfix] Fix hidden_states shape mismatch in AscendDraftModelProposer#7602

Open
Potabk wants to merge 6 commits into vllm-project:main from Potabk:fix_draft

Conversation


@Potabk Potabk commented Mar 24, 2026

What this PR does / why we need it?

Running the following script:

from vllm import LLM, SamplingParams

prompts = ["The future of AI is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="Qwen/Qwen3-8B",
    tensor_parallel_size=1,
    speculative_config={
        "model": "Qwen/Qwen3-0.6B",
        "num_speculative_tokens": 5,
        "method": "draft_model",
    },
)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

This triggers a bug:
AscendDraftModelProposer crashes with a shape-mismatch error when assigning the target model's hidden states to the draft model's hidden_states buffer:

 RuntimeError: The expanded size of the tensor (1024) must match the existing
 size (4096) at non-singleton dimension 1.
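The failure mode can be illustrated with a minimal sketch (plain Python standing in for the real tensors; `copy_into_buffer` is a hypothetical helper that mimics a strict in-place copy, not a vLLM API): copying a target hidden state of width 4096 into a buffer allocated for width 1024 fails the shape check.

```python
def copy_into_buffer(buffer_row, values):
    """Mimic a strict in-place tensor copy: shapes must match exactly."""
    if len(buffer_row) != len(values):
        raise RuntimeError(
            f"The expanded size of the tensor ({len(values)}) must match "
            f"the existing size ({len(buffer_row)}) at non-singleton dimension 1."
        )
    buffer_row[:] = values

draft_hidden = [0.0] * 1024    # buffer sized for the draft model (hidden_size=1024)
target_hidden = [0.0] * 4096   # hidden states from the target model (hidden_size=4096)

try:
    copy_into_buffer(draft_hidden, target_hidden)
except RuntimeError as e:
    print(e)  # the shape-mismatch error reported above
```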

Root cause:
Ascend's SpecDecodeBaseProposer incorrectly inherited from vllm's EagleProposer instead of vllm's SpecDecodeBaseProposer. EagleProposer.__init__ hardcodes pass_hidden_states_to_model=True when calling super().__init__, which caused needs_extra_input_slots to be computed as False for all subclasses, regardless of the actual value passed. As a result, AscendDraftModelProposer (which passes pass_hidden_states_to_model=False) was forced onto the EAGLE code path in set_inputs_first_pass, where it attempted to copy target hidden states (hidden_size=4096) into a buffer allocated for the draft model (hidden_size=1024).
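The broken constructor chain can be sketched as follows (simplified class bodies and a made-up `needs_extra_input_slots` rule mirroring the description above; these are not the actual vLLM signatures):

```python
class VllmSpecDecodeBaseProposer:
    """Stand-in for vllm's SpecDecodeBaseProposer."""
    def __init__(self, pass_hidden_states_to_model: bool):
        self.pass_hidden_states_to_model = pass_hidden_states_to_model
        # Draft-model proposers need extra input slots precisely because
        # they do NOT receive the target's hidden states.
        self.needs_extra_input_slots = not pass_hidden_states_to_model

class EagleProposer(VllmSpecDecodeBaseProposer):
    def __init__(self, pass_hidden_states_to_model: bool = True):
        # Bug: the incoming value is ignored and True is hardcoded.
        super().__init__(pass_hidden_states_to_model=True)

class AscendDraftModelProposer(EagleProposer):
    def __init__(self):
        # False is passed, but EagleProposer.__init__ discards it.
        super().__init__(pass_hidden_states_to_model=False)

p = AscendDraftModelProposer()
print(p.pass_hidden_states_to_model)  # True  -> forced onto the EAGLE path
print(p.needs_extra_input_slots)      # False -> despite passing False above
```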

Does this PR introduce any user-facing change?

How was this patch tested?

@Potabk Potabk requested a review from wangxiyuan as a code owner March 24, 2026 11:10
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug where the hidden_states in the draft model were mismatched with the target hidden states. The fix involves refining the inheritance structure and constructor parameter passing within the speculative decoding proposer, ensuring that the pass_hidden_states_to_model flag is correctly handled by the base class. This change prevents incorrect state management during the speculative decoding process.

Highlights

  • Base Class Renaming and Inheritance: The EagleProposer import was aliased as VllmSpecDecodeBaseProposer, and the local SpecDecodeBaseProposer class was updated to inherit from this new alias, ensuring consistency with the upstream base class.
  • Constructor Argument Propagation: The pass_hidden_states_to_model argument is now correctly passed to the super().__init__ call in SpecDecodeBaseProposer, resolving an issue where this crucial parameter was not being propagated to the base class constructor.
  • Redundant Assignment Removal: The redundant assignment of self.pass_hidden_states_to_model was removed, as the base class now handles this property, streamlining the initialization process.
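The highlights above can be sketched as a corrected constructor chain (simplified signatures only, not the actual vllm-ascend diff; the `needs_extra_input_slots` rule is an illustrative assumption carried over from the root-cause description):

```python
class VllmSpecDecodeBaseProposer:
    """Stand-in for the aliased import of vllm's SpecDecodeBaseProposer."""
    def __init__(self, pass_hidden_states_to_model: bool):
        self.pass_hidden_states_to_model = pass_hidden_states_to_model
        self.needs_extra_input_slots = not pass_hidden_states_to_model

class SpecDecodeBaseProposer(VllmSpecDecodeBaseProposer):
    def __init__(self, pass_hidden_states_to_model: bool):
        # Fix: forward the flag instead of hardcoding it; the redundant
        # local self.pass_hidden_states_to_model assignment is dropped.
        super().__init__(pass_hidden_states_to_model=pass_hidden_states_to_model)

class AscendDraftModelProposer(SpecDecodeBaseProposer):
    def __init__(self):
        super().__init__(pass_hidden_states_to_model=False)

p = AscendDraftModelProposer()
print(p.pass_hidden_states_to_model)  # False, as intended
print(p.needs_extra_input_slots)      # True
```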


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a bug in SpecDecodeBaseProposer by ensuring the pass_hidden_states_to_model parameter is correctly propagated to its superclass, VllmSpecDecodeBaseProposer. The change involves updating the __init__ call to include this parameter and removing the now-redundant local assignment. This resolves a potential hidden state mismatch issue during speculative decoding.

In line with the repository's contribution guidelines, here is a suggested title and summary for this pull request:

Suggested PR Title:

[SpecDecode][BugFix] Correctly initialize speculator to fix hidden_states mismatch

Suggested PR Summary:

### What this PR does / why we need it?

This PR fixes a bug where the `pass_hidden_states_to_model` parameter was not passed to the superclass constructor in `SpecDecodeBaseProposer`. This could cause a mismatch between the draft model's hidden states and the target model's hidden states during speculative decoding, as the superclass was not correctly initialized with this configuration.

The changes are as follows:
- The imported `SpecDecodeBaseProposer` from `vllm` is aliased to `VllmSpecDecodeBaseProposer` to resolve a naming conflict.
- The local `SpecDecodeBaseProposer` now inherits from `VllmSpecDecodeBaseProposer`.
- The `super().__init__` call is updated to correctly pass the `pass_hidden_states_to_model` argument.
- The redundant local assignment of `self.pass_hidden_states_to_model` is removed.

### Does this PR introduce _any_ user-facing change?

No, this is an internal bug fix and does not introduce any user-facing changes.

### How was this patch tested?

CI is expected to pass. The fix ensures correct initialization and behavior for speculative decoding when `pass_hidden_states_to_model` is utilized.

@Potabk Potabk changed the title from "[Bugfix]: Fix draft model hidden_states mismatch target hidden_states" to "[Bugfix] Fix hidden_states shape mismatch in AscendDraftModelProposer" Mar 24, 2026
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@Potabk Potabk force-pushed the fix_draft branch 2 times, most recently from ad7469d to f3ead10 Compare March 25, 2026 03:55
Potabk added 3 commits March 25, 2026 11:57
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Potabk added 3 commits March 25, 2026 14:36
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
@Potabk Potabk added the ready (read for review) and ready-for-test (start test by label for PR) labels Mar 25, 2026