fix(models): apply embedding_multiplier to inputs_embeds in GraniteMoeHybrid by nightcityblade · Pull Request #35026 · vllm-project/vllm

nightcityblade · 2026-02-21T15:08:02Z

Summary

When inputs_embeds is provided to GraniteMoeHybridModel.forward(), the embedding_multiplier scaling was only applied in the input_ids branch, causing garbage output when using input embeddings (e.g. enable_prompt_embeds=True).

Fix

Move the embedding_multiplier scaling outside the if/else branch so it applies to hidden_states regardless of whether it came from input_ids or inputs_embeds. This aligns with how granite.py and granitemoe.py already handle it correctly.
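The change described above can be sketched in a toy form. This is only an illustration of the control-flow move, not the actual vLLM code: the helper names (`embed_input_ids`, `scale`, `forward_before`, `forward_after`), the embedding table, and the multiplier value are all hypothetical, and plain Python lists stand in for tensors.

```python
# Toy illustration of the fix; everything except the names input_ids,
# inputs_embeds, hidden_states and embedding_multiplier is hypothetical.
EMBEDDING_MULTIPLIER = 12.0  # illustrative value, not from a real model config
EMBEDDING_TABLE = {0: [0.1, 0.2], 1: [0.3, 0.4]}  # made-up token id -> vector


def embed_input_ids(input_ids):
    """Look up one (unscaled) embedding vector per token id."""
    return [list(EMBEDDING_TABLE[token_id]) for token_id in input_ids]


def scale(hidden_states):
    """Multiply every element by the embedding multiplier."""
    return [[x * EMBEDDING_MULTIPLIER for x in row] for row in hidden_states]


def forward_before(input_ids=None, inputs_embeds=None):
    """Behavior described in the Summary: scaling only in the input_ids branch."""
    if inputs_embeds is not None:
        hidden_states = inputs_embeds
    else:
        hidden_states = embed_input_ids(input_ids)
        hidden_states = scale(hidden_states)  # skipped for inputs_embeds
    return hidden_states


def forward_after(input_ids=None, inputs_embeds=None):
    """Behavior after the fix: the scaling line is dedented out of the branch."""
    if inputs_embeds is not None:
        hidden_states = inputs_embeds
    else:
        hidden_states = embed_input_ids(input_ids)
    hidden_states = scale(hidden_states)  # applies to both sources
    return hidden_states


if __name__ == "__main__":
    token_ids = [0, 1]
    raw_embeds = embed_input_ids(token_ids)
    # After the fix, both input paths produce the same hidden states.
    print(forward_after(input_ids=token_ids) == forward_after(inputs_embeds=raw_embeds))
    # Before the fix, the inputs_embeds path is left unscaled.
    print(forward_before(input_ids=token_ids) == forward_before(inputs_embeds=raw_embeds))
```

In this sketch the two paths agree only in `forward_after`, which mirrors the claim that the multiplier should apply regardless of the embedding source.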

Changes

vllm/model_executor/models/granitemoehybrid.py: 1 line moved (dedented)

…eHybrid

When inputs_embeds is provided to GraniteMoeHybridModel.forward(), the embedding_multiplier scaling was skipped, causing garbage output. This aligns the behavior with granite.py and granitemoe.py, which correctly apply the multiplier regardless of the embedding source.

Fixes vllm-project#34812

github-actions · 2026-02-21T15:08:11Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

dosubot · 2026-02-21T15:08:11Z

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.


gemini-code-assist

Code Review

The pull request addresses a bug in GraniteMoeHybridModel.forward() where embedding_multiplier was not applied to inputs_embeds. The fix correctly moves the scaling operation outside the conditional branch, ensuring it applies to hidden_states regardless of its origin. This is a critical fix for correctness when using input embeddings.

DarkLight1337 · 2026-02-21T15:49:17Z

Thanks, but there is already a PR #34813

nightcityblade · 2026-02-21T16:02:29Z

Closing as duplicate — there's already an existing PR at #34813 addressing this. Thanks for pointing that out!

gemini-code-assist bot reviewed Feb 21, 2026


nightcityblade closed this Feb 21, 2026

nightcityblade wants to merge 1 commit into vllm-project:main from nightcityblade:fix/issue-34812

2 participants
