fix(models): apply embedding_multiplier to inputs_embeds in GraniteMoeHybrid#35026
fix(models): apply embedding_multiplier to inputs_embeds in GraniteMoeHybrid#35026nightcityblade wants to merge 1 commit intovllm-project:mainfrom
Conversation
…eHybrid When inputs_embeds is provided to GraniteMoeHybridModel.forward(), the embedding_multiplier scaling was skipped, causing garbage output. This aligns the behavior with granite.py and granitemoe.py which correctly apply the multiplier regardless of the embedding source. Fixes vllm-project#34812
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
There was a problem hiding this comment.
Code Review
The pull request addresses a bug in GraniteMoeHybridModel.forward() where embedding_multiplier was not applied to inputs_embeds. The fix correctly moves the scaling operation outside the conditional branch, ensuring it applies to hidden_states regardless of its origin. This is a critical fix for correctness when using input embeddings.
|
Thanks, but there is already a PR #34813 |
|
Closing as duplicate — there's already an existing PR at #34813 addressing this. Thanks for pointing that out! |
Summary
Fixes #34812
When
inputs_embedsis provided toGraniteMoeHybridModel.forward(), theembedding_multiplierscaling was only applied in theinput_idsbranch, causing garbage output when using input embeddings (e.g.enable_prompt_embeds=True).Fix
Move the
embedding_multiplierscaling outside theif/elsebranch so it applies tohidden_statesregardless of whether it came frominput_idsorinputs_embeds. This aligns with howgranite.pyandgranitemoe.pyalready handle it correctly.Changes
vllm/model_executor/models/granitemoehybrid.py: 1 line moved (dedented)