[BugFix] Call Base Layer Directly if LoRA A/B in Parallel Vocab are 0 #29167
Conversation
Signed-off-by: Alex-Brooks <[email protected]>
Code Review
This pull request introduces a bug fix to bypass the LoRA forward pass in VocabParallelEmbedding when the LoRA weights are zero. This is a good optimization and addresses an edge case for specific models. The implementation is sound and a corresponding test has been added. However, I've identified a small but significant issue in the new test where variables are swapped, which could lead to confusion and potentially hide bugs in the future. My review includes a suggestion to correct this.
Signed-off-by: Alex-Brooks <[email protected]>
💡 Codex Review
Here are some automated review suggestions for this pull request.
def forward(self, x: torch.Tensor) -> torch.Tensor:
    # Short circuit: if either LoRA A or B is all zeros, the LoRA path
    # contributes nothing, so call the base layer directly.
    if bool(torch.all(self.lora_a_stacked == 0)) or bool(
        torch.all(self.lora_b_stacked == 0)
    ):
        return self.base_layer.forward(x)
Short-circuit scans full LoRA stacks every forward
The new zero check reduces over self.lora_a_stacked and self.lora_b_stacked on every call to forward, converting the GPU result to a Python bool. These tensors are sized max_loras × vocab_size × rank and live on GPU; scanning them each token adds O(max_loras·vocab) work plus a host sync even when LoRA weights are non-zero, which is far heavier than the previous gather-based path and will noticeably slow embedding lookups for any LoRA-enabled run. Consider caching a flag when weights are loaded instead of recomputing a full reduction per forward.
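A minimal sketch of the caching approach suggested above, assuming a simplified wrapper class; the names here (LoRAEmbeddingSketch, set_lora, _compute_zero_flag) are illustrative and not vLLM's actual API:

import torch


class LoRAEmbeddingSketch:
    """Illustrative wrapper: cache the "LoRA is all-zero" flag at load time."""

    def __init__(self, base_layer, lora_a_stacked: torch.Tensor,
                 lora_b_stacked: torch.Tensor):
        self.base_layer = base_layer
        self.lora_a_stacked = lora_a_stacked
        self.lora_b_stacked = lora_b_stacked
        # One full reduction + host sync here, not on every forward call.
        self._lora_is_zero = self._compute_zero_flag()

    def _compute_zero_flag(self) -> bool:
        return bool(torch.all(self.lora_a_stacked == 0)) or bool(
            torch.all(self.lora_b_stacked == 0)
        )

    def set_lora(self, index: int, lora_a: torch.Tensor,
                 lora_b: torch.Tensor) -> None:
        # Hypothetical weight-loading hook: refresh the cached flag whenever
        # adapter weights change, so forward() never has to recompute it.
        self.lora_a_stacked[index].copy_(lora_a)
        self.lora_b_stacked[index].copy_(lora_b)
        self._lora_is_zero = self._compute_zero_flag()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self._lora_is_zero:
            # Nothing to add: the base embedding output is already correct.
            return self.base_layer(x)
        raise NotImplementedError("regular LoRA-augmented embedding path")


# Example: zero-initialized stacks short-circuit straight to the base layer.
emb = torch.nn.Embedding(32000, 64)
layer = LoRAEmbeddingSketch(emb, torch.zeros(4, 32000, 16), torch.zeros(4, 64, 16))
out = layer.forward(torch.tensor([1, 2, 3]))  # hits the base embedding directly

Under this scheme the per-token cost of the short-circuit is a single Python attribute check, and the full reduction only runs when adapter weights are (re)loaded.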
Give me some time to look into the root cause of this bug, thank you.
Sure, thank you @jeejeelee!
Purpose
Partial fix for #29166. This does not fix the underlying kernel edge case hit by Granite Speech, but it does fix the behavior for Granite Speech models, since none of them have LoRA weights for the embedding/LM head.
CC @jeejeelee @DarkLight1337
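For context, a small illustration of why the stacks end up all-zero for these models; the shapes and names below are made up for the example and are not vLLM's actual buffers:

import torch

# Illustrative shapes only: (max_loras, vocab_size, rank) and (max_loras, hidden, rank).
max_loras, vocab_size, rank, hidden = 4, 32000, 16, 64
lora_a_stacked = torch.zeros(max_loras, vocab_size, rank)
lora_b_stacked = torch.zeros(max_loras, hidden, rank)

# A Granite Speech adapter ships no embedding/lm_head LoRA weights, so nothing
# is ever copied into these buffers. Both stacks stay identically zero, the
# LoRA contribution is zero, and the base layer output is already correct.
assert bool(torch.all(lora_a_stacked == 0)) and bool(torch.all(lora_b_stacked == 0))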
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.