
Conversation

@NickLucche (Collaborator) commented Aug 31, 2025

Add LoRA support to gemma3n.

FIX #21746

@NickLucche requested a review from hmellor as a code owner August 31, 2025 09:43
@mergify bot added the documentation label Aug 31, 2025
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request adds LoRA support for the Gemma3n model. The changes involve updating the Gemma3nForConditionalGeneration model to inherit from SupportsLoRA and updating the documentation to reflect this new capability.

My review identifies two main issues:

  1. The LoRA support for Gemma3nForConditionalGeneration appears incomplete as it does not include support for embedding layers, which is a crucial part of LoRA fine-tuning.
  2. The documentation is updated to claim LoRA support for Gemma3nForCausalLM, but the corresponding implementation changes are missing in this PR, leading to an inconsistency.

Both issues are of high severity and should be addressed to ensure complete and consistent feature implementation.
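
For reference, declaring LoRA support in vLLM is done by mixing SupportsLoRA into the model class; a minimal sketch follows, where the packed_modules_mapping contents are illustrative assumptions rather than the actual diff in this PR:

from torch import nn

from vllm.model_executor.models.interfaces import SupportsLoRA


class Gemma3nForConditionalGeneration(nn.Module, SupportsLoRA):
    # Maps vLLM's fused linear layers back to the per-projection module
    # names LoRA adapters are trained against (illustrative values only;
    # the real mapping lives in the gemma3n model file).
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
    ...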

@pratapyash (Contributor)

No, it does not seem to work. Loading up the vLLM server with a LoRA adapter fails during torch compilation / CUDA graph generation. The adapter I used only targets language/text modules.

I also raised a bug issue with detailed context about the problem: #23970

Attached are some log files reproducing the issue on my side, on the latest commit of this PR: dec277bc07d2c9368d88ee7c69ce2e2f7a97c6d5

  1. With compilation == level 3, cudagraph == piecewise
python -m vllm.entrypoints.openai.api_server \
  --model "$MODEL" \
  --host "$HOST" \
  --port "$PORT" \
  --gpu-memory-utilization "$GPU_MEM_UTIL" \
  --tensor-parallel-size "$TENSOR_PARALLEL_SIZE" \
  --max-model-len "$MAX_MODEL_LEN" \
  --trust-remote-code \
  --enable-lora \
  --max-lora-rank 32 \
  --lora-modules "$LORA_NAME=$LORA_PATH" \
  --served-model-name "$MODEL" \
  --compilation-config '{"level": 3, "cudagraph_mode": "PIECEWISE"}' \
  2>&1 | tee "$LOG_FILE"

vllm_server_lora_lev3_compilation.log

  2. With compilation == level 0, cudagraph == none
python -m vllm.entrypoints.openai.api_server \
  --model "$MODEL" \
  --host "$HOST" \
  --port "$PORT" \
  --gpu-memory-utilization "$GPU_MEM_UTIL" \
  --tensor-parallel-size "$TENSOR_PARALLEL_SIZE" \
  --max-model-len "$MAX_MODEL_LEN" \
  --trust-remote-code \
  --enable-lora \
  --max-lora-rank 32 \
  --lora-modules "$LORA_NAME=$LORA_PATH" \
  --served-model-name "$MODEL" \
  --compilation-config '{"level": 0, "cudagraph_mode": "None"}' \
  2>&1 | tee "$LOG_FILE"

vllm_server_lora.log

@pratapyash (Contributor) commented Aug 31, 2025

@NickLucche, getting LoRA support for the audio and vision encoder modules, too, would be great.

@pratapyash (Contributor)

I checked running a LoRA adapter on the latest commit of this PR; the server does not load up. https://github.com/vllm-project/vllm/pull/24003/commits/ccafe5e0da8d6b36a1858567e9bd65b970495016

(EngineCore_0 pid=5359)   File "/home/ubuntu/user/exp/vllm/vllm/v1/worker/utils.py", line 150, in sanity_check_mm_encoder_outputs
(EngineCore_0 pid=5359)     assert len(mm_embeddings) == expected_num_items, (
(EngineCore_0 pid=5359)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=5359) AssertionError: Expected number of multimodal embeddings to match number of input items: 7, but got len(mm_embeddings)=1792 instead. This is most likely due to the incorrect implementation of the model's `get_multimodal_embeddings` method.
[rank0]:[W901 09:45:15.516942456 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())

vllm_server_lora.log

Scripts to reproduce.
run_server.sh
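
For context, the failing check can be reconstructed from the traceback above as the following sketch (a paraphrase of the helper in vllm/v1/worker/utils.py, not its exact source):

def sanity_check_mm_encoder_outputs(mm_embeddings, expected_num_items: int) -> None:
    # The worker expects one embedding entry per multimodal input item;
    # returning a flat per-token tensor (1792 rows in the log above, vs.
    # the 7 expected items) trips this assertion.
    assert len(mm_embeddings) == expected_num_items, (
        "Expected number of multimodal embeddings to match number of input "
        f"items: {expected_num_items}, but got len(mm_embeddings)="
        f"{len(mm_embeddings)} instead. This is most likely due to the "
        "incorrect implementation of the model's `get_multimodal_embeddings` "
        "method."
    )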

@mergify bot commented Sep 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @NickLucche.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Sep 1, 2025
@pratapyash (Contributor)

Any progress on LoRA support?

@NickLucche (Collaborator, Author)

@pratapyash Apologies for the delay, I was off for the last few days.
Any chance you can share an adapter that reproduces the compilation issue, so I can look into it when I have some bandwidth?

@pratapyash (Contributor)

I have uploaded an adapter for gemma3n-E4B-it to this publicly available HuggingFace repo: yashpratap/gemma3n-e4b-it-generic-adapter
URL: https://huggingface.co/yashpratap/gemma3n-e4b-it-generic-adapter

Hope this helps you reproduce the issue, @NickLucche.

@maot-github

Hello, is there any update on the progress of this PR?

@NickLucche (Collaborator, Author)

Found the cause of issue #23970, but it requires global changes (not gemma3n-specific).

@jeejeelee I would appreciate your review here as you have a lot more context.

Also, I am now getting

[rank0]: ValueError: While loading /dev-network-share/users/engine/hub_cache/models--yashpratap--gemma3n-e4b-it-generic-adapter/snapshots/a64ad70475441c76bd3e33d2bd6bcf9109349630, expected target modules in ['1', 'modality_router', 'up_proj', 'embedding_projection', 'per_layer_input_gate', 'gate_proj', 'down_proj', 'prediction_coefs', 'o_proj', '0', 'linear_left', 'per_layer_model_projection', 'correction_coefs', '2', 'k_proj', 'linear_right', 'per_layer_projection', 'q_proj', 'v_proj'] but received ['audio_tower.conformer.0.attention.post', 'audio_tower.conformer.0.attention.post', 'audio_tower.conformer.0.lconv1d.linear_end', 'audio_tower.conformer.0.lconv1d.linear_end', 'audio_tower.conformer.0.lconv1d.linear_start', 'audio_tower.conformer.0.lconv1d.linear_start', 'audio_tower.conformer.1.attention.post', 'audio_tower.conformer.1.attention.post', 'audio_tower.conformer.1.lconv1d.linear_end', 'audio_tower.conformer.1.lconv1d.linear_end', 'audio_tower.conformer.1.lconv1d.linear_start', 'audio_tower.conformer.1.lconv1d.linear_start', 'audio_tower.conformer.10.attention.post', 'audio_tower.conformer.10.attention.post', 'audio_tower.conformer.10.lconv1d.linear_end', 'audio_tower.conformer.10.lconv1d.linear_end', 'audio_tower.conformer.10.lconv1d.linear_start', 'audio_tower.conformer.10.lconv1d.linear_start', 'audio_tower.conformer.11.attention.post', 'audio_tower.conformer.11.attention.post', 'audio_tower.conformer.11.lconv1d.linear_end', 'audio_tower.conformer.11.lconv1d.linear_end', 'audio_tower.conformer.11.lconv1d.linear_start', 'audio_tower.conformer.11.lconv1d.linear_start', 'audio_tower.conformer.2.attention.post', 'audio_tower.conformer.2.attention.post', 'audio_tower.conformer.2.lconv1d.linear_end', 'audio_tower.conformer.2.lconv1d.linear_end', 'audio_tower.conformer.2.lconv1d.linear_start', 'audio_tower.conformer.2.lconv1d.linear_start', 'audio_tower.conformer.3.attention.post', 'audio_tower.conformer.3.attention.post', 'audio_tower.conformer.3.lconv1d.linear_end', 'audio_tower.conformer.3.lconv1d.linear_end', 'audio_tower.conformer.3.lconv1d.linear_start', 'audio_tower.conformer.3.lconv1d.linear_start', 'audio_tower.conformer.4.attention.post', 'audio_tower.conformer.4.attention.post', 'audio_tower.conformer.4.lconv1d.linear_end', 'audio_tower.conformer.4.lconv1d.linear_end', 'audio_tower.conformer.4.lconv1d.linear_start', 'audio_tower.conformer.4.lconv1d.linear_start', 'audio_tower.conformer.5.attention.post', 'audio_tower.conformer.5.attention.post', 'audio_tower.conformer.5.lconv1d.linear_end', 'audio_tower.conformer.5.lconv1d.linear_end', 'audio_tower.conformer.5.lconv1d.linear_start', 'audio_tower.conformer.5.lconv1d.linear_start', 'audio_tower.conformer.6.attention.post', 'audio_tower.conformer.6.attention.post', 'audio_tower.conformer.6.lconv1d.linear_end', 'audio_tower.conformer.6.lconv1d.linear_end', 'audio_tower.conformer.6.lconv1d.linear_start', 'audio_tower.conformer.6.lconv1d.linear_start', 'audio_tower.conformer.7.attention.post', 'audio_tower.conformer.7.attention.post', 'audio_tower.conformer.7.lconv1d.linear_end', 'audio_tower.conformer.7.lconv1d.linear_end', 'audio_tower.conformer.7.lconv1d.linear_start', 'audio_tower.conformer.7.lconv1d.linear_start', 'audio_tower.conformer.8.attention.post', 'audio_tower.conformer.8.attention.post', 'audio_tower.conformer.8.lconv1d.linear_end', 'audio_tower.conformer.8.lconv1d.linear_end', 'audio_tower.conformer.8.lconv1d.linear_start', 'audio_tower.conformer.8.lconv1d.linear_start', 'audio_tower.conformer.9.attention.post', 
'audio_tower.conformer.9.attention.post', 'audio_tower.conformer.9.lconv1d.linear_end', 'audio_tower.conformer.9.lconv1d.linear_end', 'audio_tower.conformer.9.lconv1d.linear_start', 'audio_tower.conformer.9.lconv1d.linear_start']. Please verify that the loaded LoRA module is correct

when adding LoRAs, but I think we should be able to filter out the tower weights easily.
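
A rough illustration of what filtering the tower weights could look like (a hypothetical helper, not the change made in this PR; PEFT-style tensor names are assumed):

def filter_tower_lora_weights(adapter_tensors: dict) -> dict:
    """Keep only language-model LoRA tensors, dropping audio/vision tower ones.

    `adapter_tensors` maps PEFT-style names such as
    'base_model.model.audio_tower.conformer.0.attention.post.lora_A.weight'
    to their tensors.
    """
    skipped_prefixes = ("audio_tower.", "vision_tower.")
    kept = {}
    for name, tensor in adapter_tensors.items():
        # Strip the PEFT wrapper prefix before checking the module path.
        module_path = name.removeprefix("base_model.model.")
        if module_path.startswith(skipped_prefixes):
            continue  # encoder-tower module: no LoRA support here, drop it
        kept[name] = tensor
    return kept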

@pratapyash (Contributor)

Is torch compilation level=3 and piecewise cudagraph working normally with LoRA?

Signed-off-by: NickLucche <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: NickLucche <[email protected]>
@mergify bot commented Oct 8, 2025

Documentation preview: https://vllm--24003.org.readthedocs.build/en/24003/

@varun-sundar-rabindranath (Contributor)

Hi @pratapyash - I tried running,

python -m vllm.entrypoints.openai.api_server \
  --model google/gemma-3n-E2B-it \
  --gpu-memory-utilization 0.85 \
  --tensor-parallel-size 1 \
  --trust-remote-code \
  --enable-lora \
  --max-lora-rank 32 \
  --lora-modules "lora0=yashpratap/gemma3n-e4b-it-generic-adapter" \
  --served-model-name google/gemma-3n-E2B-it \
  --compilation-config '{"level": 3, "cudagraph_mode": "PIECEWISE"}'

I hit the error:

[rank0]: ValueError: While loading /dev-network-share/users/engine/hub_cache/models--yashpratap--gemma3n-e4b-it-generic-adapter/snapshots/a64ad70475441c76bd3e33d2bd6bcf9109349630, expected target modules in ['1', 'modality_router', 'up_proj', 'embedding_projection', 'per_layer_input_gate', 'gate_proj', 'down_proj', 'prediction_coefs', 'o_proj', '0', 'linear_left', 'per_layer_model_projection', 'correction_coefs', '2', 'k_proj', 'linear_right', 'per_layer_projection', 'q_proj', 'v_proj'] but received ['audio_tower.conformer.0.attention.post', 'audio_tower.conformer.0.attention.post', 'audio_tower.conformer.0.lconv1d.linear_end', 'audio_tower.conformer.0.lconv1d.linear_end', 'audio_tower.conformer.0.lconv1d.linear_start', 'audio_tower.conformer.0.lconv1d.linear_start', 'audio_tower.conformer.1.attention.post', 'audio_tower.conformer.1.attention.post', 'audio_tower.conformer.1.lconv1d.linear_end', 'audio_tower.conformer.1.lconv1d.linear_end', 'audio_tower.conformer.1.lconv1d.linear_start', 'audio_tower.conformer.1.lconv1d.linear_start', 
 ...
'audio_tower.conformer.4.lconv1d.linear_end', 'audio_tower.conformer.4.lconv1d.linear_end', 'audio_tower.conformer.4.lconv1d.linear_start', 'audio_tower.conformer.4.lconv1d.linear_start', 'audio_tower.conformer.5.attention.post', 'audio_tower.conformer.5.attention.post', 'audio_tower.conformer.5.lconv1d.linear_end', 'audio_tower.conformer.5.lconv1d.linear_end', 'audio_tower.conformer.5.lconv1d.linear_start', 'audio_tower.conformer.5.lconv1d.linear_start', 'audio_tower.conformer.6.attention.post', 'audio_tower.conformer.6.attention.post', 'audio_tower.conformer.6.lconv1d.linear_end', 'audio_tower.conformer.6.lconv1d.linear_end', 'audio_tower.conformer.6.lconv1d.linear_start', 'audio_tower.conformer.6.lconv]. Please verify that the loaded LoRA module is correct

This is because, when loading LoRA adapters, we specifically check that we load only the modules that vLLM supports, and audio_tower isn't supported. I have a PR, NickLucche#3, that could fix this. However, I don't think vLLM should be doing what I have on that PR (I wanted to see if it'd work as a POC). I believe the LoRA adapter should contain only the language_model adapters so vLLM can ensure correctness. Maybe I am missing a valid use case / convenience for having an audio/vision adapter and not using it.
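
Roughly, the validation compares the last component of each adapter target module against the leaf module names the model supports, which is why 'audio_tower.conformer.0.attention.post' is rejected ('post' is not in the supported set). A sketch of the idea, not the actual vLLM code:

def check_target_modules(adapter_targets: list[str], supported_leaves: set[str]) -> None:
    # Match only the last path component against the supported leaf names.
    unexpected = sorted(
        t for t in adapter_targets if t.rsplit(".", 1)[-1] not in supported_leaves
    )
    if unexpected:
        raise ValueError(
            f"expected target modules in {sorted(supported_leaves)} "
            f"but received {unexpected}. "
            "Please verify that the loaded LoRA module is correct"
        )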

With that PR, I still couldn't get the server to come up successfully. I think yashpratap/gemma3n-e4b-it-generic-adapter is incorrectly configured: the "intermediate" size of gate_proj and up_proj in the base model is 8192, but in the LoRA adapter it is 16384.
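
One way to spot that kind of mismatch is to inspect the adapter's saved tensors directly. A sketch using the safetensors API; the file name follows the usual PEFT layout and the 8192 figure comes from the comment above, so treat both as assumptions:

from safetensors import safe_open

# For gate_proj/up_proj, lora_B maps the LoRA rank up to the MLP intermediate
# size, so its output dimension should match the base model's.
EXPECTED_INTERMEDIATE_SIZE = 8192

with safe_open("adapter_model.safetensors", framework="pt") as f:
    for key in f.keys():
        if ("gate_proj" in key or "up_proj" in key) and "lora_B" in key:
            out_dim = f.get_slice(key).get_shape()[0]
            if out_dim != EXPECTED_INTERMEDIATE_SIZE:
                print(f"{key}: output dim {out_dim} != {EXPECTED_INTERMEDIATE_SIZE}")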

Can you test the PR with a LoRA model that contains just the language-model adapters? If that passes, I think we should land this PR, and we can consider NickLucche#3 separately and decide whether we want to land it.

cc @jeejeelee

@jeejeelee (Collaborator)

@NickLucche Sorry for missing this PR due to the holiday :bowtie:

@NickLucche (Collaborator, Author)

Looking for some alternative adapter for testing cc @pratapyash

@pratapyash (Contributor) commented Oct 10, 2025

> Looking for some alternative adapter for testing cc @pratapyash

I had used this Unsloth notebook; you can run training for a single step to get adapter weights for testing: Unsloth Notebook

Signed-off-by: NickLucche <[email protected]>
@NickLucche (Collaborator, Author)

Loading different adapters for testing, as suggested by @varun-sundar-rabindranath, appears to work fine (text-only, --max-lora-rank 32):

  • mohamed-stifi/final-lora-nia-gen-darija-gemma-3n-e2b-chat
  • sharmax-vikas/gemma-3n-E2B-donald-trump-2ep-LORA

Let me know what you think @jeejeelee
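
For anyone who wants to repeat the text-only check above offline, a minimal sketch using vLLM's LLM/LoRARequest API (the adapter repo may need to be downloaded to a local path first):

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Text-only LoRA smoke test with one of the adapters listed above.
llm = LLM(
    model="google/gemma-3n-E2B-it",
    enable_lora=True,
    max_lora_rank=32,
)
outputs = llm.generate(
    ["Write a one-line greeting."],
    SamplingParams(max_tokens=32),
    # The path may need to point at a local snapshot of the HF repo.
    lora_request=LoRARequest(
        "gemma3n_lora", 1, "sharmax-vikas/gemma-3n-E2B-donald-trump-2ep-LORA"
    ),
)
print(outputs[0].outputs[0].text)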

@mergify bot commented Oct 14, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @NickLucche.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Oct 14, 2025
@maot-github

Hello, I'm happy to see there has been progress on this! Is there any update?

@NickLucche (Collaborator, Author)

Just waiting on @jeejeelee's review here, as per my last update.
cc @lucianommartins, who might be interested in this.

