[LoRA] Gemma3n LoRA support #24003
base: main
Conversation
Code Review
This pull request adds LoRA support for the Gemma3n model. The changes involve updating the Gemma3nForConditionalGeneration model to inherit from SupportsLoRA and updating the documentation to reflect this new capability.
My review identifies two main issues:
- The LoRA support for Gemma3nForConditionalGeneration appears incomplete, as it does not include support for embedding layers, which is a crucial part of LoRA fine-tuning.
- The documentation is updated to claim LoRA support for Gemma3nForCausalLM, but the corresponding implementation changes are missing in this PR, leading to an inconsistency.
Both issues are of high severity and should be addressed to ensure complete and consistent feature implementation.
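For context, this is the general pattern a vLLM model class follows to opt into LoRA. The sketch below is illustrative only: the module names, mappings, and embedding attributes are assumptions, not the exact Gemma3n diff in this PR.

```python
# Minimal sketch of the usual vLLM pattern for declaring LoRA support.
# Attribute contents (module names, mappings) are assumptions.
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsLoRA


class Gemma3nForConditionalGeneration(nn.Module, SupportsLoRA):
    # Fused projections that LoRA layers must target together (names assumed).
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
    # Embedding-related modules would be declared here as well if embedding
    # LoRA were supported (the point raised in the review above).
    embedding_modules = {}
    embedding_padding_modules = []
```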
|
No, it does not seem to work. Trying to load the vLLM server with a LoRA adapter fails during torch compilation/CUDA graph generation. The adapter I used only targets the language/text modules. I also raised a bug issue with detailed context about the problem: #23970. Attached are some log files reproducing the issue on my side on the latest commit of this PR:
python -m vllm.entrypoints.openai.api_server \
--model "$MODEL" \
--host "$HOST" \
--port "$PORT" \
--gpu-memory-utilization "$GPU_MEM_UTIL" \
--tensor-parallel-size "$TENSOR_PARALLEL_SIZE" \
--max-model-len "$MAX_MODEL_LEN" \
--trust-remote-code \
--enable-lora \
--max-lora-rank 32 \
--lora-modules "$LORA_NAME=$LORA_PATH" \
--served-model-name "$MODEL" \
--compilation-config '{"level": 3, "cudagraph_mode": "PIECEWISE"}' \
2>&1 | tee "$LOG_FILE"
vllm_server_lora_lev3_compilation.log
python -m vllm.entrypoints.openai.api_server \
--model "$MODEL" \
--host "$HOST" \
--port "$PORT" \
--gpu-memory-utilization "$GPU_MEM_UTIL" \
--tensor-parallel-size "$TENSOR_PARALLEL_SIZE" \
--max-model-len "$MAX_MODEL_LEN" \
--trust-remote-code \
--enable-lora \
--max-lora-rank 32 \
--lora-modules "$LORA_NAME=$LORA_PATH" \
--served-model-name "$MODEL" \
--compilation-config '{"level": 0, "cudagraph_mode": "None"}' \
2>&1 | tee "$LOG_FILE"
|
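For reference, the same setup can be sketched with the offline Python API. The model id, adapter name, and paths below are placeholders, and `enforce_eager=True` corresponds to the level-0/no-CUDA-graph configuration above.

```python
# Hypothetical offline repro sketch; model id and adapter path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="google/gemma-3n-E4B-it",  # assumed base model
    enable_lora=True,
    max_lora_rank=32,
    trust_remote_code=True,
    enforce_eager=True,              # analogous to compilation level 0 / no CUDA graphs
)

out = llm.generate(
    ["Hello, world"],
    SamplingParams(max_tokens=32),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)
print(out[0].outputs[0].text)
```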
|
@NickLucche, getting LoRA support for the audio and vision encoder modules too would be great. |
|
Checked running a LoRA adapter from the latest commit on this PR; the server does not load up. https://github.com/vllm-project/vllm/pull/24003/commits/ccafe5e0da8d6b36a1858567e9bd65b970495016 Scripts to reproduce. |
|
This pull request has merge conflicts that must be resolved before it can be merged. |
|
Any progress on LoRA support? |
|
@pratapyash Apologies for the delay, I was off the last few days. |
|
I have committed an adapter for gemma3n-E4B-it to this publicly available Hugging Face repo: Hope this helps you reproduce the issue, @NickLucche. |
|
Hello, is there any update on the progress of this PR? |
Force-pushed from ccafe5e to 4f1c350 (Compare)
|
Found the cause of issue #23970, but it requires global changes (not Gemma3n-specific). @jeejeelee I would appreciate your review here, as you have a lot more context. Also, I am now getting an error when adding LoRAs, but I think we should be able to filter the tower weights easily. |
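As an illustration of what filtering the tower weights could look like (a hypothetical helper, not code from this PR; the module-name prefixes are assumptions):

```python
# Hypothetical sketch: drop LoRA tensors that target the vision/audio towers
# so only language-model adapter weights reach the LoRA loader.
from typing import Dict

import torch

# Module-name prefixes are assumptions; real names depend on the checkpoint.
TOWER_PREFIXES = ("vision_tower.", "audio_tower.", "multi_modal_projector.")


def filter_language_model_lora(
    tensors: Dict[str, torch.Tensor],
) -> Dict[str, torch.Tensor]:
    """Keep only adapter weights that belong to the language model."""
    return {
        name: weight
        for name, weight in tensors.items()
        if not any(prefix in name for prefix in TOWER_PREFIXES)
    }
```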
Force-pushed from 4f1c350 to a8d07ac (Compare)
|
Are torch compilation level 3 and piecewise CUDA graphs working normally with LoRA? |
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Force-pushed from a8d07ac to 1fed5dd (Compare)
|
Documentation preview: https://vllm--24003.org.readthedocs.build/en/24003/ |
|
Hi @pratapyash - I tried running and hit an error. This is because, when loading LoRA adapters, we specifically check that we load only the modules that vLLM supports. With the PR, I still couldn't get the server to come up successfully. Can you test the PR with a LoRA model that contains just the language-model adapters? If that passes, I think we should land this PR, and we can probably consider NickLucche#3 separately and decide if we want to land it. cc @jeejeelee |
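One way to produce such a text-only adapter from an existing one is to strip the non-language-model tensors from the adapter file. This is a sketch assuming a PEFT-style adapter_model.safetensors layout; the paths and the key substrings used for filtering are placeholders.

```python
# Sketch: keep only language-model LoRA tensors from a PEFT-style adapter.
# Paths and key substrings are assumptions.
import os

from safetensors.torch import load_file, save_file

SRC = "lora_adapter/adapter_model.safetensors"  # placeholder
DST_DIR = "lora_adapter_text_only"              # placeholder

tensors = load_file(SRC)
kept = {
    name: weight
    for name, weight in tensors.items()
    if "vision_tower" not in name and "audio_tower" not in name
}

os.makedirs(DST_DIR, exist_ok=True)
save_file(kept, os.path.join(DST_DIR, "adapter_model.safetensors"))
print(f"kept {len(kept)}/{len(tensors)} tensors")
```

The adapter_config.json would also need its target_modules trimmed accordingly and copied next to the filtered weights.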
|
@NickLucche Sorry for missing this PR due to the holiday. |
|
Looking for some alternative adapter for testing. cc @pratapyash |
I had used this Unsloth notebook; you can run training for a single step to get adapter weights for testing: Unsloth Notebook
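Alternatively, a throwaway adapter for load testing can be generated directly with PEFT, without any real training. The model id and target-module names below are assumptions and may need adjusting for the multimodal checkpoint.

```python
# Sketch: create an untrained LoRA adapter just to exercise the loading path.
# Model id and target-module names are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-3n-E4B-it")
config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # text modules only
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, config)
peft_model.save_pretrained("gemma3n-test-lora")  # adapter_config.json + weights
```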
Signed-off-by: NickLucche <[email protected]>
|
Testing loading with different adapters, as suggested by @varun-sundar-rabindranath, appears to work fine (text-only).
Let me know what you think @jeejeelee |
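For reference, a LoRA adapter registered via --lora-modules can be exercised through the OpenAI-compatible endpoint by passing the adapter name as the model. Host, port, and the adapter name below are placeholders.

```python
# Sketch: query the server with the LoRA adapter name as the model.
# Host, port, and adapter name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="my_adapter",  # name given to --lora-modules
    prompt="Hello from Gemma3n with LoRA!",
    max_tokens=32,
)
print(resp.choices[0].text)
```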
|
This pull request has merge conflicts that must be resolved before it can be merged. |
|
Hello, I'm happy to see there has been progress on this! Is there any update? |
|
Just waiting on @jeejeelee's review here, as per my last update. |
Add LoRA support to gemma3n.
FIX #21746