fix: gemma-3 checkpoint conversion from litgpt to hf #2195
Conversation
bhimrazy left a comment
Thanks @adi776borate — nice catch!
One more thing:
This will preserve both weights for finetuned models. It can be addressed in a separate issue/PR if interested.
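A hypothetical sketch of that refinement (the helper name, the equality check, and the state-dict keys are assumptions, not code from this PR): drop the tied lm_head only when it still matches the embedding, so a finetuned model whose head has diverged keeps both tensors after conversion.

```python
import torch

# Hypothetical helper (not this PR's code): decide, inside the copy loop,
# whether the tied lm_head can be safely dropped during lit -> HF conversion.
def should_skip_lm_head(
    name: str, tensor: torch.Tensor, lit_weights: dict, untie_weights: bool
) -> bool:
    if name != "lm_head.weight" or not untie_weights:
        return False
    # Skip only if the head is still identical to the input embedding;
    # a LoRA-finetuned head that has diverged is kept in the converted dict.
    return torch.equal(tensor, lit_weights["transformer.wte.weight"])
```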
KaelanDt left a comment
Thank you @adi776borate! Can we add a test too, to make sure we don't regress?
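For illustration, a self-contained sketch of the kind of regression test meant here; it uses a toy stand-in for the converter (toy_copy_weights_gemma_3 and the state-dict keys are assumptions), not litgpt's real copy function:

```python
import torch

def toy_copy_weights_gemma_3(lit_weights: dict, untie_weights: bool = True) -> dict:
    """Toy stand-in for the converter: drops the tied lm_head when untying."""
    return {
        name: tensor
        for name, tensor in lit_weights.items()
        if not (untie_weights and name == "lm_head.weight")
    }

def test_gemma_3_unties_weights_by_default():
    wte = torch.randn(8, 4)
    lit_weights = {"transformer.wte.weight": wte, "lm_head.weight": wte}
    hf_weights = toy_copy_weights_gemma_3(lit_weights)
    # Gemma-3 ties lm_head to the embedding in HF format, so the converted
    # checkpoint must omit the duplicate head by default.
    assert "lm_head.weight" not in hf_weights
    assert torch.equal(hf_weights["transformer.wte.weight"], wte)
```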
- Set untie_weights=True as default for copy_weights_gemma_2/3
- Reorder Gemma-3 elif for consistency with Gemma-2
- Update tests to verify default behavior
…ate/litgpt into fix/gemma3-convert-lit
I've slightly modified the approach to make it cleaner. To test for regression, I removed the explicit untie_weights argument so the tests exercise the default behavior.
We could add such a flag, but it seems that other ways of finetuning would also lead to other differences in the weights. I think supporting all formats for finetuned models is a bit out of scope. Of course, if …
This is for LoRA finetuning. If LitGPT unties both weights while loading the model (and uses the same for inference in chat), the same behavior should be preserved in the HF-converted model. That is the motivation behind my request. A similar issue was reported in #1762.
What does this PR do?
Fixes #2194
Solution
Add a check for Gemma-3 before the MLP fallback in convert_lit_checkpoint.py.
Note: untie_weights = True because Gemma-3 has tied weights in HF format.
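For concreteness, a minimal sketch of the dispatch change, with toy stubs so it runs standalone; the real branches live in litgpt/scripts/convert_lit_checkpoint.py, and the neighboring cases and call signatures here are assumptions:

```python
from functools import partial

# Toy stubs so the dispatch below runs standalone; litgpt's real copy
# functions also take the source/target state dicts.
def copy_weights_gemma_2(untie_weights=True, *args): ...
def copy_weights_gemma_3(untie_weights=True, *args): ...
def copy_weights_llama(config, *args): ...

def select_copy_fn(config):
    if config.name.lower().startswith("gemma-2"):
        return partial(copy_weights_gemma_2, untie_weights=True)
    # The fix: match Gemma-3 *before* the generic MLP fallback, which used
    # to swallow it and convert the checkpoint with the wrong layout (#2194).
    if config.name.lower().startswith("gemma-3"):
        return partial(copy_weights_gemma_3, untie_weights=True)
    if config.mlp_class_name in ("LLaMAMLP", "GemmaMLP"):
        return partial(copy_weights_llama, config)
    raise NotImplementedError(f"No conversion rule for {config.name}")
```

The key point is ordering: the name-prefix checks for Gemma models must come before the mlp_class_name fallback, since Gemma-3 also uses GemmaMLP and would otherwise be caught by the generic branch.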
Who can review this?
@bhimrazy
Anyone from the community is free to review once the tests have passed.