
Conversation

@adi776borate (Contributor)

What does this PR do?

Fixes #2194 (Bug: Gemma-3 checkpoint conversion fails with KeyError).

Solution

Add a check for Gemma-3 before the MLP fallback in `convert_lit_checkpoint.py`:

```python
elif config.name.startswith("Gemma-3"):
    untie_weights = True
    copy_fn = partial(copy_weights_gemma_3, config, untie_weights=untie_weights)
```

Note: `untie_weights=True` because Gemma-3 has tied weights in HF format.
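
For context, a minimal PyTorch sketch (illustrative only, not LitGPT code) of why the converter can drop `lm_head.weight` when the target HF checkpoint ties word embeddings:

```python
# Illustrative only (not LitGPT code): with tied word embeddings, the HF model
# shares one tensor between the input embedding and the output projection, so a
# checkpoint converter can safely skip "lm_head.weight".
import torch.nn as nn

vocab_size, hidden_size = 16, 8
embed_tokens = nn.Embedding(vocab_size, hidden_size)
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

# Tie the weights: both modules now point at the same parameter tensor,
# which is what "tie_word_embeddings": true means in an HF config.
lm_head.weight = embed_tokens.weight

state_dict = {
    "embed_tokens.weight": embed_tokens.weight,
    "lm_head.weight": lm_head.weight,
}
# The two entries are literally the same tensor, so dropping "lm_head.weight"
# from the converted checkpoint loses no information.
assert state_dict["lm_head.weight"] is state_dict["embed_tokens.weight"]
```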

Who can review this?

@bhimrazy
Anyone from the community is free to review once the tests pass.

@bhimrazy changed the title from "Fix: Correct Gemma-3 checkpoint conversion" to "fix: gemma-3 checkpoint conversion from litgpt to hf" on Jan 18, 2026
@bhimrazy (Collaborator) left a comment


Thanks @adi776borate — nice catch!

@adi776borate (Contributor, Author)

One more thing:
I hit a lot of friction converting my finetuned gemma3-27b-it model from LitGPT to HF. After finetuning, `lm_head` and `embed_tokens` diverge, and `untie_weights=True` currently discards the finetuned `lm_head`. If LitGPT's design philosophy allows a finetuned model's structure to differ slightly from the base model's, it should add a `--preserve-lm-head` flag that:

  • Sets `untie_weights=False` (a better variable name is needed)
  • Outputs `config.json` with `"tie_word_embeddings": false` (this requires extending LitGPT-to-HF conversion, which currently only outputs `model.pth` without any config files)

This would preserve both weight matrices for finetuned models; a rough sketch of the idea is below. I can address it in a separate issue/PR if there's interest.
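
A rough sketch of what the proposed flag could do; the flag name, output path, and config handling here are assumptions, not existing LitGPT behavior:

```python
# Hypothetical sketch of the proposed --preserve-lm-head behavior; nothing here
# exists in LitGPT today, it only illustrates the two bullet points above.
import argparse
import json
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("--preserve-lm-head", action="store_true")
args = parser.parse_args()

# 1) Keep the finetuned lm_head.weight in the converted state dict instead of
#    dropping it during conversion.
untie_weights = not args.preserve_lm_head  # would be passed to the Gemma copy function

# 2) Emit a config.json next to model.pth so HF knows whether the weights are tied.
out_dir = Path("converted_checkpoint")  # assumed output location, for illustration only
out_dir.mkdir(exist_ok=True)
hf_config = {"tie_word_embeddings": not args.preserve_lm_head}
(out_dir / "config.json").write_text(json.dumps(hf_config, indent=2))
```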

@KaelanDt (Contributor) left a comment


Thank you @adi776borate! Can we add a test too, to make sure we don't regress?

- Set untie_weights=True as default for copy_weights_gemma_2/3
- Reorder Gemma-3 elif for consistency with Gemma-2
- Update tests to verify default behavior
@adi776borate (Contributor, Author)

I've slightly modified the approach to make it cleaner.
I've changed the default `untie_weights` to `True` for both `copy_weights_gemma_2` and `copy_weights_gemma_3` (since Gemma models always tie weights).

To guard against regressions, I removed the explicit `untie_weights=True` from the existing tests (it was masking this problem!); they now verify that the default behavior works correctly, so there is no need for a new test.

@adi776borate requested a review from KaelanDt on January 21, 2026, 05:49
@adi776borate (Contributor, Author)

> One more thing: I hit a lot of friction converting my finetuned gemma3-27b-it model from LitGPT to HF. After finetuning, `lm_head` and `embed_tokens` diverge, and `untie_weights=True` currently discards the finetuned `lm_head`. If LitGPT's design philosophy allows a finetuned model's structure to differ slightly from the base model's, it should add a `--preserve-lm-head` flag that:
>
> • Sets `untie_weights=False` (a better variable name is needed)
> • Outputs `config.json` with `"tie_word_embeddings": false` (this requires extending LitGPT-to-HF conversion, which currently only outputs `model.pth` without any config files)
>
> This would preserve both weight matrices for finetuned models. I can address it in a separate issue/PR if there's interest.

Any thoughts on this @bhimrazy @KaelanDt?

@KaelanDt (Contributor)

> One more thing: I hit a lot of friction converting my finetuned gemma3-27b-it model from LitGPT to HF. [...] If LitGPT's design philosophy allows a finetuned model's structure to differ slightly from the base model's, it should add a `--preserve-lm-head` flag [...] This would preserve both weight matrices for finetuned models. I can address it in a separate issue/PR if there's interest.
>
> Any thoughts on this @bhimrazy @KaelanDt?

We could add such a flag, but it seems that other ways of finetuning would also lead to other differences in the weights, and I think supporting all formats for finetuned models is a bit out of scope. Of course, if `--preserve-lm-head` covers a large fraction of use cases, we should add it. Which fine-tuning scenario is this for?

@adi776borate (Contributor, Author)

> Which fine-tuning scenario is this for?

This is for LoRA finetuning.

If LitGPT unties the weights when loading the model (and uses them that way for inference in chat), the same behavior should be preserved in the HF-converted model. That is the motivation behind my request. A similar pain point was reported in #1762.

@lianakoleva merged commit f9f3dcb into Lightning-AI:main on Jan 23, 2026. 20 checks passed.
