
Conversation

@adi776borate (Contributor)

What does this PR do?

Fixes #2194 (Bug: Gemma-3 checkpoint conversion fails with KeyError).

Solution

Add a check for Gemma-3 before the MLP fallback in `convert_lit_checkpoint.py`:

```python
elif config.name.startswith("Gemma-3"):
    untie_weights = True
    copy_fn = partial(copy_weights_gemma_3, config, untie_weights=untie_weights)
```

Note: `untie_weights=True` because Gemma-3 has tied weights in HF format.
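
For context, a minimal PyTorch sketch (illustrative only, not LitGPT code) of why the converter can drop `lm_head.weight` when the target HF checkpoint ties word embeddings:

```python
# Illustrative only (not LitGPT code): with tied word embeddings, the HF model
# shares one tensor between the input embedding and the output projection, so a
# checkpoint converter can safely skip "lm_head.weight".
import torch.nn as nn

vocab_size, hidden_size = 16, 8
embed_tokens = nn.Embedding(vocab_size, hidden_size)
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

# Tie the weights: both modules now point at the same parameter tensor,
# which is what "tie_word_embeddings": true means in an HF config.
lm_head.weight = embed_tokens.weight

state_dict = {
    "embed_tokens.weight": embed_tokens.weight,
    "lm_head.weight": lm_head.weight,
}
# The two entries are literally the same tensor, so dropping "lm_head.weight"
# from the converted checkpoint loses no information.
assert state_dict["lm_head.weight"] is state_dict["embed_tokens.weight"]
```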

Who can review this?

@bhimrazy
Anyone from the community is free to review once the tests pass.

@bhimrazy changed the title from "Fix: Correct Gemma-3 checkpoint conversion" to "fix: gemma-3 checkpoint conversion from litgpt to hf" on Jan 18, 2026
@bhimrazy (Collaborator) left a comment


Thanks @adi776borate — nice catch!

@adi776borate (Contributor, Author)

One more thing:
I hit a lot of friction converting my finetuned gemma3-27b-it model from LitGPT to HF. After finetuning, `lm_head` and `embed_tokens` diverge, and `untie_weights=True` currently discards the finetuned `lm_head`. If LitGPT's design philosophy allows a finetuned model's structure to differ slightly from the base model's, it should add a `--preserve-lm-head` flag that:

  • Sets `untie_weights=False` (a better variable name is needed)
  • Outputs `config.json` with `"tie_word_embeddings": false` (this requires extending LitGPT-to-HF conversion, which currently only outputs `model.pth` without any config files)

This would preserve both weight matrices for finetuned models; a rough sketch of the idea is below. I can address it in a separate issue/PR if there's interest.
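
A rough sketch of what the proposed flag could do; the flag name, output path, and config handling here are assumptions, not existing LitGPT behavior:

```python
# Hypothetical sketch of the proposed --preserve-lm-head behavior; nothing here
# exists in LitGPT today, it only illustrates the two bullet points above.
import argparse
import json
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("--preserve-lm-head", action="store_true")
args = parser.parse_args()

# 1) Keep the finetuned lm_head.weight in the converted state dict instead of
#    dropping it during conversion.
untie_weights = not args.preserve_lm_head  # would be passed to the Gemma copy function

# 2) Emit a config.json next to model.pth so HF knows whether the weights are tied.
out_dir = Path("converted_checkpoint")  # assumed output location, for illustration only
out_dir.mkdir(exist_ok=True)
hf_config = {"tie_word_embeddings": not args.preserve_lm_head}
(out_dir / "config.json").write_text(json.dumps(hf_config, indent=2))
```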

@KaelanDt (Contributor) left a comment


Thank you @adi776borate! Can we add a test too, to make sure we don't regress?

- Set untie_weights=True as default for copy_weights_gemma_2/3
- Reorder Gemma-3 elif for consistency with Gemma-2
- Update tests to verify default behavior
@adi776borate (Contributor, Author)

I've slightly modified the approach to make it cleaner.
I've changed the default `untie_weights` to `True` for both `copy_weights_gemma_2` and `copy_weights_gemma_3` (since Gemma models always tie weights).

To guard against regressions, I removed the explicit `untie_weights=True` from the existing tests (it was masking this problem!); they now verify that the default behavior works correctly, so there is no need for a new test.

@adi776borate requested a review from KaelanDt on January 21, 2026, 05:49
@adi776borate (Contributor, Author)

> One more thing: I hit a lot of friction converting my finetuned gemma3-27b-it model from LitGPT to HF. After finetuning, `lm_head` and `embed_tokens` diverge, and `untie_weights=True` currently discards the finetuned `lm_head`. If LitGPT's design philosophy allows a finetuned model's structure to differ slightly from the base model's, it should add a `--preserve-lm-head` flag that:
>
> • Sets `untie_weights=False` (a better variable name is needed)
> • Outputs `config.json` with `"tie_word_embeddings": false` (this requires extending LitGPT-to-HF conversion, which currently only outputs `model.pth` without any config files)
>
> This would preserve both weight matrices for finetuned models. I can address it in a separate issue/PR if there's interest.

Any thoughts on this @bhimrazy @KaelanDt?

@KaelanDt (Contributor)

> One more thing: I hit a lot of friction converting my finetuned gemma3-27b-it model from LitGPT to HF. [...] If LitGPT's design philosophy allows a finetuned model's structure to differ slightly from the base model's, it should add a `--preserve-lm-head` flag [...] This would preserve both weight matrices for finetuned models. I can address it in a separate issue/PR if there's interest.
>
> Any thoughts on this @bhimrazy @KaelanDt?

We could add such a flag, but it seems that other ways of finetuning would also lead to other differences in the weights, and I think supporting all formats for finetuned models is a bit out of scope. Of course, if `--preserve-lm-head` covers a large fraction of use cases, we should add it. Which fine-tuning scenario is this for?

@adi776borate (Contributor, Author)

> Which fine-tuning scenario is this for?

This is for LoRA finetuning.

If LitGPT unties the weights when loading the model (and uses them that way for inference in chat), the same behavior should be preserved in the HF-converted model. That is the motivation behind my request. A similar pain point was reported in #1762.

@lianakoleva merged commit f9f3dcb into Lightning-AI:main on Jan 23, 2026. 20 checks passed.
