Commit 7192deb

Fix last layer generation for text-only models (microsoft#1513)
Text-only models such as Gemma3 1B, which is not multimodal, do not generate a correct last layer because the builder immediately fetches `base_model.model`, which it should not do for these models. For a text-only model it should retrieve only `base_model`.
1 parent 5441ec1 commit 7192deb

1 file changed, 6 additions and 2 deletions
src/python/py/models/builder.py

Lines changed: 6 additions & 2 deletions
@@ -2615,8 +2615,12 @@ def has_final_norm(self, module, orig_model):
             # differ.
             model = orig_model.language_model
         elif hasattr(orig_model, "base_model") and hasattr(orig_model.base_model, "model"):
-            # Model is from PEFT
-            model = orig_model.base_model.model
+            if hasattr(orig_model.base_model.model, "model"):
+                # Model is from PEFT
+                model = orig_model.base_model.model
+            else:
+                # Model is text-based only.
+                model = orig_model.base_model
         else:
             model = orig_model
 
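To see what the new nested check changes, here is a minimal standalone sketch of the two branches touched by this commit. It is not the builder's code: `resolve_wrapped_model` is a hypothetical helper, the `SimpleNamespace` objects are stand-ins rather than real Hugging Face or PEFT classes, and the earlier `language_model` branch, which the hunk shows only partially, is left out.

```python
from types import SimpleNamespace


def resolve_wrapped_model(orig_model):
    """Hypothetical helper mirroring the base_model checks in the diff."""
    if hasattr(orig_model, "base_model") and hasattr(orig_model.base_model, "model"):
        if hasattr(orig_model.base_model.model, "model"):
            # PEFT-style wrapper: there is one more `model` level below,
            # so step down to `base_model.model`.
            return orig_model.base_model.model
        # Text-only model: no further nesting, so stop at `base_model`.
        return orig_model.base_model
    # No wrapper at all: use the model as-is.
    return orig_model


# Stand-in objects (assumed shapes, for illustration only).
inner = SimpleNamespace(norm="final_norm")
text_only = SimpleNamespace(base_model=SimpleNamespace(model=inner))
peft_like = SimpleNamespace(
    base_model=SimpleNamespace(model=SimpleNamespace(model=inner))
)
plain = SimpleNamespace(norm="final_norm")

print(resolve_wrapped_model(text_only) is text_only.base_model)
print(resolve_wrapped_model(peft_like) is peft_like.base_model.model)
print(resolve_wrapped_model(plain) is plain)
```

Before this change, the text-only stand-in would have resolved to `base_model.model`, one level deeper than the commit message says it should.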
