Fix accessing final norm for Gemma-3 models (microsoft#1687)
### Description
This PR fixes how the final norm is identified for the Gemma-3 models.
It works with the latest version of Hugging Face's `transformers`
(v4.55.2).
### Motivation and Context
Previous versions of `transformers` changed the class structure of the
Gemma-3 models in breaking ways. Since `transformers` has [landed on a
stable way](https://github.com/huggingface/transformers/pull/36741)
to load multi-modal models with `AutoModelForCausalLM` for now, the
current approach is to look up the final norm at
`model.model.language_model.norm` for the Gemma-3 models that are
multi-modal.
Gemma-3 1B's final norm is accessible at `model.model.norm` while
Gemma-3 4B's final norm is accessible at
`model.model.language_model.norm`. For
[PEFT's](https://github.com/huggingface/peft) decoder-only models, the
core model is accessible at `model.base_model.model` and the final norm
is usually accessible at `model.base_model.model.model.norm`.
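
As a rough sketch of what this lookup can look like (the helper name
`find_final_norm` and the fallback order are illustrative assumptions, not
the exact code in this PR):

```python
# Sketch only: probe the known Gemma-3 layouts for the final norm.
def find_final_norm(model):
    core = model.model
    # Multi-modal Gemma-3 (e.g. 4B): the decoder lives under `language_model`.
    if hasattr(core, "language_model"):
        return core.language_model.norm
    # Text-only Gemma-3 (e.g. 1B): the final norm sits directly on the decoder.
    return core.norm
```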
We can read the parent-most class name to identify whether a model comes
from PEFT. One advantage of this approach is that any change to the path
to the final norm of a Transformers model is still picked up automatically
in the PEFT version of that model.
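
A minimal sketch of that check, assuming the hypothetical `find_final_norm`
helper above and using the fact that PEFT wrapper classes (e.g.
`PeftModelForCausalLM`) start with `Peft`:

```python
# Sketch only: unwrap PEFT models by class name, then reuse the same path logic.
def resolve_final_norm(model):
    if model.__class__.__name__.startswith("Peft"):
        # PEFT decoder-only models expose the wrapped Transformers model at
        # `base_model.model`, so the same lookup applies to the inner model.
        return find_final_norm(model.base_model.model)
    return find_final_norm(model)
```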