Fix typing hints for different rope parameters per layer type #43320
Conversation
Signed-off-by: Tcc0403 <[email protected]>
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma3, gemma3n, modernbert, modernbert_decoder
Force-pushed from b2609cd to f94f8a0 (compare)
Signed-off-by: Tcc0403 <[email protected]>
Force-pushed from f94f8a0 to ae430d1 (compare)
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43320&sha=ae430d
zucchini-nlp left a comment:
Hey, when you are ready, please mark the PR as "ready for review". I just left a couple of comments.
```python
self.rope_parameters = self.rope_parameters if self.rope_parameters is not None else default_rope_params
if (
    self.rope_parameters.get("sliding_attention") is not None
    and self.rope_parameters.get("full_attention") is not None
):
    self.rope_parameters = default_rope_params
```
Not really correct, because technically only one key could be present: if `config.layer_types` consists of a single type (for example, all layers are sliding), only that key exists. This happens in tests, and I guess a few of those might fail now.
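A minimal sketch of that failure mode (the names and values here are illustrative, not the PR's code): with a single layer type, only one key is present, so a check that requires both keys misclassifies the new format.

```python
# Illustrative sketch, not the PR's code: a config where every layer is
# sliding carries only the "sliding_attention" key, so a detection check
# that requires BOTH keys to be present fails to recognize the new format.
layer_types = ["sliding_attention"] * 4  # e.g. a small test config

rope_parameters = {
    "sliding_attention": {"rope_type": "default", "rope_theta": 10_000.0},
}

is_new_format = (
    rope_parameters.get("sliding_attention") is not None
    and rope_parameters.get("full_attention") is not None
)
print(is_new_format)  # False, even though this IS the per-layer-type format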
If that's the case, should we also avoid direct access to both keys?
self.rope_parameters["full_attention"].setdefault(
"rope_theta", kwargs.pop("rope_theta", self.default_theta["global"])
)
self.rope_parameters["sliding_attention"].setdefault(
"rope_theta", kwargs.pop("rope_local_base_freq", self.default_theta["local"])
)| attn_logit_softcapping: float | None = None, | ||
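One possible shape for that (a hedged sketch with assumed names and placeholder default values, not the PR's final code) would be to only touch the layer types the config actually declares:

```python
from typing import Any

# Hypothetical helper, not the PR's code: set per-layer-type rope defaults
# without unconditionally indexing both keys. The default theta values and
# the legacy kwarg names are assumptions based on the snippet above.
def set_rope_defaults(
    rope_parameters: dict[str, dict[str, Any]],
    layer_types: list[str],
    kwargs: dict[str, Any],
) -> None:
    default_theta = {"full_attention": 1_000_000.0, "sliding_attention": 10_000.0}
    legacy_kwarg = {"full_attention": "rope_theta", "sliding_attention": "rope_local_base_freq"}
    for layer_type in set(layer_types):
        # Create the entry if this layer type is missing, then fill rope_theta
        # from the legacy kwarg if the caller passed one, else from the default.
        params = rope_parameters.setdefault(layer_type, {"rope_type": "default"})
        params.setdefault(
            "rope_theta", kwargs.pop(legacy_kwarg[layer_type], default_theta[layer_type])
        )
```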
```diff
  attn_logit_softcapping: float | None = None,
- rope_parameters: RopeParameters | dict[str, RopeParameters] | None = None,
+ rope_parameters: dict[Literal["full_attention", "sliding_attention"], RopeParameters] | None = None,
```
Can you apply these diffs in the modular files and then run make fix-repo? We keep the model code in the modular files, which generate the rest of the files automatically.
What does this PR do?
Fixes #43316
`gemma3`/`gemma3n` and `modernbert`/`modernbert_decoder` require different rope parameters per layer type: `rope_parameters` is expected to be a dictionary with the two keys `sliding_attention` and `full_attention`, each mapping to `RopeParameters` containing `rope_type`. This PR updates the typing hints for these models and makes the new-format detection stricter.
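For illustration, a hedged sketch of the expected new format (the concrete theta values are placeholders, not taken from this PR):

```python
# Sketch of the per-layer-type rope_parameters format described above.
# The theta values here are placeholders, not the models' actual defaults.
rope_parameters = {
    "full_attention": {"rope_type": "default", "rope_theta": 1_000_000.0},
    "sliding_attention": {"rope_type": "default", "rope_theta": 10_000.0},
}
# e.g. passed to a config such as Gemma3TextConfig(rope_parameters=rope_parameters)
```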
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@zucchini-nlp