
Conversation


@Tcc0403 Tcc0403 commented Jan 16, 2026

What does this PR do?

Fixes #43316

gemma3/gemma3n and modernbert/modernbert_decoder require different rope parameters per layer type, so they expect rope_parameters to be a dictionary with two keys, sliding_attention and full_attention, each mapping to a RopeParameters dict that contains rope_type.

This PR updates the type hints for these models and makes the new-format detection stricter.
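
For illustration, a minimal sketch of the resulting config usage, assuming a transformers version that already exposes rope_parameters on Gemma3TextConfig (the theta values below are placeholders, not necessarily the models' real defaults):

from transformers import Gemma3TextConfig

# Hypothetical values: the point is the shape of the dict, one
# RopeParameters-style entry per layer type, each carrying a rope_type.
config = Gemma3TextConfig(
    rope_parameters={
        "full_attention": {"rope_type": "default", "rope_theta": 1_000_000.0},
        "sliding_attention": {"rope_type": "default", "rope_theta": 10_000.0},
    }
)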

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@zucchini-nlp

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma3, gemma3n, modernbert, modernbert_decoder

@Tcc0403 Tcc0403 marked this pull request as draft January 16, 2026 12:47
@Tcc0403 Tcc0403 force-pushed the special-rope-params-typehint branch from b2609cd to f94f8a0 Compare January 16, 2026 12:49
Signed-off-by: Tcc0403 <[email protected]>
@Tcc0403 Tcc0403 force-pushed the special-rope-params-typehint branch from f94f8a0 to ae430d1 Compare January 16, 2026 12:50
@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43320&sha=ae430d


@zucchini-nlp zucchini-nlp left a comment


Hey, when you are ready, please mark the PR as "ready for review". I just left a couple of comments.

Comment on lines -213 to +217
-        self.rope_parameters = self.rope_parameters if self.rope_parameters is not None else default_rope_params
+        if (
+            self.rope_parameters.get("sliding_attention") is not None
+            and self.rope_parameters.get("full_attention") is not None
+        ):
+            self.rope_parameters = default_rope_params
@zucchini-nlp (Member):

Not really correct, because technically only one key could be present if config.layer_types consists of a single type, for example all layers sliding. This happens in tests, and I guess a few of those might fail now.

@Tcc0403 (Author):

If that's the case, should we also avoid direct access to both keys?

self.rope_parameters["full_attention"].setdefault(
            "rope_theta", kwargs.pop("rope_theta", self.default_theta["global"])
        )
self.rope_parameters["sliding_attention"].setdefault(
            "rope_theta", kwargs.pop("rope_local_base_freq", self.default_theta["local"])
        )
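
One way the direct indexing could be avoided, sketched here as a suggestion rather than the PR's actual change (the kwarg names and default_theta keys are taken from the snippet above):

# Hypothetical sketch: only touch the layer types that are actually present,
# so a config whose layers are all sliding (or all full) does not KeyError.
theta_sources = {
    "full_attention": ("rope_theta", "global"),
    "sliding_attention": ("rope_local_base_freq", "local"),
}
for layer_type, (kwarg_name, default_key) in theta_sources.items():
    if layer_type in self.rope_parameters:
        self.rope_parameters[layer_type].setdefault(
            "rope_theta", kwargs.pop(kwarg_name, self.default_theta[default_key])
        )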

Comment on lines 156 to +157
         attn_logit_softcapping: float | None = None,
-        rope_parameters: RopeParameters | dict[str, RopeParameters] | None = None,
+        rope_parameters: dict[Literal["full_attention", "sliding_attention"], RopeParameters] | None = None,
@zucchini-nlp (Member):

Can you apply these diffs in the modular files and then run make fix-repo? We keep the model code in the modular files, from which the rest of the files are generated automatically.
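
For reference, a standalone sketch of how the stricter annotation reads in isolation (RopeParameters is replaced by a simplified TypedDict stand-in here, not the real transformers class; Python 3.10+ syntax):

from typing import Literal, TypedDict


class RopeParameters(TypedDict, total=False):
    # Simplified stand-in for transformers' RopeParameters, for the sketch only.
    rope_type: str
    rope_theta: float


# The new annotation keys the dict by layer type instead of accepting a bare
# RopeParameters or an arbitrary str-keyed dict.
LayerRopeParams = dict[Literal["full_attention", "sliding_attention"], RopeParameters] | None

rope_parameters: LayerRopeParams = {
    "full_attention": {"rope_type": "default", "rope_theta": 1_000_000.0},
    "sliding_attention": {"rope_type": "default", "rope_theta": 10_000.0},
}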



Development

Successfully merging this pull request may close these issues.

API discrepancy between Gemma3TextConfig and others

2 participants