System Info
transformers==v5.0.0rc3
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Gemma3TextConfig behaves inconsistently with comparable configuration classes (e.g., Gemma2Config, LlamaConfig, Qwen3VLTextConfig) when handling the rope_parameters argument during initialization.
While those classes generally accept a flat dictionary for rope_parameters, Gemma3TextConfig strictly expects a nested dictionary keyed by attention type ("full_attention", "sliding_attention"). Passing a flat dictionary (e.g. {'rope_theta': 1000000.0}) raises a KeyError.
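For reference, a minimal sketch of the two shapes in question (the nested keys and default theta values match the Gemma3 output shown below; the variable names are illustrative only):

# Flat RopeParameters dict, accepted by most text configs:
flat = {"rope_theta": 1000000.0}

# Nested per-attention-type dict that Gemma3TextConfig expects:
nested = {
    "full_attention": {"rope_theta": 1000000.0},
    "sliding_attention": {"rope_theta": 10000.0},
}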
Run:
from transformers import Gemma3TextConfig
from transformers import Qwen3VLTextConfig
print("Default")
qwen3_config = Qwen3VLTextConfig()
print(f"qwen3_config.rope_parameters: {qwen3_config.rope_parameters}")
gemma3_config = Gemma3TextConfig()
print(f"gemma3_config.rope_parameters: {gemma3_config.rope_parameters}")
print("\nSet rope_theta=1000000.0")
qwen3_config = Qwen3VLTextConfig(
rope_parameters=dict(
rope_theta=1000000.0,
)
)
print(f"qwen3_config.rope_parameters: {qwen3_config.rope_parameters}")
gemma3_config = Gemma3TextConfig(
rope_parameters=dict(
rope_theta=1000000.0,
)
)
print(f"gemma3_config.rope_parameters: {gemma3_config.rope_parameters}")Output:
Default
qwen3_config.rope_parameters: {'rope_theta': 500000.0, 'rope_type': 'default'}
gemma3_config.rope_parameters: {'sliding_attention': {'rope_type': 'default', 'rope_theta': 10000.0}, 'full_attention': {'rope_type': 'default', 'rope_theta': 1000000.0}}
Set rope_theta=1000000.0
qwen3_config.rope_parameters: {'rope_theta': 1000000.0, 'rope_type': 'default'}
Traceback (most recent call last):
  File "/home/tcc/transformers/rope_params.py", line 19, in <module>
    gemma3_config = Gemma3TextConfig(
        rope_parameters=dict(
            rope_theta=1000000.0,
        )
    )
  File "/home/tcc/transformers/src/transformers/models/gemma3/configuration_gemma3.py", line 202, in __init__
    super().__init__(**kwargs)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/home/tcc/transformers/src/transformers/configuration_utils.py", line 219, in __init__
    kwargs = self.convert_rope_params_to_dict(
        ignore_keys_at_rope_validation=ignore_keys_at_rope_validation, **kwargs
    )
  File "/home/tcc/transformers/src/transformers/models/gemma3/configuration_gemma3.py", line 216, in convert_rope_params_to_dict
    self.rope_parameters["full_attention"].setdefault(
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
KeyError: 'full_attention'
Root cause:
transformers/src/transformers/models/gemma3/configuration_gemma3.py, lines 216 to 219 (at 35a0989):

self.rope_parameters["full_attention"].setdefault(
    "rope_theta", kwargs.pop("rope_theta", self.default_theta["global"])
)
self.rope_parameters["sliding_attention"].setdefault(
Expected behavior
The signature declares

rope_parameters: RopeParameters | dict[str, RopeParameters] | None = None,

so convert_rope_params_to_dict(self, ignore_keys_at_rope_validation=None, **kwargs) should include extra handling for the case where rope_parameters is a flat RopeParameters dict, or perhaps a dedicated typed dict should be defined for the nested per-attention-type form.
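A minimal sketch of what that extra handling could look like (hypothetical helper; the name and the broadcast-to-both-attention-types behavior are assumptions, not the actual transformers implementation):

# Hypothetical normalization step for Gemma3TextConfig; this is a sketch of the
# suggestion above, not real library code.
def _normalize_rope_parameters(rope_parameters):
    """Accept either a flat RopeParameters dict or the nested per-attention-type form."""
    layer_keys = ("full_attention", "sliding_attention")
    if rope_parameters is None:
        return {key: {} for key in layer_keys}
    if all(key in rope_parameters for key in layer_keys):
        # Already in the nested dict[str, RopeParameters] form.
        return rope_parameters
    # Flat RopeParameters: apply the same settings to both attention types.
    return {key: dict(rope_parameters) for key in layer_keys}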