QuantoQuantizedCache is not working

### System Info

- `transformers` version: 4.56.0.dev0
- Platform: Linux-6.14.0-27-generic-x86_64-with-glibc2.39
- Python version: 3.12.3
- Huggingface_hub version: 0.34.3
- Safetensors version: 0.6.1
- Accelerate version: 1.9.0
- Accelerate config:    not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.7.1+cu126 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>
- Using GPU in script?: Yes
- GPU type: NVIDIA RTX A6000


### Who can help?

@manueldeprada @SunMarc @MekkCyber

### Information

- [x] The official example scripts
- [ ] My own modified scripts

### Tasks

- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

I'm using the same script as in the `QuantoQuantizedCache` docstring:

```
from transformers import AutoTokenizer, AutoModelForCausalLM, QuantoQuantizedCache

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

inputs = tokenizer(text="My name is Qwen2", return_tensors="pt")

past_key_values = QuantoQuantizedCache(config=model.config, nbits=4)
outputs = model(**inputs, past_key_values=past_key_values, use_cache=True)
outputs.past_key_values
```
which raises:
```
Traceback (most recent call last):
  File "/home/mjeblick/.config/JetBrains/PyCharm2025.2/scratches/scratch_41.py", line 8, in <module>
    past_key_values = QuantoQuantizedCache(config=model.config, nbits=4)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/2tb/PyCharmProjects/kvpress/.venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 1305, in __init__
    super().__init__("quanto", config, nbits, axis_key, axis_value, q_group_size, residual_length)
  File "/mnt/2tb/PyCharmProjects/kvpress/.venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 1257, in __init__
    layer_class(nbits, axis_key, axis_value, q_group_size, residual_length)
  File "/mnt/2tb/PyCharmProjects/kvpress/.venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 587, in __init__
    super().__init__(
  File "/mnt/2tb/PyCharmProjects/kvpress/.venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 515, in __init__
    super().__init__(self)
TypeError: CacheLayerMixin.__init__() takes 1 positional argument but 2 were given

```

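(I'm using optimum_quanto-0.2.7. I don't think the version matters here, though, as this looks like an inheritance issue in `cache_utils.py`: the traceback ends in `super().__init__(self)`, which passes `self` to `CacheLayerMixin.__init__()` a second time.)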
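
To illustrate what I think is going on, here is a minimal sketch that reproduces the same `TypeError` without `transformers` or `optimum-quanto` installed. The class names (`BaseLayer`, `BrokenLayer`, `FixedLayer`) and the helper `try_construct` are hypothetical stand-ins, not the actual classes in `cache_utils.py`, and `FixedLayer` only shows what I assume the intended call looks like, not a confirmed fix:

```python
class BaseLayer:
    """Hypothetical stand-in for a mixin whose __init__ takes only `self`."""

    def __init__(self):
        self.keys = None
        self.values = None


class BrokenLayer(BaseLayer):
    def __init__(self):
        # Mirrors the last frame of the traceback: `self` is passed explicitly,
        # although super() already binds it, so the parent receives 2 arguments.
        super().__init__(self)


class FixedLayer(BaseLayer):
    def __init__(self):
        # Assumed intended call: no explicit `self`.
        super().__init__()


def try_construct(cls):
    """Return the TypeError message raised by cls(), or None if it succeeds."""
    try:
        cls()
        return None
    except TypeError as exc:
        return str(exc)


broken_error = try_construct(BrokenLayer)
fixed_error = try_construct(FixedLayer)

print("BrokenLayer:", broken_error)
print("FixedLayer:", fixed_error)
```

`BrokenLayer` fails with the same "takes 1 positional argument but 2 were given" message as in the traceback above, while `FixedLayer` constructs fine. This is why I suspect the problem does not depend on the installed quanto version.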


### Expected behavior

`QuantoQuantizedCache` can be instantiated and used as shown in its docstring, without raising a `TypeError`.

QuantoQuantizedCache is not working #40099
