System Info
transformers 5.2.0
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
My Hugging Face tokenizer, isaacus/kanon-2-tokenizer, no longer loads in Transformers v5. I see the following error when attempting to load it:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[1], line 3
1 from transformers import AutoTokenizer
----> 3 tok = AutoTokenizer.from_pretrained("isaacus/kanon-2-tokenizer")
File ~/isaacus/cookbooks/.venv/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py:712, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
709 if tokenizer_class is None:
710 tokenizer_class = TokenizersBackend
--> 712 return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
713 elif getattr(config, "tokenizer_class", None):
714 _class = config.tokenizer_class
File ~/isaacus/cookbooks/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py:1712, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, trust_remote_code, *init_inputs, **kwargs)
1709 if file_id not in resolved_vocab_files:
1710 continue
-> 1712 return cls._from_pretrained(
1713 resolved_vocab_files,
1714 pretrained_model_name_or_path,
1715 init_configuration,
1716 *init_inputs,
1717 token=token,
1718 cache_dir=cache_dir,
1719 local_files_only=local_files_only,
1720 _commit_hash=commit_hash,
1721 _is_local=is_local,
1722 trust_remote_code=trust_remote_code,
1723 **kwargs,
1724 )
File ~/isaacus/cookbooks/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py:1839, in PreTrainedTokenizerBase._from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, token, cache_dir, local_files_only, _commit_hash, _is_local, trust_remote_code, *init_inputs, **kwargs)
1837 continue # User-provided kwargs take precedence
1838 if isinstance(value, dict) and key != "extra_special_tokens":
-> 1839 value = AddedToken(**value, special=True)
1840 elif key == "extra_special_tokens" and isinstance(value, list):
1841 # Merge list tokens, converting dicts to AddedToken
1842 existing = list(init_kwargs.get("extra_special_tokens") or [])
TypeError: tokenizers.AddedToken() got multiple values for keyword argument 'special'

Expected behavior
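The traceback suggests that the serialized added-token dict resolved from the tokenizer config already contains a "special" key, so `AddedToken(**value, special=True)` passes `special` twice. A minimal sketch of that failure mode in plain Python (the stand-in function and dict contents here are hypothetical, not the actual transformers internals):

```python
def added_token(content, special=False, **extra):
    """Stand-in for the tokenizers.AddedToken constructor signature."""
    return {"content": content, "special": special, **extra}

# A serialized token dict that already carries a "special" key,
# as my tokenizer's config on the Hub appears to.
value = {"content": "<|endoftext|>", "special": True}

try:
    # Mirrors the failing call: AddedToken(**value, special=True).
    # The key from **value collides with the explicit keyword.
    added_token(**value, special=True)
except TypeError as e:
    print(e)  # ... got multiple values for keyword argument 'special'
```

If this is the cause, one possible fix on the transformers side might be to merge the flag into the dict before expanding it, e.g. `AddedToken(**{**value, "special": True})`, though that is a guess at the intended semantics.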
The tokenizer should load correctly, as it did in previous versions.