Closed
System Info
When loading a tokenizer with AutoTokenizer (src/transformers/models/auto/tokenization_auto.py), on L652 in from_pretrained the code tries to remove "Fast" from the tokenizer mapping for old-style models:
if (
    tokenizer_auto_map is None
    and tokenizer_config_class is not None
    and config_model_type is not None
    and config_model_type != ""
    # Here:
    and TOKENIZER_MAPPING_NAMES.get(config_model_type, "").replace("Fast", "")
    != tokenizer_config_class.replace("Fast", "")
):
However, TOKENIZER_MAPPING_NAMES.get(config_model_type, "") sometimes returns None instead of an empty string (when config_model_type is in the dict but its value is None), so models fail to load with an attribute error:
AttributeError: 'NoneType' object has no attribute 'replace'
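The failure comes from how dict.get treats its default: the default is only used when the key is absent, not when the key is present with a None value. A minimal sketch that needs no model download, assuming a hypothetical stand-in dictionary (the entries below are illustrative and not the real contents of TOKENIZER_MAPPING_NAMES):

```python
# Hypothetical stand-in for TOKENIZER_MAPPING_NAMES; entries are illustrative only.
mapping = {"model_a": "ModelATokenizerFast", "model_b": None}

# Key absent: the default "" is returned.
missing = mapping.get("model_c", "")

# Key present with value None: the default is ignored and None is returned.
present_none = mapping.get("model_b", "")

try:
    present_none.replace("Fast", "")
    error_message = None
except AttributeError as exc:
    # Same class of error as the one reported above.
    error_message = str(exc)

print(repr(missing), repr(present_none), error_message)
```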
This could be fixed as follows:
and (TOKENIZER_MAPPING_NAMES.get(config_model_type, "") or "").replace("Fast", "")
Or by some other reasonable check.
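One possible shape for such a check, sketched outside the library: normalize the name in a small helper so that None and a missing key are treated the same way. The names strip_fast and names_differ and the example mapping are hypothetical and are not part of transformers.

```python
def strip_fast(name):
    """Return the class name without "Fast", treating None as an empty string."""
    return (name or "").replace("Fast", "")


def names_differ(mapping, model_type, tokenizer_config_class):
    """Compare the mapped tokenizer name with the configured one, ignoring "Fast"."""
    return strip_fast(mapping.get(model_type, "")) != strip_fast(tokenizer_config_class)


# Hypothetical stand-in for TOKENIZER_MAPPING_NAMES; entries are illustrative only.
example_mapping = {"model_a": "ModelATokenizerFast", "model_b": None}

same_class = names_differ(example_mapping, "model_a", "ModelATokenizer")
none_entry = names_differ(example_mapping, "model_b", "ModelBTokenizer")
absent_key = names_differ(example_mapping, "model_c", "ModelCTokenizer")
```

With this guard, a None entry behaves like a missing key: the comparison reports a mismatch instead of raising.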
Who can help?
Looks like this was introduced in #42894 - @ArthurZucker and @itazap
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Run:
from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained('google/siglip2-so400m-patch14-384')
Expected behavior
It shouldn't throw an attribute error :)