
TOKENIZER_MAPPING_NAMES sometimes returns None, but from_pretrained assumes otherwise #44117

@DavidMChan


System Info

When loading a tokenizer with AutoTokenizer (src/transformers/models/auto/tokenization_auto.py), on L652 in from_pretrained the code strips "Fast" from the mapped tokenizer class name for old-style models:

if (
    tokenizer_auto_map is None
    and tokenizer_config_class is not None
    and config_model_type is not None
    and config_model_type != ""
    # Here:
    and TOKENIZER_MAPPING_NAMES.get(config_model_type, "").replace("Fast", "")
    != tokenizer_config_class.replace("Fast", "")
):


However, TOKENIZER_MAPPING_NAMES.get(config_model_type, "") returns None rather than an empty string when config_model_type is present in the dict but mapped to None. The default only applies to missing keys, so affected models fail to load with an AttributeError:

AttributeError: 'NoneType' object has no attribute 'replace'
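
To illustrate the mechanism in isolation, here is a minimal sketch that uses a plain dict as a stand-in for TOKENIZER_MAPPING_NAMES. The dict name, keys, and values are made up for illustration and are not taken from the library.

```python
# Hypothetical stand-in for TOKENIZER_MAPPING_NAMES; keys and values are invented.
mapping = {"has_tokenizer": "ExampleTokenizerFast", "no_tokenizer": None}

# Missing key: dict.get falls back to the default "", so .replace would work.
missing = mapping.get("not_in_dict", "")

# Present key whose value is None: the default is ignored and None is returned.
present_none = mapping.get("no_tokenizer", "")

# Calling .replace on that None reproduces the error from the traceback.
try:
    present_none.replace("Fast", "")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

The default passed to dict.get is only used when the key is absent, which is why an entry explicitly set to None slips through.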

This could be fixed as follows:

and (TOKENIZER_MAPPING_NAMES.get(config_model_type, "") or "").replace("Fast", "")

Or by some other reasonable check.
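
As one possible shape for such a check, the guard could be pulled into a small helper. This is only a sketch: the function name normalized_tokenizer_name and the sample mapping are hypothetical and do not exist in transformers.

```python
from typing import Mapping, Optional


def normalized_tokenizer_name(mapping: Mapping[str, Optional[str]], model_type: str) -> str:
    """Return the mapped class name without "Fast".

    Hypothetical helper: yields "" both when the key is missing and when
    the key is present but mapped to None.
    """
    # `or ""` turns a None value into an empty string before .replace is called.
    return (mapping.get(model_type) or "").replace("Fast", "")


# Invented sample data standing in for TOKENIZER_MAPPING_NAMES.
sample = {"has_tokenizer": "ExampleTokenizerFast", "no_tokenizer": None}

with_fast = normalized_tokenizer_name(sample, "has_tokenizer")
with_none = normalized_tokenizer_name(sample, "no_tokenizer")
with_missing = normalized_tokenizer_name(sample, "not_in_dict")
```

With this guard, a None entry compares as an empty string instead of raising, which is the same behavior the original condition already had for missing keys.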

Who can help?

Looks like this was introduced in #42894 - @ArthurZucker and @itazap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run:

from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained('google/siglip2-so400m-patch14-384')

Expected behavior

It shouldn't throw an attribute error :)
