
OpenKiwi always downloads the tokenizer files for XLMRoberta even if a local path is configured. #102

@yym6472


While training the XLM-Roberta based QE system, I pre-downloaded the pre-trained XLM-Roberta model from the Hugging Face library and changed the field system.model.encoder.model_name in xlmroberta.yaml from the default xlm-roberta-base to the local path containing the pre-downloaded model. However, when running the code, I found that OpenKiwi always downloads config.json and sentencepiece.bpe.model instead of using the pre-downloaded files.

I eventually traced this to the following code in kiwi/systems/encoders/xlmroberta.py, lines 48-49:

        if tokenizer_name not in XLM_ROBERTA_PRETRAINED_MODEL_ARCHIVE_LIST:
            tokenizer_name = 'xlm-roberta-base'

This means that if model_name is set to a local path, it is rewritten to xlm-roberta-base. For Bert and XLM, however, there is no such issue. Is this a bug, or is it intentional?
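
For illustration, here is a minimal sketch of one way the check could let local paths through. It is not OpenKiwi's actual code or fix: the helper name resolve_tokenizer_name and the list KNOWN_XLMR_NAMES are hypothetical stand-ins, and the sketch assumes a local model is given as an existing directory.

```python
import os
import tempfile

# Stand-in for the list of known hub model names (illustrative values only;
# the real code checks XLM_ROBERTA_PRETRAINED_MODEL_ARCHIVE_LIST).
KNOWN_XLMR_NAMES = ["xlm-roberta-base", "xlm-roberta-large"]


def resolve_tokenizer_name(model_name, known_names=KNOWN_XLMR_NAMES,
                           default="xlm-roberta-base"):
    """Hypothetical helper: keep known names and existing local dirs as-is.

    Only names that are neither a known model name nor an existing
    directory fall back to the default.
    """
    if model_name in known_names or os.path.isdir(model_name):
        return model_name
    return default


if __name__ == "__main__":
    # A temporary directory plays the role of the pre-downloaded model folder.
    with tempfile.TemporaryDirectory() as local_dir:
        print(resolve_tokenizer_name(local_dir) == local_dir)
    print(resolve_tokenizer_name("xlm-roberta-large"))
    print(resolve_tokenizer_name("some-unknown-name"))
```

With a check like this, a configured local directory would be passed on unchanged, while unrecognized names would still fall back to xlm-roberta-base.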

Labels: bug, enhancement
