Issue with loading transformers into pipelines #10613
-
I am experimenting with adding transformers to spaCy models and have run into an issue with loading them. Initially I wanted to load a transformer model that has been finetuned on a specific task and see whether the knowledge gained there would be of any use when training it on standard tasks. However, using the code from the docs leads to both the transformer itself and its tokenizer not being loaded (i.e. they remain `None`).

Upon some further digging, I've found that the same issue is present even in the most basic case, i.e. loading the transformer into a blank English model using the default config. It is hard for me to debug this further, as the stack trace goes into spacy-transformers and then thinc, but for some reason the `hf_model` is initialized as follows:

Is my procedure wrong? If so, what would be the correct one?

How to reproduce the behaviour
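Roughly the following, a minimal sketch of the docs example (assuming spacy-transformers is installed; `DEFAULT_CONFIG` is imported as in the spaCy API docs):

```python
import spacy
from spacy_transformers.pipeline_component import DEFAULT_CONFIG

# Add the transformer component to a blank pipeline with its
# default configuration.
nlp = spacy.blank("en")
trf = nlp.add_pipe("transformer", config=DEFAULT_CONFIG["transformer"])

# At this point both the HF model and its tokenizer are still unset
# (they remain None), which is the behaviour described above.
```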
Your Environment
-
You're just missing the initialize step that actually loads the transformer model based on the config:

```python
import spacy
from spacy_transformers.pipeline_component import DEFAULT_CONFIG

nlp = spacy.blank("en")
trf = nlp.add_pipe("transformer", config=DEFAULT_CONFIG["transformer"])
nlp.initialize()
```

To be honest, most of the tokenizer and transformer config settings should actually have been placed in `[initialize]` rather than `[components]`, but we released the first versions of `transformer` with this in `[components]`, and it would be confusing for users if it changed now.

Related docs on the initialization step: https://spacy.io/usage/training#initialization
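As a quick sanity check, once the pipeline has been initialized it should produce transformer output on processed docs. A sketch using the `doc._.trf_data` extension that spacy-transformers registers (exact tensor contents depend on the model):

```python
# After nlp.initialize(), the HF model and tokenizer are loaded,
# so the component can actually run on text.
doc = nlp("spaCy with a transformer backbone.")

# spacy-transformers stores its output on the doc; before
# initialization this would fail because the model was still None.
print([t.shape for t in doc._.trf_data.tensors])
```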
-
There is now an FAQ post about this process: #10768