How to connect previously trained Spacy NER model to new empty Language pipeline? #13148

justmars · 2023-11-23T11:25:51Z

I've been able to train and load a model using a custom script involving a @Language factory:

# post training
import spacy

_model = spacy.load("./models/statutes/model-best")
_model.pipe_names
# ['tok2vec', 'ner']
doc = _model('Section 13 of PD No. 1869 provides xxx')
for e in doc.ents:
    print(f"{e.text=} {e.label_=}")
# e.text='PD No. 1869' e.label_='STATUTE'

I'd like to load this model in a separate language object and it seems to work but no entities are detected:

x = spacy.blank("en")
x.add_pipe("ner", name="statute_ner", source=_model)
x.pipe_names
# ['statute_ner']
doc = x('Section 13 of PD No. 1869 provides xxx')
doc.ents
# ()

Note this follows the convention described in #10674.

According to the docs:

Important note for trained components
When reusing components across pipelines, keep in mind that the vocabulary, vectors and model settings must match.

Initially I thought it would fail because the blank pipeline has no tok2vec. So I tried sourcing the component into en_core_web_sm, which does have a tok2vec, but it still failed to detect the entities that the trained _model finds.
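One way to test the tok2vec hypothesis is to check whether the ner model contains a Tok2VecListener layer, which would mean it reads its token representations from a separate tok2vec component rather than carrying its own. The sketch below is a minimal illustration, not the original training setup: the helper uses_listener, the demo pipeline and listener_cfg are hypothetical names, and the demo pipeline is only constructed (not trained) to stand in for a pipeline with a shared tok2vec.

```python
import spacy
from spacy.pipeline.tok2vec import Tok2VecListener


def uses_listener(nlp, pipe_name):
    """Return True if the component's model contains a Tok2VecListener layer."""
    model = getattr(nlp.get_pipe(pipe_name), "model", None)
    if model is None:
        return False
    return any(isinstance(node, Tok2VecListener) for node in model.walk())


# Hypothetical stand-in for a trained pipeline: a tok2vec component plus an
# ner component whose embedding layer listens to it (assumed width of 96,
# matching the default tok2vec settings).
listener_cfg = {
    "model": {
        "tok2vec": {
            "@architectures": "spacy.Tok2VecListener.v1",
            "width": 96,
            "upstream": "*",
        }
    }
}
demo = spacy.blank("en")
demo.add_pipe("tok2vec")
demo.add_pipe("ner", config=listener_cfg)

print(uses_listener(demo, "ner"))
print(uses_listener(demo, "tok2vec"))
```

Applied to the pipeline from the question, the equivalent call would be uses_listener(_model, "ner"). If it reports a listener, sourcing ner alone leaves it without the tok2vec output it was trained on.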

I think the vocabulary matches. When running x.to_disk, the vocab from the trained _model appears in x, so I'm guessing I can rule this out.

There was some discussion about loading files via Language.factories through an __init__.py file in an older clip from the quite thorough Dr. WJB Mattingly / Python for Digital Humanities, but that approach seems to have been superseded by spaCy v3.

Hope for some guidance on what I'm doing wrong. Thank you!

Answered by justmars

justmars (Author) · 2023-11-23T11:48:18Z

OK, after looking into #10674: @adrianeboyd links to https://github.com/explosion/projects/tree/v3/tutorials/ner_double, where I noticed a curious call: drug_nlp.replace_listeners("tok2vec", "ner", ["model.tok2vec"]). I didn't know this was a necessary step, so I tried it and it solves my problem. Thank you!

import spacy

_model = spacy.load("./models/statutes/model-best")
_model.pipe_names
# ['tok2vec', 'ner']
# adapted from solution: give ner its own copy of the shared tok2vec
# so it no longer depends on the separate tok2vec component
_model.replace_listeners("tok2vec", "ner", ["model.tok2vec"])
# reuse same code above
x = spacy.blank("en")
x.add_pipe("ner", name="statute_ner", source=_model)
x.pipe_names
# ['statute_ner']
doc = x('Section 13 of PD No. 1869 provides xxx')
for e in doc.ents:
    print(f"{e.text=} {e.label_=}")
# e.text='PD No. 1869' e.label_='STATUTE'
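For readers without the trained model on disk, the same wiring can be reproduced on a toy pipeline. This is a sketch under stated assumptions rather than the original training script: trained, listener_cfg and the single annotated example are hypothetical, and the pipeline is only initialized (not trained), so its predictions are not meaningful. It only shows that, after replace_listeners, the ner model no longer contains a listener and can be sourced into a blank pipeline on its own.

```python
import spacy
from spacy.pipeline.tok2vec import Tok2VecListener
from spacy.training import Example

TEXT = "Section 13 of PD No. 1869 provides xxx"

# Hypothetical toy pipeline: shared tok2vec plus an ner that listens to it.
listener_cfg = {
    "model": {
        "tok2vec": {
            "@architectures": "spacy.Tok2VecListener.v1",
            "width": 96,
            "upstream": "*",
        }
    }
}
trained = spacy.blank("en")
trained.add_pipe("tok2vec")
trained.add_pipe("ner", config=listener_cfg)

# One annotated example, just enough to initialize labels and weights.
start = TEXT.index("PD No. 1869")
end = start + len("PD No. 1869")
example = Example.from_dict(
    trained.make_doc(TEXT), {"entities": [(start, end, "STATUTE")]}
)
trained.initialize(lambda: [example])


def has_listener(pipe):
    """Check whether a component's model still contains a listener layer."""
    return any(isinstance(node, Tok2VecListener) for node in pipe.model.walk())


before = has_listener(trained.get_pipe("ner"))

# Copy the shared tok2vec into ner, replacing its listener layer.
trained.replace_listeners("tok2vec", "ner", ["model.tok2vec"])
after = has_listener(trained.get_pipe("ner"))

# The now self-contained ner can be sourced into a blank pipeline.
x = spacy.blank("en")
x.add_pipe("ner", name="statute_ner", source=trained)
doc = x(TEXT)

print(before, after, x.pipe_names)
```

The third argument to replace_listeners is the config path of the listener inside the component, here model.tok2vec, matching the call used in the answer above.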

1 reply

rmitsch Nov 24, 2023
Maintainer

Happy you could resolve this, and thanks for posting the solution!
