Combining Pretrained and Trained NER Components, both with Transformers #9784
-
Hello everyone! I have just finished training an NER model using transformers (as accuracy is needed for the application I am designing) to recognize a single entity: 'SKILL'. In the meantime, I wanted a highly accurate, pretrained NER model (say "en_core_web_trf") to identify other entities, so I wanted to "chain" those models (mine and the pretrained one), which according to the documentation is possible. More specifically, I have followed this tutorial. I am facing two issues with it:
When I chain the pipelines I get a warning that the sourced ner component may not work as expected because it was trained with different vectors, which seems weird because both models are transformer-based. And the pretrained model's ner then produces nonsense labels. Needless to say, I was expecting to obtain "the sum" of both NER models, not what I got. What am I doing wrong? I am using spaCy v3.2.0 on Python 3.9.7. Please let me know if you need anything else to better understand these queries. PS: I also have some more general queries derived from this:
THANK YOU.
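For reference, a rough sketch of the kind of chaining being described, assuming a local model path and `add_pipe` with `source` (the path and the "ner_pretrained" name are illustrative assumptions, not details from the post):

```python
import spacy

# Rough sketch of the setup the question describes: a custom transformer NER
# for SKILL plus the ner from a pretrained transformer pipeline, chained in
# one pipeline. The local path and component name are assumptions.
nlp = spacy.load("./my_skill_model")          # custom transformer NER ("SKILL")
pretrained = spacy.load("en_core_web_trf")    # pretrained transformer pipeline

# Source the pretrained ner into the custom pipeline under a distinct name.
# Chained naively like this, it runs into the warning and the nonsense labels
# discussed in the replies below.
nlp.add_pipe("ner", source=pretrained, name="ner_pretrained", last=True)

doc = nlp("Looking for a data engineer with strong SQL skills in Berlin.")
print([(ent.text, ent.label_) for ent in doc.ents])
```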
-
Transformer pipelines use a "transformer" component in the same way that non-transformer pipelines use a "tok2vec", so you should change the `replace_listeners` call to refer to the transformer layer rather than the tok2vec layer. (We have had some issues with this in the past, but I think it should work at present.) You get nonsense labels because, as the warning suggests, the ner component from the other pipeline was trained with different vectors. As a general note, non-transformer models are not "word vector" models. The tok2vec layer is a CNN, which optionally uses word vectors as input features.
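A minimal sketch of that fix, assuming the sourced pipeline is en_core_web_trf and the target is a local custom pipeline (the path and the "ner_pretrained" name are assumptions):

```python
import spacy

pretrained = spacy.load("en_core_web_trf")

# Point replace_listeners at the "transformer" component (not "tok2vec"):
# this gives the sourced ner its own copy of the embedding layer instead of a
# listener that expects a shared component the target pipeline doesn't have.
pretrained.replace_listeners("transformer", "ner", ["model.tok2vec"])

nlp = spacy.load("./my_skill_model")          # assumed custom transformer NER
nlp.add_pipe("ner", source=pretrained, name="ner_pretrained", last=True)
```

If you assemble the pipeline from a training config instead, the equivalent is to set `source` and `replace_listeners = ["model.tok2vec"]` on the sourced component.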
Yes.
No. Note that you do need consistent word vectors (which are separate from the tok2vec) between pipelines if they were used as features in training the tok2vec.
GPU uses Transformers, CPU does not.
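A minimal illustration of that split (the package names are just the stock examples):

```python
import spacy

# The accuracy-oriented (GPU) pipeline is transformer-based, while the
# efficiency-oriented (CPU) pipelines use the CNN tok2vec layer.
spacy.prefer_gpu()                         # uses a GPU if one is available
nlp_gpu = spacy.load("en_core_web_trf")    # transformer-based
nlp_cpu = spacy.load("en_core_web_sm")     # tok2vec (CNN) based, fine on CPU

print("transformer" in nlp_gpu.pipe_names)  # True
print("tok2vec" in nlp_cpu.pipe_names)      # True
```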
-
Each pipeline has one set of default vectors associated with the vocab. The thorough way to check that two pipelines share vectors would be to compare the full vector tables. That's kind of slow, so the actual check in spaCy has some faster things it tries first, like making sure the shapes are equivalent. One thing is, Transformers models don't have word vectors, so I'm a little confused why you would be getting this warning if you're only using Transformers models. This warning should only apply to the word vectors, not to the tok2vec/transformer vectors.
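A rough illustration of that kind of comparison (not spaCy's exact internal check, just the idea):

```python
import numpy
import spacy

# Compare the word vector tables of two CPU pipelines: cheap shape check
# first, full value comparison only if the shapes match.
nlp_a = spacy.load("en_core_web_md")
nlp_b = spacy.load("en_core_web_lg")

vecs_a, vecs_b = nlp_a.vocab.vectors, nlp_b.vocab.vectors
same_shape = vecs_a.shape == vecs_b.shape
same_values = same_shape and numpy.array_equal(vecs_a.data, vecs_b.data)
print("same shape:", same_shape, "| same values:", same_values)
```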
If you are only using Transformers this shouldn't matter. Otherwise you can just use the same base model. You might find this section in the docs helpful.
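For the non-transformer case, a hedged sketch of what "using the same base model" can look like, assuming the custom pipeline was trained with the en_core_web_lg vectors (the path and that assumption are illustrative):

```python
import spacy

# If the custom model was trained on the en_core_web_lg vectors, sourcing the
# extra ner from en_core_web_lg keeps the word vectors consistent and avoids
# the incompatible-vectors warning.
nlp = spacy.load("./my_skill_model_lg")       # assumed: trained on lg vectors
pretrained = spacy.load("en_core_web_lg")

# For a CPU pipeline the listener points at "tok2vec" rather than "transformer".
pretrained.replace_listeners("tok2vec", "ner", ["model.tok2vec"])
nlp.add_pipe("ner", source=pretrained, name="ner_pretrained")
```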