Interaction between entity ruler and training, and other conceptual queries #7591

sarge1989 · 2021-03-27T09:23:10Z


Hi!

First off, great work on spaCy 3 =) I'm new to it, but it's an amazing treasure trove.

While working on a project, I've come up with a couple of conceptual questions that I'd like to clarify.

My goal is to fine-tune an existing pre-trained model (en_core_web_trf) to perform NER. Some of my categories, such as PERSON and ORG, are already part of the pre-trained model's label scheme. I also have new NER categories: some of them (e.g. ADDRESS) require statistical, model-based NER, while others (e.g. MOBILE NUM) I intend to extract solely with regex patterns in the entity ruler.

In the training data, I labelled all of these categories (PERSON, ORG, ADDRESS and MOBILE NUM), regardless of whether they are to be extracted statistically or via the entity ruler.

When I simply load and use the pre-trained en_core_web_trf model, the results are already pretty neat. I also realised that adding the entity ruler before the ner component in the pipeline improved performance, even for the categories that are predicted statistically by the ner component.
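A minimal sketch of the setup described above, assuming spaCy v3: an `entity_ruler` with a token-level `REGEX` pattern for mobile numbers, inserted before `ner` when the pipeline has one. The helper `add_mobile_ruler`, the label name `MOBILE_NUM` and the 8-digit pattern are hypothetical placeholders; real mobile-number formats will need their own regex. A blank English pipeline stands in for en_core_web_trf so the example runs without a model download.

```python
import spacy
from spacy.language import Language


def add_mobile_ruler(nlp: Language):
    """Add an entity_ruler that tags 8-digit tokens as MOBILE_NUM (hypothetical label)."""
    # Put the ruler before "ner" so the statistical component sees its entities;
    # fall back to appending it when the pipeline has no ner (e.g. a blank pipeline).
    position = {"before": "ner"} if "ner" in nlp.pipe_names else {}
    ruler = nlp.add_pipe("entity_ruler", **position)
    ruler.add_patterns([
        {
            "label": "MOBILE_NUM",
            # REGEX is applied with re.search, so anchor it to the whole token.
            "pattern": [{"TEXT": {"REGEX": r"^\d{8}$"}}],
        }
    ])
    return ruler


# With the trained pipeline this would be: nlp = spacy.load("en_core_web_trf")
nlp = spacy.blank("en")
add_mobile_ruler(nlp)
doc = nlp("Call me on 91234567 tomorrow")
print([(ent.text, ent.label_) for ent in doc.ents])  # → [('91234567', 'MOBILE_NUM')]
```

With en_core_web_trf loaded, `nlp.pipe_names` should show `entity_ruler` directly before `ner`.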

Now that I'm moving on to fine-tuning with the CLI, I have a few questions:

Given that the entity ruler influences the downstream performance of the ner component, would it make sense to incorporate it in the training pipeline, and if so how? I didn't see this in the documentation.
Does it make sense to fine-tune with the categories that I intend to extract solely using the entity ruler regex? Or should I drop them from the fine-tuning and go with just ADDRESS, PERSON and ORG? Or in other words, by adding MOBILE NUM, could performance on ADDRESS and PERSON be affected?
Does it make sense to fine-tune PERSON, ORG and ADDRESS together, given that the former two are already built into the model while ADDRESS is not? Intuitively, one would think ADDRESS would require a different set of model params when trained from scratch? What's the best way to think about this problem?

Thanks so much!

polm · 2021-04-08T04:27:19Z


Hey, sorry for the delayed reply on this.

As a general point, for high-level conceptual questions like "is it better to do X or Y?", the best advice is often to try both approaches and measure the difference. Depending on the details of the problem it may be possible to give good advice, but a lot of ML is empirical, and you just have to try it and see.
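To see the difference between two runs label by label, per-type scores matter more than a single overall F-score. `spacy evaluate` already prints per-type NER scores; the sketch below is a plain-Python stand-in for comparing two runs, assuming entities are given as `(start_char, end_char, label)` tuples and counting only exact span matches. The function name and the sample spans are hypothetical.

```python
from collections import Counter


def per_label_prf(gold_docs, pred_docs):
    """Exact-match precision, recall and F1 for each entity label.

    gold_docs and pred_docs are parallel lists with one entry per document;
    each entry is a list of (start_char, end_char, label) tuples.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        for span in pred_set:
            label = span[2]
            if span in gold_set:
                tp[label] += 1  # same offsets and same label as gold
            else:
                fp[label] += 1  # predicted span not in gold
        for span in gold_set - pred_set:
            fn[span[2]] += 1  # gold span the system missed
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"p": precision, "r": recall, "f": f1}
    return scores


# Hypothetical held-out gold spans and the outputs of two training runs.
gold = [[(0, 5, "PERSON"), (15, 28, "ADDRESS")]]
run_a = [[(0, 5, "PERSON"), (15, 28, "ADDRESS")]]
run_b = [[(0, 5, "PERSON"), (15, 22, "ADDRESS")]]
for name, pred in [("run_a", run_a), ("run_b", run_b)]:
    print(name, per_label_prf(gold, pred))
```

Here `run_b` keeps PERSON intact but truncates the address, so its ADDRESS F-score drops to 0 under exact matching.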

Given that the entity ruler influences the downstream performance of the ner component, would it make sense to incorporate it in the training pipeline, and if so how? I didn't see this in the documentation.

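Currently, components do not set annotations on the examples during training: each component is updated on the training data independently, so the ner component never sees the entity ruler's output and you can't have this kind of dependence. Allowing this kind of interaction is something we're actively working on now.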

Does it make sense to fine-tune with the categories that I intend to extract solely using the entity ruler regex? Or should I drop them from the fine-tuning and go with just ADDRESS, PERSON and ORG? Or in other words, by adding MOBILE NUM, could performance on ADDRESS and PERSON be affected?

This is mostly a try-and-see kind of problem. For MOBILE NUM specifically, I could see it going either way: maybe mobile numbers look sufficiently distinctive that they don't interact with other entities, or maybe having them labeled helps the model learn to differentiate house numbers in an address from mobile numbers.
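Trying both only requires a second copy of the training data with the MOBILE NUM annotations removed. A minimal sketch, assuming the annotations are held as `(text, {"entities": [(start, end, label), ...]})` pairs before being converted to `.spacy` files for the CLI; `drop_labels` and the label name `MOBILE_NUM` are hypothetical.

```python
def drop_labels(train_data, labels_to_drop):
    """Return a copy of (text, {"entities": [...]}) pairs without some entity labels.

    The input list and its annotation dicts are left unmodified.
    """
    filtered = []
    for text, annotations in train_data:
        entities = [
            (start, end, label)
            for start, end, label in annotations.get("entities", [])
            if label not in labels_to_drop
        ]
        # Copy the annotation dict so other keys survive and the original is untouched.
        filtered.append((text, {**annotations, "entities": entities}))
    return filtered


# Hypothetical example: one sentence with a mobile number and an address.
data = [("Call 91234567 at 1 Main St",
         {"entities": [(5, 13, "MOBILE_NUM"), (17, 26, "ADDRESS")]})]
without_mobile = drop_labels(data, {"MOBILE_NUM"})
print(without_mobile[0][1]["entities"])  # → [(17, 26, 'ADDRESS')]
```

Training once on `data` and once on `without_mobile`, then scoring ADDRESS and PERSON on the same held-out set, gives the comparison described above.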

Does it make sense to fine-tune PERSON, ORG and ADDRESS together, given that the former two are already built into the model while ADDRESS is not? Intuitively, one would think ADDRESS would require a different set of model params when trained from scratch? What's the best way to think about this problem?

If you fine-tune without annotations for some of the entity types the model already knows (such as PERSON and ORG), you run the risk of catastrophic forgetting, see here for details. Typical options here would be to train an address-only model, or to train a model with all the annotations. I would expect a lot of interaction between normal entities and addresses (e.g. "Coca-Cola Lane"), so I would try both.
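If some texts only carry ADDRESS annotations, one commonly used mitigation for catastrophic forgetting is pseudo-rehearsal: run the original pipeline over those texts and keep its predictions for the labels it already knows, so training still sees PERSON and ORG examples. This is not necessarily what the linked page describes. A minimal plain-Python sketch of the merging step, assuming `(start_char, end_char, label)` tuples; the function name is hypothetical, and in practice the predicted spans could come from `[(e.start_char, e.end_char, e.label_) for e in nlp(text).ents]` with the original en_core_web_trf pipeline.

```python
def merge_with_model_predictions(gold_spans, predicted_spans, keep_labels):
    """Combine hand-annotated spans with the original model's predictions.

    gold_spans: (start, end, label) tuples annotated by hand (e.g. ADDRESS).
    predicted_spans: (start, end, label) tuples from the original pipeline.
    keep_labels: labels whose predictions should be preserved (e.g. PERSON, ORG).
    A prediction that overlaps a gold span or an already-kept prediction is
    skipped, so gold annotations always win and the result has no overlaps.
    """
    merged = list(gold_spans)
    for start, end, label in predicted_spans:
        if label not in keep_labels:
            continue
        overlaps = any(start < m_end and m_start < end for m_start, m_end, _ in merged)
        if not overlaps:
            merged.append((start, end, label))
    return sorted(merged)


text = "Jane Tan works at Acme Corp on Coca-Cola Lane"
gold = [(31, 45, "ADDRESS")]  # hand annotation: "Coca-Cola Lane"
predicted = [(0, 8, "PERSON"), (18, 27, "ORG"), (31, 40, "ORG")]  # hypothetical model output
print(merge_with_model_predictions(gold, predicted, {"PERSON", "ORG"}))
# → [(0, 8, 'PERSON'), (18, 27, 'ORG'), (31, 45, 'ADDRESS')]
```

The ORG prediction for "Coca-Cola" is dropped because it overlaps the hand-labelled address, which is exactly the kind of interaction between normal entities and addresses mentioned above.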
