PoS Tags as additional features for training NER #9641

ivankrstev7 · 2021-11-08T14:14:36Z

Hi,
I am wondering whether it is possible to use Part-Of-Speech tags as features when training an NER model. My idea is something like this: say we have data in IOB format, where column 1 is the token, column 2 is the NER label and column 3 is the PoS tag:
Australia B-GPE PROPN
I believe this could be very beneficial for NER!
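A minimal sketch of reading the three-column format above in plain Python, assuming whitespace-separated columns and a blank line between sentences. `read_iob_with_pos` is a hypothetical helper for illustration, not part of spaCy:

```python
from typing import Iterable, List, Tuple

# One sentence = a list of (token, NER label, POS tag) triples.
Sentence = List[Tuple[str, str, str]]


def read_iob_with_pos(lines: Iterable[str]) -> List[Sentence]:
    """Parse three-column IOB lines: token, NER label, POS tag.

    Hypothetical helper: assumes whitespace-separated columns and a blank
    line between sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        token, ner, pos = line.split()
        current.append((token, ner, pos))
    if current:
        sentences.append(current)
    return sentences


data = """Australia B-GPE PROPN
is O AUX
big O ADJ
. O PUNCT
""".splitlines()

for sent in read_iob_with_pos(data):
    words = [tok for tok, _, _ in sent]
    ents = [ner for _, ner, _ in sent]
    pos = [p for _, _, p in sent]
    print(words, ents, pos)
```

In spaCy v3, `spacy.tokens.Doc` accepts `words`, `pos` and `ents` (token-level IOB strings such as `"B-GPE"`) as keyword arguments, so the three lists per sentence map directly onto a `Doc`.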

Answered by adrianeboyd

adrianeboyd · 2021-11-08T16:28:35Z

This is technically possible and I can understand why it sounds attractive, but I actually suspect it won't help. The main reason is that PROPN vs. NOUN is one of the places where the tagger makes the most mistakes, so the tags may not be accurate enough to really help. Another is that with the current default configs, the ner model already uses the exact same tok2vec features as the tagger, so the information the tagger relies on is already available to ner. But I haven't tested this, so I could be wrong, especially for particular domains or datasets.

If you do want to try this, first check where the POS annotation comes from in your pipeline: either a morphologizer, or a tagger plus attribute_ruler. If your pipeline has a morphologizer, add "morphologizer" to annotating_components in [training]. If your pipeline has a tagger and no morphologizer, add "tagger", "attribute_ruler" to annotating_components instead. For annotating_components to work, the ner component needs to come after these components in the pipeline.
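The selection and ordering rules in the paragraph above can be sketched as two small checks over a list of pipeline component names. Both helpers are hypothetical and only mirror the rules as stated; they do not call spaCy:

```python
from typing import List


def pos_annotating_components(pipe_names: List[str]) -> List[str]:
    """Pick the components that set POS, following the rule in the answer.

    Hypothetical helper: a morphologizer takes priority; otherwise use
    tagger + attribute_ruler (keeping only the ones actually present).
    """
    if "morphologizer" in pipe_names:
        return ["morphologizer"]
    if "tagger" in pipe_names:
        return [name for name in ("tagger", "attribute_ruler") if name in pipe_names]
    return []


def check_order(pipe_names: List[str], annotating: List[str], target: str = "ner") -> bool:
    """True if every annotating component runs before the target component."""
    target_idx = pipe_names.index(target)
    return all(pipe_names.index(name) < target_idx for name in annotating)


pipeline = ["tok2vec", "tagger", "attribute_ruler", "ner"]
comps = pos_annotating_components(pipeline)
print(comps)                         # ['tagger', 'attribute_ruler']
print(check_order(pipeline, comps))  # True
```

If `check_order` returns False, ner would run before the POS components and would not see their annotations during training.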

If you're sourcing the POS-related components from another pipeline like en_core_web_sm and you don't want to train them further, also add "morphologizer" or "tagger", "attribute_ruler" to frozen_components in [training], so that you're only training ner.
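As a sketch of what the resulting [training] settings look like, the hypothetical helper below renders them as config text. It assumes that list values can be written JSON-style, as the attrs and rows values in the config further down are, so `json.dumps` produces the value:

```python
import json
from typing import List


def training_section(pos_components: List[str], freeze: bool = True) -> str:
    """Render a [training] snippet listing the POS components.

    Hypothetical helper: list values are written JSON-style via json.dumps.
    """
    lines = ["[training]"]
    if freeze:
        # Sourced components that should not be updated during training.
        lines.append(f"frozen_components = {json.dumps(pos_components)}")
    # Components whose predictions are set on the docs during training.
    lines.append(f"annotating_components = {json.dumps(pos_components)}")
    return "\n".join(lines)


print(training_section(["tagger", "attribute_ruler"]))
```

With `freeze=False` only the annotating_components line is emitted, which matches the case where the POS components are trained along with ner.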

More background on annotating components: https://spacy.io/usage/training#annotating-components

Then add POS as a feature to attrs in the tok2vec embedding settings of the NER model, along with a corresponding entry in rows (the number of rows in its embedding table), e.g.:

[components.ner.model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = 96
attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY","POS"]
rows = [5000,2500,2500,2500,100,2500]
include_static_vectors = false
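MultiHashEmbed expects one rows entry per attrs entry, so a quick sanity check after editing is that the two lists line up. The sketch below uses Python's standard configparser as a stand-in for spaCy's own config loader (which additionally handles interpolation and validation); `check_embed` is a hypothetical helper:

```python
import configparser
import json

# The embed block from the answer, as a string.
CONFIG = """
[components.ner.model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = 96
attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY","POS"]
rows = [5000,2500,2500,2500,100,2500]
include_static_vectors = false
"""


def check_embed(config_text: str, section: str = "components.ner.model.tok2vec.embed") -> dict:
    """Check that attrs and rows line up one-to-one and that POS is included."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    embed = parser[section]
    attrs = json.loads(embed["attrs"])
    rows = json.loads(embed["rows"])
    if len(attrs) != len(rows):
        raise ValueError(f"{len(attrs)} attrs but {len(rows)} rows")
    if "POS" not in attrs:
        raise ValueError("POS is not among the embedded attributes")
    # Map each attribute to the size of its embedding table.
    return dict(zip(attrs, rows))


table_sizes = check_embed(CONFIG)
print(table_sizes["POS"])  # 2500
```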
