Training NER on Incomplete Annotations #11114

tokestermw · 2022-07-11T19:21:23Z


Hello, does spacy train have the ability to train an NER model on incomplete annotations?

For example, we will start by annotating entity types A and B, but later we want to add entity types C and D. Ideally, we don't want to redo all the annotations with entity types A, B, C, and D.

This has been mentioned in a few other places. I believe Prodigy had the ability to train on incomplete annotations (e.g. binary accept/reject, using the --ner-missing argument), but I think the training pipeline was consolidated into spaCy 3.

Is there a setting in the config that we can use to mark annotations as incomplete?

I think I saw somewhere that words can be labeled as '-', but an example would be highly appreciated.

Thanks!


Answered by adrianeboyd


adrianeboyd · 2022-07-19T15:15:50Z


You can train a model from partial NER annotation, but it's intended for segments of docs where there is no entity annotation at all rather than annotation for a subset of entity types.

You can set "missing" NER annotation for spans of a doc with Doc.set_ents(missing=spans) or use None as the IOB tag with the constructor Doc(ents=["O", None, "B-ENT", ...]).
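A minimal sketch of both approaches, assuming spaCy v3 and a blank English pipeline; the sentence and the PERSON/ORG labels are made up for illustration. Both docs should end up with the same token-level annotation: entities start with B, known non-entity tokens are O, and the last two tokens are missing, which spaCy reports as an empty ent_iob_ string.

```python
import spacy
from spacy.tokens import Doc, Span

nlp = spacy.blank("en")
words = ["Jane", "works", "at", "Acme", "in", "Berlin"]

# Option 1: IOB tags in the Doc constructor.
# "O" = known to be outside any entity, None = annotation missing.
doc1 = Doc(nlp.vocab, words=words, ents=["B-PERSON", "O", "O", "B-ORG", None, None])

# Option 2: Doc.set_ents on an unannotated doc.
# Tokens not covered by any argument fall back to the default, "outside" (O);
# the span passed via missing= keeps its NER annotation unset.
doc2 = Doc(nlp.vocab, words=words)
doc2.set_ents(
    [Span(doc2, 0, 1, label="PERSON"), Span(doc2, 3, 4, label="ORG")],
    missing=[Span(doc2, 4, 6)],
)

# Missing annotation shows up as an empty IOB string.
print([t.ent_iob_ for t in doc1])  # ['B', 'O', 'O', 'B', '', '']
print([t.ent_iob_ for t in doc2])  # ['B', 'O', 'O', 'B', '', '']
```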

I don't think it's going to work well in practice for your example case, since it would mean that your partially annotated docs could only include the entity spans and no O annotation, whereas the model really needs both to learn well.
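A small illustration of that trade-off, assuming spaCy v3; the labels A and C follow the question, while the sentence and the helper annotate_round_one are hypothetical. Suppose the first round of annotation only covered type A, and "Paris" is a type C entity that nobody has labeled yet. Filling the unlabeled tokens with O teaches the model that "Paris" is not an entity; marking them missing avoids that, but then the doc carries no O examples at all.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")


def annotate_round_one(default):
    # Hypothetical helper: round one only knows about type A, so "Alice"
    # is labeled and "Paris" (a not-yet-annotated type C entity) is not.
    doc = nlp.make_doc("Alice visited Paris")
    # default decides what happens to tokens outside the given entity spans.
    doc.set_ents([Span(doc, 0, 1, label="A")], default=default)
    return [t.ent_iob_ for t in doc]


# Unlabeled tokens become O, so "Paris" is wrongly taught as a non-entity.
print(annotate_round_one("outside"))  # ['B', 'O', 'O']
# Unlabeled tokens become missing, so there is no O supervision left.
print(annotate_round_one("missing"))  # ['B', '', '']
```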

The binary accept/reject annotation is used with incorrect_spans_key, but this is to indicate that a particular span is not a particular entity type rather than to indicate partial annotation.
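A minimal sketch of how such negative annotation could be attached to a training example, assuming a spaCy v3 version whose ner factory accepts incorrect_spans_key; the key name "incorrect_spans", the sentence, and the rejected span are hypothetical. The reference doc has no positive entity annotation (every token is missing), and the span group only records that "Apple pie" was rejected as an ORG.

```python
import spacy
from spacy.tokens import Doc, Span
from spacy.training import Example

nlp = spacy.blank("en")
# Tell the ner component which span group in the reference docs holds
# known-incorrect entity annotations ("incorrect_spans" is an arbitrary name).
ner = nlp.add_pipe("ner", config={"incorrect_spans_key": "incorrect_spans"})
ner.add_label("ORG")

words = ["Apple", "pie", "is", "tasty"]
# A Doc built without ents has no NER annotation: every token is missing.
reference = Doc(nlp.vocab, words=words)
# Record a rejected suggestion from a binary accept/reject task:
# "Apple pie" is NOT an ORG. This says nothing about the other tokens.
reference.spans["incorrect_spans"] = [Span(reference, 0, 2, label="ORG")]

# Pair an unannotated predicted doc with the reference doc.
example = Example(Doc(nlp.vocab, words=words), reference)
```

Note that this only encodes "this span is not this type". It does not turn the rest of the doc into O, which is why it is not a substitute for partial annotation.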
