Holding conflicting predictions - with `Doc.ents` or `Doc.spans`? #7512

amitbeka · 2021-03-21T09:14:34Z

Hi,

In spaCy v3 I can see two ways to hold span-based predictions: `Doc.ents` and `Doc.spans`.
I like the fact that `Doc.spans` contains `SpanGroup` objects, which can hold conflicting (overlapping) spans, like the ones we sometimes get from different models. However, it seems `Doc.ents` cannot hold such conflicts.

What's the reason for having both? I see that a `Span` contains a reference to the entities within it (`Span.ents`), and I'm not sure which one to use. An example sentence for me is:

Text: i need a hotel in sat
Person span group: the word "i"
Request span group: the phrase "i need"
Location span group: "sat" might relate to SAT airport
Date span group: "sat" might be Saturday with the incorrect preposition "in" instead of "on"
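
A minimal sketch of how the example above could be stored with `Doc.spans`, assuming spaCy v3 and a blank English pipeline (whose tokenizer splits this sentence on whitespace). The span-group keys (`person`, `request`, `location`, `date`) and the labels are illustrative names chosen here, not anything spaCy predefines. Each key holds its own `SpanGroup`, and spans may overlap both within a group and across groups.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("i need a hotel in sat")
# Token indices: i=0, need=1, a=2, hotel=3, in=4, sat=5

# Illustrative span-group keys; each one holds a SpanGroup whose spans may overlap.
doc.spans["person"] = [Span(doc, 0, 1, label="PERSON")]      # "i"
doc.spans["request"] = [Span(doc, 0, 2, label="REQUEST")]    # "i need"
doc.spans["location"] = [Span(doc, 5, 6, label="LOCATION")]  # "sat" as SAT airport
doc.spans["date"] = [Span(doc, 5, 6, label="DATE")]          # "sat" as Saturday

for key, group in doc.spans.items():
    for span in group:
        print(f"{key}: {span.text!r} ({span.label_})")
```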

Thanks,
Beka

Answered by adrianeboyd

Mar 22, 2021

adrianeboyd · 2021-03-22T08:51:15Z

`doc.ents` corresponds to the token-level `ENT_IOB` and `ENT_TYPE` annotations that are used by the built-in NER component (`EntityRecognizer`), which can only predict non-overlapping IOB tags.
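
A quick way to see this token-level view, sketched with a blank spaCy v3 pipeline and a manually set entity (the `LOCATION` label is only an example): setting `doc.ents` writes one IOB tag and one entity type per token, which can be read back through `Token.ent_iob_` and `Token.ent_type_`.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("i need a hotel in sat")

# Setting doc.ents marks "sat" as B-LOCATION and every other token as O.
doc.ents = [Span(doc, 5, 6, label="LOCATION")]

for token in doc:
    # ent_type_ is an empty string for tokens outside any entity
    print(token.text, token.ent_iob_, token.ent_type_ or "-")
```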

You could represent `doc.ents` with a span group, but as soon as your span group has overlapping spans, you can't convert it to the token-level IOB format.
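
To illustrate the difference, here is a small sketch (assuming spaCy v3; the `candidates` key is a hypothetical name) that gives both readings of "sat" to a span group and to `doc.ents`. The span group keeps both spans, while assigning the same overlapping spans to `doc.ents` raises a `ValueError`, because each token can only carry one IOB tag.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("i need a hotel in sat")

location = Span(doc, 5, 6, label="LOCATION")  # "sat" as SAT airport
date = Span(doc, 5, 6, label="DATE")          # "sat" as Saturday

# A span group accepts both overlapping readings of "sat".
doc.spans["candidates"] = [location, date]

# doc.ents must map onto one IOB tag per token, so overlapping spans are rejected.
try:
    doc.ents = [location, date]
    ents_accepted = True
except ValueError:
    ents_accepted = False

print(len(doc.spans["candidates"]), ents_accepted)
```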

We are working on a new statistical pipeline component that can predict overlapping spans, but it's still in progress. It will predict spans in a span group rather than predicting token-level IOB.

Right now the closest you can get is running multiple NER components that each predict a different set of non-overlapping entity types as `doc.ents`. Before each NER component, you'd want to reset `doc.ents`, and afterwards copy the results into a span group so that you can store all the potentially overlapping predictions.
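
A rough sketch of this reset-and-copy approach, assuming spaCy v3. Two `entity_ruler` components stand in for two trained NER components so the example runs without training; `reset_ents` and `copy_ents_to_spans` are hypothetical helper components defined here, not spaCy built-ins. Resetting uses `Doc.set_ents` with `default="missing"` so each tagger starts from unset entity annotation, and the span-group keys `location` and `date` are illustrative.

```python
import spacy
from spacy.language import Language


# Hypothetical helper: clear existing entity annotation before the next tagger.
# default="missing" marks every token's entity annotation as unset (not "O").
@Language.component("reset_ents")
def reset_ents(doc):
    doc.set_ents([], default="missing")
    return doc


# Hypothetical helper: copy the current doc.ents into the span group `spans_key`.
@Language.factory("copy_ents_to_spans", default_config={"spans_key": "ents"})
def make_copy_ents_to_spans(nlp, name, spans_key: str):
    def copy_ents_to_spans(doc):
        doc.spans[spans_key] = list(doc.ents)
        return doc

    return copy_ents_to_spans


nlp = spacy.blank("en")

# First "NER": entity ruler standing in for a model predicting LOCATION.
nlp.add_pipe("reset_ents", name="reset_location")
loc_ruler = nlp.add_pipe("entity_ruler", name="location_ner")
loc_ruler.add_patterns([{"label": "LOCATION", "pattern": "sat"}])
nlp.add_pipe("copy_ents_to_spans", name="copy_location", config={"spans_key": "location"})

# Second "NER": entity ruler standing in for a model predicting DATE.
nlp.add_pipe("reset_ents", name="reset_date")
date_ruler = nlp.add_pipe("entity_ruler", name="date_ner")
date_ruler.add_patterns([{"label": "DATE", "pattern": "sat"}])
nlp.add_pipe("copy_ents_to_spans", name="copy_date", config={"spans_key": "date"})

doc = nlp("i need a hotel in sat")
for key in ("location", "date"):
    for span in doc.spans[key]:
        print(key, span.text, span.label_)
```

Because each copy step writes to its own span-group key, the conflicting LOCATION and DATE readings of "sat" both survive, while `doc.ents` only holds whatever the last tagger predicted. Whether a trained `EntityRecognizer` treats "missing" annotation exactly like this ruler does is an assumption worth checking against your spaCy version.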

amitbeka · Mar 22, 2021 (Author)

Thanks! I guess I will use your suggestion to reset-and-copy ents to spans.
