
doc.ents corresponds to the token-level ENT_IOB and ENT_TYPE annotations that are used by the built-in NER component (EntityRecognizer), which can only predict non-overlapping IOB tags.

You could represent doc.ents with a span group, but as soon as your span group has overlapping spans, you can't convert it to the token-level IOB format.
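To make the conversion problem concrete, here is a minimal pure-Python sketch of turning spans into one IOB tag per token. It is not spaCy's implementation; the helper `spans_to_iob`, the `(start, end, label)` tuple layout, and the example sentence are all assumptions made for illustration. The point is only that each token has a single tag slot, so a second span covering the same token has nowhere to go.

```python
from typing import List, Tuple

# Hypothetical span layout: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def spans_to_iob(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert token-offset spans to one IOB tag per token.

    Raises ValueError as soon as two spans claim the same token,
    because a token can hold only one IOB tag.
    """
    tags = ["O"] * n_tokens
    for start, end, label in sorted(spans):
        for i in range(start, end):
            if tags[i] != "O":
                raise ValueError(
                    f"token {i} is already tagged {tags[i]!r}; "
                    f"cannot also tag it {label!r}"
                )
            # First token of a span gets B-, the rest get I-
            tags[i] = ("B-" if i == start else "I-") + label
    return tags


tokens = ["Bank", "of", "America", "opened", "today"]

# Non-overlapping spans convert cleanly.
flat = spans_to_iob(len(tokens), [(0, 3, "ORG")])

# Overlapping spans ("Bank of America" and "America") cannot be converted.
try:
    spans_to_iob(len(tokens), [(0, 3, "ORG"), (2, 3, "GPE")])
    overlap_ok = True
except ValueError:
    overlap_ok = False
```

A span group does not have this restriction because it stores the spans themselves rather than a per-token tag.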

We are working on a new statistical pipeline component that can predict overlapping spans, but it's still in progress. It will predict spans in a span group rather than predicting token-level IOB.

Right now the closest you can get is running multiple NER components that each predict a different set of non-overlapping entity types as doc.ents. Before each …

Answer selected by amitbeka
Labels
usage General spaCy usage feat / doc Feature: Doc, Span and Token objects