Underneath, spans are defined over tokens rather than over characters, so there can still be misalignments between character offsets and spans.
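
A minimal sketch of what that misalignment looks like with spaCy's `Doc.char_span` and its `alignment_mode` options, assuming a blank English pipeline (tokenizer only). The text and offsets are illustrative: characters 6..10 cover "Carr", which ends in the middle of the token `Carrasco`.

```python
import spacy

# Blank English pipeline: only the rule-based tokenizer, no trained components.
nlp = spacy.blank("en")
doc = nlp("Hello Carrasco!")  # tokens: "Hello", "Carrasco", "!"

# Character offsets 6..10 cover "Carr", which ends inside the token "Carrasco".
start, end = 6, 10

# strict: offsets must match token boundaries exactly, otherwise None.
strict = doc.char_span(start, end, alignment_mode="strict")
# contract: keep only tokens fully inside the range; none are, so None.
contract = doc.char_span(start, end, alignment_mode="contract")
# expand: grow the span to cover every token the range touches.
expand = doc.char_span(start, end, alignment_mode="expand")

print(strict, contract, expand)
```

With `expand`, the partial match becomes the whole token `Carrasco`, which is how two different character ranges can end up competing for the same token.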

I think what you might be seeing with `expand` is that a previous annotation has already been expanded over the token `Carrasco`. To keep the processing and output the same for `doc.ents` and `doc.spans`, this component currently won't return overlapping annotation. Also, none of the underlying models produce overlapping annotation, so I think overlapping output would be unexpected.
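
To illustrate why overlaps matter here: `doc.spans` accepts overlapping spans, but `doc.ents` allows each token to belong to at most one entity and raises a `ValueError` otherwise. The sketch below uses `spacy.util.filter_spans` as one way to drop overlaps so the same result is valid for both; that is an assumption for illustration, not necessarily what the component does internally. The text, labels, and the `"sc"` key are made up.

```python
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans

nlp = spacy.blank("en")
doc = nlp("Ana Carrasco lives in Madrid")

# Two overlapping candidates, e.g. an earlier span already expanded over "Carrasco".
full = Span(doc, 0, 2, label="PER")     # "Ana Carrasco"
partial = Span(doc, 1, 2, label="PER")  # "Carrasco"

# doc.spans (a SpanGroup) accepts overlapping spans.
doc.spans["sc"] = [full, partial]
n_overlapping = len(doc.spans["sc"])

# doc.ents does not: token 1 would belong to two entities.
ents_error = None
try:
    doc.ents = [full, partial]
except ValueError as err:
    ents_error = err

# filter_spans keeps the longest spans (earliest first on ties) and drops overlaps,
# so the filtered list is valid for both doc.ents and doc.spans.
kept = filter_spans([full, partial])
doc.ents = kept
doc.spans["sc"] = kept
print([(s.text, s.label_) for s in doc.ents])
```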

If you don't otherwise care about the tokenization and just want character-level span results, you could replace the default tokenizer with a character tokenizer. I think at that point there's a good chance…
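
A minimal sketch of a character tokenizer, following spaCy's pattern of a callable that takes a string and returns a `Doc`. `CharTokenizer` is a hypothetical name. Every character, including whitespace, becomes its own token, so any character offsets land on token boundaries and `strict` alignment always succeeds.

```python
import spacy
from spacy.tokens import Doc
from spacy.vocab import Vocab


class CharTokenizer:
    """Hypothetical tokenizer: every character becomes its own token."""

    def __init__(self, vocab: Vocab):
        self.vocab = vocab

    def __call__(self, text: str) -> Doc:
        words = list(text)
        # No trailing-space flags: spaces are tokens themselves,
        # so doc.text reproduces the input string exactly.
        return Doc(self.vocab, words=words, spaces=[False] * len(words))


nlp = spacy.blank("en")
nlp.tokenizer = CharTokenizer(nlp.vocab)

doc = nlp("Hi Carrasco")
# Offsets 3..7 ("Carr") now match token boundaries, so strict mode returns a span.
span = doc.char_span(3, 7, label="PER")
print(len(doc), span.text)
```

This only changes how the text is split; token-level attributes (lemmas, vectors, and so on) are no longer meaningful with one-character tokens. This tokenizer also has no serialization methods, so it is only suitable for in-memory use as written.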

Replies: 1 comment, 3 replies

@omri374
@adrianeboyd
@omri374
Answer selected by omri374
Labels
feat / doc Feature: Doc, Span and Token objects
2 participants