You can use ent.start and ent.end instead of ent.start_char and ent.end_char to get the entity's token indices rather than its character offsets.
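To illustrate the difference, here is a minimal sketch using a blank English pipeline. The sentence and the "GPE" label are made up for the example; only the four Span attributes come from the answer above.

```python
import spacy
from spacy.tokens import Span

# Blank pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
doc = nlp("I like New York")

# Hypothetical entity covering the tokens "New York" (tokens 2 and 3).
ent = Span(doc, 2, 4, label="GPE")

token_bounds = (ent.start, ent.end)            # token indices, end exclusive
char_bounds = (ent.start_char, ent.end_char)   # character offsets into doc.text

print(token_bounds)  # (2, 4)
print(char_bounds)   # (7, 15)
```

The token indices can be used to slice the doc directly (doc[ent.start:ent.end]), while the character offsets index into doc.text.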

Also, please note that your custom component "remove_specials" needs to return the doc at the end of its processing.

Finally, I'm not sure this function will work as you intend when you apply it to a doc with multiple entities in doc.ents, because doc.set_ents always overwrites the entire set of entities. Instead, you probably want to build up the list of new entities and call set_ents once on the doc, right before returning it.
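A rough sketch of that pattern follows. The original body of "remove_specials" is not shown in this thread, so the cleanup rule here (trimming punctuation tokens from both edges of each entity) is only an assumed stand-in, and the example sentence and labels are invented.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Span


@Language.component("remove_specials")
def remove_specials(doc):
    new_ents = []
    for ent in doc.ents:
        # Work with token indices, not character offsets.
        start, end = ent.start, ent.end
        # Assumed cleanup rule: drop punctuation tokens at the entity edges.
        while start < end and doc[start].is_punct:
            start += 1
        while end > start and doc[end - 1].is_punct:
            end -= 1
        if start < end:
            new_ents.append(Span(doc, start, end, label=ent.label_))
    # Overwrite the entities once, after all of them have been processed.
    doc.set_ents(new_ents)
    # A pipeline component has to return the doc.
    return doc


nlp = spacy.blank("en")
doc = nlp.make_doc("Visit ( Berlin ) and ( Paris ) soon")
# Hand-set entities that include the surrounding brackets.
doc.set_ents([Span(doc, 1, 4, label="GPE"), Span(doc, 5, 8, label="GPE")])

doc = remove_specials(doc)
cleaned = [(ent.text, ent.start, ent.end, ent.label_) for ent in doc.ents]
print(cleaned)
```

Calling set_ents inside the loop would instead leave only the last entity on the doc, since each call replaces everything set before it.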

Answer selected by adrianeboyd
Labels
feat / doc Feature: Doc, Span and Token objects

This discussion was converted from issue #7128 on February 19, 2021 18:01.