Skip to content
Discussion options

You must be logged in to vote

The feature for whether there's a following space is called SPACY:

matcher.add("TRAILING_SPACE", [[{"SPACY": True}]])

add_special_case will only work in v2 not v3, if you're setting anything other than ORTH and NORM (in v3 there should be an error if you try, so I don't think it will fail silently). In v3, you can use an attribute ruler to set annotation or a custom component if you want to retokenize in addition.

Replies: 1 comment 1 reply

Comment options

You must be logged in to vote
1 reply
@antonpibm
Comment options

Answer selected by adrianeboyd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat / tokenizer Feature: Tokenizer feat / doc Feature: Doc, Span and Token objects
2 participants