Custom behavior of HYPH, the "-" symbol #9793
Hello. I am trying to analyze a text in which the hyphen ("-") symbol has special meanings. I would like to achieve two specific behaviors, preferably with spaCy's built-in pipes and preferably without having to use the string.replace function.
I've tried "add_special_case" for (1), setting " - " to ORTH: " - ", but it had no effect. It looks like "-" is represented as its own token after the tokenizer, and there seems to be no way to tell whether it had spaces around it. I don't know which other pipe could be relevant. Any idea how to achieve this?
Replies: 1 comment 1 reply
The feature for whether there's a following space is called `SPACY`. `add_special_case` will only work in v2 not v3, if you're setting anything other than `ORTH` and `NORM` (in v3 there should be an error if you try, so I don't think it will fail silently). In v3, you can use an attribute ruler to set annotation or a custom component if you want to retokenize in addition.
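
To make this concrete, here is a rough sketch for spaCy v3. It assumes the goal is to treat a hyphen surrounded by spaces differently from a hyphen inside a word, which is only a guess at the two behaviors asked about. The component name `merge_tight_hyphens` and the match label `SPACED_HYPHEN` are made up for illustration. It uses a `Matcher` pattern on `SPACY` to find spaced hyphens and a custom component to retokenize unspaced ones:

```python
import spacy
from spacy.language import Language
from spacy.matcher import Matcher
from spacy.util import filter_spans

nlp = spacy.blank("en")  # tokenizer only, no trained model required


@Language.component("merge_tight_hyphens")  # hypothetical component name
def merge_tight_hyphens(doc):
    """Merge word-hyphen-word into one token when no whitespace touches the hyphen."""
    spans = []
    for i in range(1, len(doc) - 1):
        tok, prev = doc[i], doc[i - 1]
        # whitespace_ is the trailing whitespace of a token ("" if none)
        if tok.text == "-" and not prev.whitespace_ and not tok.whitespace_:
            spans.append(doc[i - 1 : i + 2])
    with doc.retokenize() as retokenizer:
        # filter_spans drops overlapping candidates such as those in "a-b-c"
        for span in filter_spans(spans):
            retokenizer.merge(span)
    return doc


nlp.add_pipe("merge_tight_hyphens")

# A token with a trailing space, followed by "-" that also has a trailing space.
# This does not cover a hyphen at the very start of the text.
matcher = Matcher(nlp.vocab)
matcher.add("SPACED_HYPHEN", [[{"SPACY": True}, {"ORTH": "-", "SPACY": True}]])

doc = nlp("well-known fact - really")
tokens = [t.text for t in doc]
# Index of each hyphen token that has a space on both sides
spaced = [end - 1 for _, start, end in matcher(doc)]
print(tokens, spaced)
```

From there, the matched positions could be used to set whatever annotation is needed on the spaced hyphens, for example through a custom extension attribute.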