Using regex on the whole Doc: the character map in the documentation #11307
Which page or section is this issue related to? https://spacy.io/usage/rule-based-matching#regex-text
I have the impression that this code might not work properly when the regular expression is complex and matches spaces. Spaces are also characters of the text, but this code does not map them to tokens. I fixed the problem like this:
Replies: 1 comment
Note that, as mentioned in the docs, a span with leading or trailing whitespace is a problem. The docs don't state this explicitly in that particular section, but spaCy doesn't represent entities that start or end with whitespace. (If you have an example where leading or trailing whitespace is significant, let us know; I've never seen one before.) That's why the sample code only lets you find token boundaries, which are non-whitespace.
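To make the point about whitespace concrete, here is a minimal sketch of one way to handle it. This is not the code from the docs: it is plain Python, it approximates tokens as runs of non-whitespace (which is simpler than spaCy's real tokenizer), and the helper names `char_to_token_map`, `trim_span` and `match_to_tokens` are made up for illustration. The idea is to trim whitespace off both ends of the regex match before looking up the token indices.

```python
import re


def char_to_token_map(text):
    """Map every non-whitespace character offset to a token index.

    Assumption: tokens are runs of non-whitespace. This is NOT spaCy's
    tokenizer, only a stand-in so the example runs without spaCy.
    """
    mapping = {}
    for i, m in enumerate(re.finditer(r"\S+", text)):
        for pos in range(m.start(), m.end()):
            mapping[pos] = i
    return mapping


def trim_span(text, start, end):
    """Shrink (start, end) so the span neither starts nor ends on whitespace."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


def match_to_tokens(text, pattern):
    """Yield (first_token, last_token, matched_text) for each regex match."""
    mapping = char_to_token_map(text)
    for m in re.finditer(pattern, text):
        start, end = trim_span(text, *m.span())
        if start == end:
            continue  # the match consisted of whitespace only
        # After trimming, both ends sit on non-whitespace characters,
        # so both offsets are present in the map. `end` is exclusive.
        yield mapping[start], mapping[end - 1], text[start:end]


text = "The United States of America"
# The pattern deliberately matches the spaces around the phrase.
results = list(match_to_tokens(text, r"\s[Uu]nited [Ss]tates\s"))
print(results)
```

Whitespace inside the match is harmless here, because only the first and last characters of the trimmed span are looked up. The sketch does not check that a match starts or ends exactly on a token boundary, so a match that begins in the middle of a word still maps to that word's token index.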