PhraseMatcher not matching correctly on attr when tokenization is customized #11951
Unanswered
NixBiks asked this question in Help: Other Questions
Replies: 1 comment
The tokenization for uppercase and lowercase may be different, so the tokenization for the provided pattern "US$" isn't the same as for "us$" and then the tokens don't line up when it's trying to match. A similar issue is #6994. There is potential for even more variation in tokenization (especially with custom tokenizer settings), but it's probably sufficient for typical English settings/cases to add both |
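A minimal sketch of how one might check this explanation; it is not the asker's original reproduction. It assumes a blank English pipeline (`spacy.blank("en")`) in place of the asker's pipeline, and it reads "add both" as adding the uppercase and lowercase forms of the pattern. The helper `lower_tokens`, the key `CURRENCY`, and the sample sentence are made up for illustration.

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.util import compile_infix_regex

# Assumption: a blank English pipeline stands in for the asker's pipeline.
nlp = spacy.blank("en")

# Add "$" as an extra infix rule, as described in the question.
infixes = list(nlp.Defaults.infixes) + [r"[$]"]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer


def lower_tokens(text):
    """Return the LOWER value of every token produced for `text` (hypothetical helper)."""
    return [token.lower_ for token in nlp.make_doc(text)]


# Print both casings side by side. If the two lists differ, a LOWER pattern
# built from one casing cannot line up with text written in the other.
for variant in ("US$", "us$"):
    print(variant, "->", lower_tokens(variant))

# Assumed workaround: one pattern per casing, so each pattern is tokenized
# the same way as the text it is meant to match.
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("CURRENCY", [nlp.make_doc("US$"), nlp.make_doc("us$")])

doc = nlp.make_doc("The price is us$ 100 or US$ 200.")
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)
```

Building the patterns with `nlp.make_doc` runs only the tokenizer, which is all a `LOWER`-based `PhraseMatcher` needs. With other custom tokenizer settings, more variants than these two may be required.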
I have an example where I have `$` in my infix tokenization rules. However, the `PhraseMatcher` then fails to match on the `LOWER` attr.
How to reproduce the behaviour
If I don't add `[r"[$]"]` to my infixes then it works fine. I assume that's a bug!?
Info about spaCy