Matcher with LOWER pattern fails if token is not lowercase already #5094
In my example, the matcher fails to match `ByeWorld`, since in `{"LOWER": "Bye"}`, `Bye` is not lowercase. Took me some time to figure it out...
Replies: 3 comments
I searched a bit in the source code, but could not find where this happens.
I feel your pain, tracking this down ;-)

I'm not sure this can really be considered a bug in spaCy though. A pattern `{ATTR: value}` matches when `token.ATTR` equals the `value`, and the pattern you created simply can never match. But automatically replacing it with the lowercase variant may not be the best solution either, because then that would perhaps be expected in other cases as well. For instance, if we're matching on `LEMMA`, we wouldn't take the lemma of the `value` and match on that, but instead always match on the actual literal value provided.

But maybe in this specific `LOWER` case, a warning could be thrown?

You're right, it's not a bug per se.
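
Until something like that warning exists, a small pre-check on the pattern can catch the problem before it reaches the matcher. The following is only a sketch, not part of spaCy: the helper names `non_lowercase_lower_values` and `check_pattern` are made up for illustration, and it assumes that `LOWER` is compared against Python's `str.lower()` of the token text. It only looks at plain string values and `{"IN": [...]}` lists.

```python
import warnings


def non_lowercase_lower_values(pattern):
    """Return the LOWER values in one token pattern that are not lowercase.

    `pattern` is a list of token specs (dicts), as passed to the Matcher.
    Hypothetical helper, not a spaCy function.
    """
    bad = []
    for token_spec in pattern:
        value = token_spec.get("LOWER")
        if isinstance(value, str):
            candidates = [value]
        elif isinstance(value, dict):
            # Set-style predicate such as {"LOWER": {"IN": ["a", "b"]}}
            candidates = [v for v in value.get("IN", []) if isinstance(v, str)]
        else:
            candidates = []
        # Assumption: a LOWER value that differs from its own lowercase
        # form can never equal a lowercased token text.
        bad.extend(v for v in candidates if v != v.lower())
    return bad


def check_pattern(pattern):
    """Warn about LOWER values that can never match; return them."""
    bad = non_lowercase_lower_values(pattern)
    if bad:
        warnings.warn(f"LOWER values that are not lowercase can never match: {bad}")
    return bad
```

Calling `check_pattern([{"LOWER": "Bye"}, {"LOWER": "world"}])` before adding the pattern would flag `Bye`. The fix is then to write the value in lowercase yourself, i.e. `{"LOWER": "bye"}`.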