SpaCy matcher not matching case-insensitive words in a document #11404
-
Perhaps set a Token extension that returns the lemma in lowercase form, then match on that?
    Token.set_extension("lemma_lower", getter=lambda t: t.lemma_.lower())
    pat_piece = {"_": {"lemma_lower": token.lemma_.lower()}} if is_final_token(token, tmpdoc) else {"LOWER": token.lower_}
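To make the suggestion concrete, here is a minimal sketch of how the extension could be wired into a Matcher pattern. It is not the original poster's code: the sentence, the hand-assigned lemmas, and the "KEYWORD" label are assumptions for illustration, and a blank pipeline is used so that no trained model is required. In a real pipeline the lemmas would come from the lemmatizer instead.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc, Token

# Custom attribute that exposes the lemma in lowercase.
# force=True lets the snippet be re-run without a "already registered" error.
Token.set_extension("lemma_lower", getter=lambda t: t.lemma_.lower(), force=True)

nlp = spacy.blank("en")  # blank pipeline: no model download, no lemmatizer

# Hand-assigned lemmas (an assumption for illustration) stand in for a
# lemmatizer that keeps the original casing of upper-case tokens.
doc = Doc(
    nlp.vocab,
    words=["Update", "your", "PRODUCT", "PREFERENCES", "today"],
    lemmas=["update", "your", "PRODUCT", "PREFERENCE", "today"],
)

matcher = Matcher(nlp.vocab)
pattern = [
    {"LOWER": "product"},                  # non-final token: lowercase text
    {"_": {"lemma_lower": "preference"}},  # final token: lowercase lemma
]
matcher.add("KEYWORD", [pattern])  # "KEYWORD" is a hypothetical label

matched_spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched_spans)
```

The final token is matched on its lowercased lemma, so both the casing and the plural form are absorbed, while the earlier token is matched on `LOWER` as in the suggestion above.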
-
For context, the related question and my answer from SO: https://stackoverflow.com/questions/73524777/why-is-spacy-matcher-not-matching-case-insensitive-words-in-a-document
At this point it's not clear exactly what you're trying to match and what's matching or not matching. Please include a minimal example that shows a full doc and the full matcher patterns that you're testing, and explain what the intended matches are.
-
I want SpaCy matcher to match keywords (multi-word entities) in a document irrespective of their case. I can only match "product preferences", not "PRODUCT PREFERENCES," "Product Preferences," or any combination thereof in my document with the code below because token.lemma is case sensitive.
I tried forcing it with a {"LEMMA" : { "IN" : [] } } construction, adding .upper() and other cases (idea came from https://stackoverflow.com/questions/64758759/force-spacy-lemmas-to-be-lowercase).
No luck still. Can someone suggest how I can match ALL cases for my keywords (multi-word entities)?
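For reference, a minimal sketch of case-insensitive matching on the token text rather than the lemma, assuming exact word forms are acceptable (it does not handle inflection such as "preference" vs. "preferences"). The example sentence and the "KEYWORD" label are made up for illustration; `LOWER` is the Matcher's built-in lowercase-text attribute.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # tokenizer only; no trained model needed

keyword = "product preferences"

# One {"LOWER": ...} entry per keyword token, so casing in the document
# does not matter.
pattern = [{"LOWER": tok.lower_} for tok in nlp.make_doc(keyword)]

matcher = Matcher(nlp.vocab)
matcher.add("KEYWORD", [pattern])  # "KEYWORD" is a hypothetical label

doc = nlp(
    "PRODUCT PREFERENCES, Product Preferences and product preferences "
    "all appear here."
)
matched = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched)
```

If many keywords are involved, `PhraseMatcher(nlp.vocab, attr="LOWER")` is an alternative worth considering, since it takes whole phrases instead of token-by-token patterns.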