Spacy Matcher doesn't match with "cannot" #10122
-
I am copying and testing the simple example from the documentation, and replacing "hello" with "cannot". Now the matcher returns nothing (it worked with "hello").
Replies: 2 comments
-
Thanks very much @Pandalei97! Great point!
Hi,
If you take a look at the tokens in the doc, you will see that "cannot" is split into two tokens: 'can' and 'not'.
Since Matcher patterns describe the individual tokens to find, your pattern searches for a single token 'cannot' followed by 'world', and no such token exists in the doc.
This pattern works in your case:
pattern = [{"LOWER": "can"}, {"LOWER": "not"}, {"LOWER": "world"}]
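To see the split directly, here is a minimal sketch. It assumes a blank English pipeline (`spacy.blank("en")`, tokenizer only) and the sample text "cannot world"; the pattern names `SINGLE_TOKEN` and `SPLIT_TOKENS` are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough, since only the tokenizer matters here.
nlp = spacy.blank("en")
doc = nlp("cannot world")

# Inspect how the tokenizer split the text.
tokens = [token.text for token in doc]
print(tokens)

matcher = Matcher(nlp.vocab)
# Describes one token "cannot", which the tokenizer never produces.
matcher.add("SINGLE_TOKEN", [[{"LOWER": "cannot"}, {"LOWER": "world"}]])
# Describes the tokens as the tokenizer actually emits them.
matcher.add("SPLIT_TOKENS", [[{"LOWER": "can"}, {"LOWER": "not"}, {"LOWER": "world"}]])

# Map each matching pattern name to the text it covered.
matched = {
    nlp.vocab.strings[match_id]: doc[start:end].text
    for match_id, start, end in matcher(doc)
}
print(matched)
```

Printing the tokens before writing a pattern is a quick way to check that each dictionary in the pattern lines up with exactly one token.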
The PhraseMatcher works because its patterns are built by running the tokenizer over the phrase text, so they automatically respect the tokenizer's behaviour.
When writing patterns for Matcher, you need to pay attention to the tokenization yourself, especially when it comes to compound words.
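As a rough illustration of both points, the sketch below lets the tokenizer build the patterns instead of writing them by hand. The sample sentence and the names `CANNOT_WORLD` and `derived_pattern` are assumptions for the example, not part of the original question.

```python
import spacy
from spacy.matcher import Matcher, PhraseMatcher

nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline
doc = nlp("Sorry, I cannot world today.")

# PhraseMatcher: the pattern is a Doc, so it is tokenized exactly like the text.
phrase_matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
phrase_matcher.add("CANNOT_WORLD", [nlp.make_doc("cannot world")])
phrase_spans = [doc[start:end].text for _, start, end in phrase_matcher(doc)]

# Matcher: derive one token description per token the tokenizer produces.
derived_pattern = [{"LOWER": token.lower_} for token in nlp.make_doc("cannot world")]
matcher = Matcher(nlp.vocab)
matcher.add("CANNOT_WORLD", [derived_pattern])
matcher_spans = [doc[start:end].text for _, start, end in matcher(doc)]

print(derived_pattern)
print(phrase_spans, matcher_spans)
```

Deriving the token list from `nlp.make_doc` keeps a hand-written Matcher pattern in sync with the tokenizer, at the cost of hiding the pattern's shape from the reader.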