Matcher doesn't seem to work with @ symbol? #5745
I'm trying to use a simple Matcher to match user mentions on social media, but I can't get it to work.

    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.load('en')
    m = Matcher(nlp.vocab)
    m.add("MENTION", None, [{"ORTH": "@"}, {"IS_ASCII": True}])
    m(nlp('@foo'))  # returns []

If I do something like this instead, it works as expected:

    m = Matcher(nlp.vocab)
    m.add("HASHTAG", None, [{"ORTH": "#"}, {"IS_ASCII": True}])
    m(nlp('#foo'))  # returns [(16536914698459818706, 0, 2)]

I assume I'm missing something simple. Is there anything special I need to do to match on @?
Replies: 2 comments
Ah gotcha, let me look into that. Thanks!
The Matcher is sensitive to the tokenization, so you need to compare the tokens for "@foo" and "#foo". Using the default English tokenizer, # is split off as a prefix and @ isn't. You can either modify the tokenizer or modify your patterns; see https://spacy.io/usage/linguistic-features#tokenization and https://spacy.io/usage/rule-based-matching.
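
Both options can be sketched roughly as follows. This assumes spaCy v3, where `Matcher.add` takes a list of patterns instead of the `None` callback argument used in the question, and it uses `spacy.blank("en")` so that no trained model has to be installed. The regex pattern and the extra `"@"` prefix are illustrative choices, not something prescribed in this thread.

```python
import spacy
from spacy.matcher import Matcher
from spacy.util import compile_prefix_regex

# Blank English pipeline: default tokenizer rules only (assumption: spaCy v3).
nlp = spacy.blank("en")

# Step 1: compare the tokenization of the two strings.
tokens_at = [t.text for t in nlp("@foo")]    # ['@foo']     "@" stays attached
tokens_hash = [t.text for t in nlp("#foo")]  # ['#', 'foo'] "#" is split off

# Option 1: keep the tokenizer and adapt the pattern.
# A mention is a single token, so match its whole text with a regex.
regex_matcher = Matcher(nlp.vocab)
regex_matcher.add("MENTION", [[{"TEXT": {"REGEX": r"^@\w+$"}}]])
regex_spans = [(start, end) for _, start, end in regex_matcher(nlp("hello @foo"))]

# Option 2: keep the original pattern and adapt the tokenizer.
# A second pipeline is used so Option 1 above is not affected.
nlp_split = spacy.blank("en")
prefixes = list(nlp_split.Defaults.prefixes) + ["@"]  # treat "@" as a prefix
nlp_split.tokenizer.prefix_search = compile_prefix_regex(prefixes).search

split_tokens = [t.text for t in nlp_split("@foo")]

prefix_matcher = Matcher(nlp_split.vocab)
prefix_matcher.add("MENTION", [[{"ORTH": "@"}, {"IS_ASCII": True}]])
prefix_spans = [(start, end) for _, start, end in prefix_matcher(nlp_split("@foo"))]

print(tokens_at, tokens_hash)
print(regex_spans)
print(split_tokens, prefix_spans)
```

Option 1 leaves tokenization untouched, which matters if other components depend on it. Option 2 changes how every text containing a leading "@" is tokenized, so it is worth checking the effect on the rest of the pipeline.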