Apply rule-based matching in isolation, one token at a time #12610
Is it possible to apply rule-based matching rules in isolation, given a single token and a rule in the same format used with a full `Matcher`? For example:

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span

nlp = spacy.blank("en")


def on_match(matcher, doc, i, matches):
    match_id, start, end = matches[i]
    entity = Span(doc, start, end, label=match_id)
    rule = {"TEXT": "sentence"}
    for token in doc[end:]:
        # HERE: apply the above rule to each token.
        # `is_match` does not exist; it is the missing piece being asked about.
        if is_match(token, rule):
            ...  # do something


doc = nlp("An example sentence.")
matcher = Matcher(nlp.vocab)
pattern = [
    {"TEXT": "An"},
]
matcher.add("LABEL", [pattern], on_match=on_match)
```
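A minimal sketch of one way the hypothetical `is_match(token, rule)` helper from the question could be filled in. It assumes spaCy v3 (where a `Matcher` can be called on a `Span`) and that building a throwaway `Matcher` per call is acceptable: the rule is wrapped in a one-token pattern and matched against a one-token span that covers only the given token.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Token


def is_match(token: Token, rule: dict) -> bool:
    """Hypothetical helper: does `token` satisfy a single token-pattern dict?"""
    matcher = Matcher(token.vocab)
    # A Matcher pattern is a list of token dicts, so [rule] is a one-token pattern.
    matcher.add("RULE", [[rule]])
    # Match only over the one-token Span covering `token`.
    return len(matcher(token.doc[token.i : token.i + 1])) > 0


nlp = spacy.blank("en")
doc = nlp("An example sentence.")
rule = {"TEXT": "sentence"}
print([token.text for token in doc if is_match(token, rule)])  # ['sentence']
```

Creating a new `Matcher` on every call is wasteful inside a loop; caching one matcher per rule would be the obvious refinement.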
Hello @e-e, would something like this answer your problem?

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("blank:en")
doc1 = nlp("An example sentence.")
doc2 = nlp("Another example.")
matcher = Matcher(nlp.vocab)
pattern = [
    {"TEXT": "An"},
    {"OP": "*"},  # any number of tokens in between
    {"TEXT": "sentence"},
]
matcher.add("LABEL", [pattern])
matcher(doc1)  # Matches
matcher(doc2)  # Does not match
```

You could use the callback to bootstrap a second `Matcher` instance and mutate the `matches` list, but that will likely be harder to maintain.
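A rough sketch of the callback approach, assuming the follow-up rule is known before matching starts. A second `Matcher` (`token_matcher`, a name introduced here) holds the one-token rule, and `on_match` runs it over the same `Doc`, keeping only hits that come after the main match. Results go into an outside list (`found`, also hypothetical) rather than into a mutated `matches` list.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")

# Second matcher holding the single-token rule (hypothetical label "FOLLOWING").
token_matcher = Matcher(nlp.vocab)
token_matcher.add("FOLLOWING", [[{"TEXT": "sentence"}]])

# (main match start, following token index) pairs recorded by the callback.
found = []


def on_match(matcher, doc, i, matches):
    match_id, start, end = matches[i]
    # Use the *second* matcher here; calling `matcher` itself would
    # re-run matching and re-trigger this callback.
    for _, tok_start, _ in token_matcher(doc):
        if tok_start >= end:  # keep only tokens after the main match
            found.append((start, tok_start))


matcher = Matcher(nlp.vocab)
matcher.add("LABEL", [[{"TEXT": "An"}]], on_match=on_match)

doc = nlp("An example sentence.")
matcher(doc)
print(found)  # [(0, 2)]
```

This reruns `token_matcher` over the whole `Doc` once per main match, which is fine for a sketch but worth caching for long documents.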
Hi @e-e, you can apply a matcher that works with one-token patterns and iterate over the results.
Be careful, though: the `on_match` callback runs every time the matcher finds a match, so reusing the same matcher inside its own callback is probably not what you want.
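A minimal sketch of this approach without any callback: run the main matcher and a separate matcher built from one-token patterns once each, then iterate over the results and keep the one-token hits that fall after each main match. The function name `tokens_after` and the label `RULE` are introduced here for illustration.

```python
import spacy
from spacy.matcher import Matcher


def tokens_after(doc, main_matcher, token_matcher):
    """Map each main-match start to the indices of rule-matching tokens after it."""
    token_hits = sorted(start for _, start, _ in token_matcher(doc))
    results = {}
    for _, start, end in main_matcher(doc):
        results[start] = [j for j in token_hits if j >= end]
    return results


nlp = spacy.blank("en")
doc = nlp("An example sentence. An odd sentence.")

main_matcher = Matcher(nlp.vocab)
main_matcher.add("LABEL", [[{"TEXT": "An"}]])  # no on_match callback needed

token_matcher = Matcher(nlp.vocab)
token_matcher.add("RULE", [[{"TEXT": "sentence"}]])  # one-token pattern

print(tokens_after(doc, main_matcher, token_matcher))  # {0: [2, 6], 4: [6]}
```

Keeping the two matchers separate means neither needs a callback, which sidesteps the re-entrancy concern entirely.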