Semgrex rule with spancat #11231
Hi, I am trying to identify associated entities with my spancat classifier using Semgrex rules (`DependencyMatcher` patterns).
I have a span group that contains some words in the doc, and I want to use it to identify the person associated with those words, i.e. use the spancat output in `RIGHT_ATTRS`. I didn't see any example of how to do that. I thought of using '' for it, but I'm not sure how that is set up in spancat. Can you help me with this?
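Some background on how spancat output relates to tokens: the component writes its predictions to `doc.spans[spans_key]` (`"sc"` by default) as labelled token ranges `[start, end)`, so the labels for one token have to be derived by checking which spans cover its index. The library-free sketch below illustrates that lookup; the `(start, end, label)` tuples are a stand-in for spaCy `Span` objects and `build_token_label_index` is a hypothetical helper, not a spaCy API. It builds the mapping once per document instead of rescanning every span for every token.

```python
from collections import defaultdict


def build_token_label_index(spans):
    """Map token index -> set of labels of the spans covering that token.

    `spans` is an iterable of (start, end, label) tuples with an exclusive end,
    a hypothetical stand-in for the Span objects in doc.spans[...].
    """
    index = defaultdict(set)
    for start, end, label in spans:
        for i in range(start, end):
            index[i].add(label)
    return index


# Two overlapping spans over a four-token doc: tokens 0-1 POSITIVE, tokens 1-2 NEGATIVE.
spans = [(0, 2, "POSITIVE"), (1, 3, "NEGATIVE")]
index = build_token_label_index(spans)
for i in range(4):
    # .get() avoids inserting empty sets for uncovered tokens.
    labels = index.get(i, set())
    # A matcher predicate like {"IS_SUPERSET": ["POSITIVE"]} asks whether labels contain "POSITIVE".
    print(i, sorted(labels), labels.issuperset({"POSITIVE"}))
```

Token 1 carries both labels because the spans overlap, which is why a set of labels (rather than a single label) is the natural per-token value.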
You're right that there aren't any `Token` attributes associated with `Doc.spans`, since this is stored internally at the doc level rather than the token level. As an initial workaround, you could add a custom token attribute that looks up whether a span label is present, or that returns all the span labels.

If you want to hard-code the label, you can add a custom getter that checks for it. If you want it to be a little more general (but even slower), you can add a custom extension that returns the set of labels and use `IS_SUPERSET` to check whether the set of labels is a superset of the individual label you're interested in for the pattern.

This is all slow and doesn't generalize well; it's just meant to be a sketch:

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span, Token


def has_myspangroup_positive(token):
    # Hard-coded check: is this token inside any "POSITIVE" span in the group?
    for span in token.doc.spans.get("myspangroup", []):
        if span.start <= token.i < span.end and span.label_ == "POSITIVE":
            return True
    return False


def get_token_labels_from_myspangroup(token):
    # More general: collect the labels of all spans in the group that cover this token.
    labels = set()
    for span in token.doc.spans.get("myspangroup", []):
        if span.start <= token.i < span.end:
            labels.add(span.label_)
    return labels


# Both getters rescan the whole span group for every token, which is what makes this slow.
Token.set_extension("has_myspangroup_positive", getter=has_myspangroup_positive)
Token.set_extension("myspangroup_labels", getter=get_token_labels_from_myspangroup)

nlp = spacy.blank("en")
doc = nlp("This is a sentence.")
doc.spans["myspangroup"] = [
    Span(doc, 0, 2, label="POSITIVE"),
    Span(doc, 1, 3, label="NEGATIVE"),
]

matcher = Matcher(nlp.vocab)
# The token's set of labels must contain "POSITIVE".
matcher.add(
    "POSITIVE_IN_LABELS",
    [[{"_": {"myspangroup_labels": {"IS_SUPERSET": ["POSITIVE"]}}}]],
)
# The hard-coded boolean extension must be True.
matcher.add(
    "HAS_POSITIVE_LABEL",
    [[{"_": {"has_myspangroup_positive": True}}]],
)
print(matcher(doc))
```

But in general this is a sensible thing to want to do with …
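Since the question is about `RIGHT_ATTRS`, here is a minimal sketch of plugging the same kind of label-set extension into a `DependencyMatcher` pattern, whose `RIGHT_ATTRS` accept the same token attributes as `Matcher` patterns, including `"_"` for custom extensions. It assumes a recent spaCy v3 release. The sentence, the span group name `myspangroup`, the `POSITIVE` label, and the hand-built parse and entity annotation are hypothetical stand-ins for what a trained pipeline with `spancat`, `parser` and `ner` would produce.

```python
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc, Span, Token
from spacy.vocab import Vocab


def get_token_labels_from_myspangroup(token):
    # Labels of all spans in the hypothetical "myspangroup" group that cover this token.
    return {
        span.label_
        for span in token.doc.spans.get("myspangroup", [])
        if span.start <= token.i < span.end
    }


# force=True so re-registering (e.g. re-running in a notebook) doesn't raise.
Token.set_extension(
    "myspangroup_labels", getter=get_token_labels_from_myspangroup, force=True
)

vocab = Vocab()
# Hand-built annotation standing in for parser + ner output (hypothetical example).
doc = Doc(
    vocab,
    words=["Alice", "praised", "the", "policy", "."],
    heads=[1, 1, 3, 1, 1],
    deps=["nsubj", "ROOT", "det", "dobj", "punct"],
    ents=["B-PERSON", "O", "O", "O", "O"],
)
# Stand-in for spancat output: "the policy" labelled POSITIVE.
doc.spans["myspangroup"] = [Span(doc, 2, 4, label="POSITIVE")]

matcher = DependencyMatcher(vocab)
pattern = [
    # Anchor node: the root of the sentence.
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"DEP": "ROOT"}},
    # A direct child of the anchor that is a PERSON entity.
    {
        "LEFT_ID": "verb",
        "REL_OP": ">",
        "RIGHT_ID": "person",
        "RIGHT_ATTRS": {"ENT_TYPE": "PERSON"},
    },
    # Another direct child of the anchor covered by a POSITIVE span.
    {
        "LEFT_ID": "verb",
        "REL_OP": ">",
        "RIGHT_ID": "spanned",
        "RIGHT_ATTRS": {"_": {"myspangroup_labels": {"IS_SUPERSET": ["POSITIVE"]}}},
    },
]
matcher.add("PERSON_WITH_POSITIVE_SPAN", [pattern])

matches = matcher(doc)
for match_id, token_ids in matches:
    print([doc[i].text for i in token_ids])
```

Only "policy" qualifies for the last node: "the" is also inside the span, but its head is "policy", not the verb. With a real pipeline you would replace the hand-built `Doc` with `nlp(text)`, and the `DEP` and `ENT_TYPE` values would need to match the labels your parser and NER component actually produce.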