Semgrex rule with spancat #11231
Hi, I am trying to identify the entity associated with the output of my spancat classifier using semgrex rules.
I have a span group which contains some words in the doc, and I want to use it to identify the person associated with those words, i.e. to use spancat in RIGHT_ATTRS. I didn't see any example of how to do that. I thought of using '' for it, but I'm not sure how that is set up in spancat. Can you help me with this?
You're right that there aren't any Token attributes associated with Doc.spans, since this is stored internally at the doc level rather than the token level.

As an initial workaround you could add a custom token attribute to look up whether the span label is present or to return all the span labels.

If you want to hard-code the label, you can add a custom getter that checks for that one label.

If you want it to be a little more general (but even slower), you can add a custom extension that returns the set of labels and use IS_SUPERSET to check whether the set of labels is a superset of the individual label you're interested in for the pattern.

This is all slow and not generalized well, it's just meant to be a sketch:

import spacy
from spacy.tokens import Token, Span
from spacy.matcher import Matcher


def has_myspangroup_positive(token):
    # Hard-coded check: is this token inside any span labelled POSITIVE?
    # token.doc is used instead of a global doc so the getter works on any doc.
    for span in token.doc.spans["myspangroup"]:
        if token in span and span.label_ == "POSITIVE":
            return True
    return False


def get_token_labels_from_myspangroup(token):
    # More general: return the set of labels of all spans covering this token.
    labels = set()
    for span in token.doc.spans["myspangroup"]:
        if token in span:
            labels.add(span.label_)
    return labels


Token.set_extension("has_myspangroup_positive", getter=has_myspangroup_positive)
Token.set_extension("myspangroup_labels", getter=get_token_labels_from_myspangroup)

nlp = spacy.blank("en")
doc = nlp("This is a sentence.")
doc.spans["myspangroup"] = [
    Span(doc, 0, 2, label="POSITIVE"),
    Span(doc, 1, 3, label="NEGATIVE"),
]

matcher = Matcher(nlp.vocab)
# General version: the token's label set must contain POSITIVE.
matcher.add(
    "POSITIVE_IN_LABELS",
    [[{"_": {"myspangroup_labels": {"IS_SUPERSET": ["POSITIVE"]}}}]],
)
# Hard-coded version: boolean extension attribute.
matcher.add(
    "HAS_POSITIVE_LABEL",
    [[{"_": {"has_myspangroup_positive": True}}]],
)
print(matcher(doc))

But in general this is a sensible thing to want to do with
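
For the dependency-matching case in the original question, the same kind of extension attribute can probably be referenced from RIGHT_ATTRS, since RIGHT_ATTRS takes token patterns in the same format as the regular Matcher. The following is only a rough sketch under several assumptions: the sentence, the hand-written heads and deps (used so that no trained pipeline is needed), the single POSITIVE span, and the pattern name POSITIVE_SUBJECT are all made up for illustration, and the span group name myspangroup is carried over from the sketch above.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc, Span, Token


def get_token_labels_from_myspangroup(token):
    # Labels of all spans in the (assumed) "myspangroup" group covering this token.
    labels = set()
    for span in token.doc.spans["myspangroup"]:
        if span.start <= token.i < span.end:
            labels.add(span.label_)
    return labels


# force=True only so the sketch can be re-run in the same session.
Token.set_extension(
    "myspangroup_labels", getter=get_token_labels_from_myspangroup, force=True
)

nlp = spacy.blank("en")

# Hypothetical sentence with a hand-written parse (heads are absolute indices).
words = ["Alice", "praised", "the", "service"]
heads = [1, 1, 3, 1]
deps = ["nsubj", "ROOT", "det", "dobj"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

# Stand-in for spancat output: one span over "praised".
doc.spans["myspangroup"] = [Span(doc, 1, 2, label="POSITIVE")]

pattern = [
    # Anchor node: any token covered by a POSITIVE span.
    {
        "RIGHT_ID": "positive_word",
        "RIGHT_ATTRS": {"_": {"myspangroup_labels": {"IS_SUPERSET": ["POSITIVE"]}}},
    },
    # A direct child of the anchor that is its nominal subject.
    {
        "LEFT_ID": "positive_word",
        "REL_OP": ">",
        "RIGHT_ID": "subject",
        "RIGHT_ATTRS": {"DEP": "nsubj"},
    },
]

dep_matcher = DependencyMatcher(nlp.vocab)
dep_matcher.add("POSITIVE_SUBJECT", [pattern])

matches = dep_matcher(doc)
for match_id, token_ids in matches:
    # token_ids follow the order of the nodes in the pattern.
    print(nlp.vocab.strings[match_id], [doc[i].text for i in token_ids])
```

With a real pipeline you would replace the hand-built Doc with the output of your parser and spancat component, and probably add a further constraint on the subject node (an entity type, for example) to restrict it to persons. The same performance caveat applies, since the getter is called for every token.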