Semgrex rule with spancat #11231

monWork · 2022-07-27T20:56:07Z

Hi,

I am trying to identify the entity associated with my spancat classifier's output using semgrex rules (the DependencyMatcher).

pattern = [
    {
        "RIGHT_ID": "neg_action",
        "RIGHT_ATTRS": {"POS": "VERB"}
    },
    {
        "LEFT_ID": "neg_action",
        "REL_OP": ">",
        "RIGHT_ID": "person_or_company",
        "RIGHT_ATTRS": {
            "DEP": {"IN": ["nsubj", "nsubjpass"]},
            "ENT_TYPE": {"IN": ["PERSON", "ORG"]}
        }
    }
]

I have a span group that contains some words in the doc, and I want to use it to identify the person associated with those words, i.e. to use the spancat output in RIGHT_ATTRS. I didn't see any example of how to do that. I thought of using '_' for it, but I'm not sure how that is set up for spancat.
https://spacy.io/api/token#attributes I don't see anything related to this in the list of attributes. My assumption is that '_' can be used for it, but how do I set up my span group under '_'?

Can you help me with this?

Answered by adrianeboyd


adrianeboyd · 2022-07-28T09:11:40Z

You're right that there aren't any Token attributes associated with Doc.spans, since this is stored internally at the doc level rather than the token level.

As an initial workaround you could add a custom token attribute to look up whether the span label is present or to return all the span labels.

If you want to hard-code the label, you can add a custom getter that checks whether the token is inside a span with that label.

If you want it to be a little more general (but even slower), you can add a custom extension that returns the set of labels and use IS_SUPERSET to check whether the set of labels is a superset of the individual label you're interested in for the pattern.

This is all slow and doesn't generalize well. It's just meant to be a sketch:

import spacy
from spacy.tokens import Token, Span
from spacy.matcher import Matcher


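def has_myspangroup_positive(token):
    # Use token.doc so the getter works for any doc, not only a global `doc`.
    for span in token.doc.spans["myspangroup"]:
        if token in span and span.label_ == "POSITIVE":
            return True
    return False


def get_token_labels_from_myspangroup(token):
    # Collect the labels of every span in the group that contains this token.
    labels = set()
    for span in token.doc.spans["myspangroup"]:
        if token in span:
            labels.add(span.label_)
    return labels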


Token.set_extension("has_myspangroup_positive", getter=has_myspangroup_positive)
Token.set_extension("myspangroup_labels", getter=get_token_labels_from_myspangroup)


nlp = spacy.blank("en")
doc = nlp("This is a sentence.")
doc.spans["myspangroup"] = [
    Span(doc, 0, 2, label="POSITIVE"),
    Span(doc, 1, 3, label="NEGATIVE"),
]


matcher = Matcher(nlp.vocab)
matcher.add(
    "POSITIVE_IN_LABELS",
    [[{"_": {"myspangroup_labels": {"IS_SUPERSET": ["POSITIVE"]}}}]],
)

matcher.add(
    "HAS_POSITIVE_LABEL",
    [[{"_": {"has_myspangroup_positive": True}}]],
)
print(matcher(doc))
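
Since the getter-based lookups are recomputed for every token, one possible way to reduce the cost is to fill in a writable extension once per doc. This is only a sketch under assumptions: the extension name `in_positive_span` and the helper `mark_positive_tokens` are made up for illustration, and the flags are not updated if the span group changes afterwards.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span, Token

# Hypothetical writable extension, filled in once per doc instead of via a getter.
# force=True only makes the sketch safe to re-run in the same session.
Token.set_extension("in_positive_span", default=False, force=True)


def mark_positive_tokens(doc, group="myspangroup", label="POSITIVE"):
    # Flag every token covered by a span with the given label (hypothetical helper).
    spans = doc.spans[group] if group in doc.spans else []
    for span in spans:
        if span.label_ == label:
            for token in span:
                token._.in_positive_span = True
    return doc


nlp = spacy.blank("en")
doc = nlp("This is a sentence.")
doc.spans["myspangroup"] = [
    Span(doc, 0, 2, label="POSITIVE"),
    Span(doc, 1, 3, label="NEGATIVE"),
]
mark_positive_tokens(doc)

matcher = Matcher(nlp.vocab)
matcher.add("HAS_POSITIVE_LABEL", [[{"_": {"in_positive_span": True}}]])
matched = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched)
```

The trade-off is that `mark_positive_tokens` has to be called after the span group is final, for example right after the spancat component has run.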

But in general this is a sensible thing to want to do with doc.spans and we should think about how to make this easier.
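
To connect this back to the dependency pattern in the question, the same kind of extension can probably be used under "_" in RIGHT_ATTRS, since the DependencyMatcher accepts the usual token pattern attributes there. The sketch below is an assumption-laden illustration, not a tested recipe for a real pipeline: the sentence, the hand-written parse, and the choice of the "POSITIVE" label are invented so that it runs without a trained model.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc, Span, Token


def get_span_labels(token):
    # Labels of all spans in the (assumed) "myspangroup" group that cover this token.
    spans = token.doc.spans["myspangroup"] if "myspangroup" in token.doc.spans else []
    return {span.label_ for span in spans if span.start <= token.i < span.end}


# force=True only makes the sketch safe to re-run in the same session.
Token.set_extension("myspangroup_labels", getter=get_span_labels, force=True)

nlp = spacy.blank("en")
# Hand-annotated parse (illustrative only) so no trained parser is needed.
doc = Doc(
    nlp.vocab,
    words=["Acme", "rejected", "the", "offer", "."],
    pos=["PROPN", "VERB", "DET", "NOUN", "PUNCT"],
    heads=[1, 1, 3, 1, 1],  # absolute index of each token's head
    deps=["nsubj", "ROOT", "det", "dobj", "punct"],
)
doc.spans["myspangroup"] = [Span(doc, 0, 1, label="POSITIVE")]

pattern = [
    {"RIGHT_ID": "neg_action", "RIGHT_ATTRS": {"POS": "VERB"}},
    {
        "LEFT_ID": "neg_action",
        "REL_OP": ">",
        "RIGHT_ID": "person_or_company",
        "RIGHT_ATTRS": {
            "DEP": {"IN": ["nsubj", "nsubjpass"]},
            # Span-group check in place of the ENT_TYPE check from the question.
            "_": {"myspangroup_labels": {"IS_SUPERSET": ["POSITIVE"]}},
        },
    },
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("NEG_ACTION_SUBJECT", [pattern])
matches = matcher(doc)
for match_id, token_ids in matches:
    # token_ids follow the order of the pattern dicts: verb first, then subject.
    print([doc[i].text for i in token_ids])
```

In a real pipeline the parse would come from a trained parser and the span group from the spancat component, so the `Doc(...)` construction here is only a stand-in.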

1 reply

monWork Jul 28, 2022
Author

Thanks, this is really helpful.
