Training a relation extraction model with span categorization instead of NER #12725

AdirthaBorgohain · 2023-06-14T14:31:51Z

I want to know if there's any way to train relation extraction models on top of spans predicted by the span categorizer. As far as I understand, training a relation extraction model requires NER entities to be labeled already. But since the NER component doesn't work with overlapping spans, I was wondering whether it's possible to train with the span categorizer instead.

For example, say I have the sentence: "Injury to Roger Federer prevented him from winning the French Open."
I want to extract the following entities/spans: Roger Federer [Person], Injury to Roger Federer [Event], French Open [Competition]
I also want to extract the following relation: Injury to Roger Federer --CAUSE_OF_LOSS--> French Open

Any ideas for this would be great!
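The annotation described above can be sketched with spaCy's doc.spans, which, unlike doc.ents, accepts overlapping spans. This is a minimal illustration, not code from the thread: the blank English pipeline, the span key "sc" (the span categorizer's default), and the layout of the custom doc._.rel extension (keyed by both spans' start and end tokens) are assumptions. The tutorial project also stores relations in doc._.rel, but as far as I can tell it keys them by each entity's start token only, which could collide once two overlapping spans begin at the same token.

```python
import spacy
from spacy.tokens import Doc, Span

# Assumed (hypothetical) layout: {(head_start, head_end, tail_start, tail_end): {label: score}}
if not Doc.has_extension("rel"):
    Doc.set_extension("rel", default={})

nlp = spacy.blank("en")  # only the tokenizer is needed for annotation
doc = nlp("Injury to Roger Federer prevented him from winning the French Open.")

# Token indices: Injury=0 to=1 Roger=2 Federer=3 ... the=8 French=9 Open=10 .=11
person = Span(doc, 2, 4, label="Person")
event = Span(doc, 0, 4, label="Event")  # contains the Person span
competition = Span(doc, 9, 11, label="Competition")

# doc.spans accepts overlapping spans; "sc" is the span categorizer's default key
doc.spans["sc"] = [person, event, competition]


def rel_key(head: Span, tail: Span) -> tuple:
    # Hypothetical key: full boundaries, so spans sharing a start token do not collide
    return (head.start, head.end, tail.start, tail.end)


# Gold relation for the example: Event --CAUSE_OF_LOSS--> Competition
doc._.rel = {rel_key(event, competition): {"CAUSE_OF_LOSS": 1.0}}

for span in doc.spans["sc"]:
    print(span.text, span.label_)
```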

Answered by kadarakos

kadarakos · 2023-06-15T08:32:15Z

Hey AdirthaBorgohain,

I think what you are suggesting makes sense. The tutorial implementation is currently not flexible enough to work with arbitrary doc.spans; it requires doc.ents to be set: https://github.com/explosion/projects/blob/v3/tutorials/rel_component/scripts/rel_pipe.py#L23. As you mention, the entities in doc.ents must not overlap.

For example, take the following:

```python
import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("We visited The Bob's Burgers Museum")
print(doc.ents)
```

This prints (The Bob's Burgers Museum,). We can add more entities like this:

```python
from spacy.tokens import Span

doc.ents += (Span(doc, 0, 1, "test"), )
print(doc.ents)
```

Now we have (We, The Bob's Burgers Museum). But when we try to add "Bob's Burgers" as a SHOW entity, which overlaps the museum entity:

```python
doc.ents += (Span(doc, 3, 6, "SHOW"), )
```

We get the error:

```
ValueError: [E1010] Unable to set entity information for token 3 which is included in more than one span in entities, blocked, missing or outside.
```
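For comparison, doc.spans accepts exactly this kind of overlapping annotation. A minimal sketch, assuming a blank English pipeline (so no trained model is needed) and illustrative labels (FAC for the museum) under the assumed span key "sc":

```python
import spacy
from spacy.tokens import Span

# Only the tokenizer is needed to build the spans
nlp = spacy.blank("en")
doc = nlp("We visited The Bob's Burgers Museum")

# Tokens: We=0 visited=1 The=2 Bob=3 's=4 Burgers=5 Museum=6
museum = Span(doc, 2, 7, label="FAC")
show = Span(doc, 3, 6, label="SHOW")  # nested inside the museum span

# doc.ents rejects the overlap with the same E1010 error
try:
    doc.ents = [museum, show]
    ents_error = None
except ValueError as err:
    ents_error = str(err)
print(ents_error)

# doc.spans stores an arbitrary, possibly overlapping, list of spans per key
doc.spans["sc"] = [museum, show]
print([(span.text, span.label_) for span in doc.spans["sc"]])
```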

What I would suggest is to implement your own get_instances function: https://github.com/explosion/projects/blob/v3/tutorials/rel_component/scripts/rel_model.py#L28.

As you can see, the current implementation is:

```python
@spacy.registry.misc("rel_instance_generator.v1")
def create_instances(max_length: int) -> Callable[[Doc], List[Tuple[Span, Span]]]:
    def get_instances(doc: Doc) -> List[Tuple[Span, Span]]:
        instances = []
        for ent1 in doc.ents:
            for ent2 in doc.ents:
                if ent1 != ent2:
                    if max_length and abs(ent2.start - ent1.start) <= max_length:
                        instances.append((ent1, ent2))
        return instances

    return get_instances
```

I think you only need to change it to something like this:

```python
from typing import Callable, List, Tuple

import spacy
from spacy.tokens import Doc, Span


@spacy.registry.misc("rel_span_instance_generator.v1")
def create_instances(max_length: int, span_key: str) -> Callable[[Doc], List[Tuple[Span, Span]]]:
    def get_instances(doc: Doc) -> List[Tuple[Span, Span]]:
        instances = []
        # Candidate arguments come from doc.spans[span_key] instead of doc.ents,
        # so they are allowed to overlap
        for ent1 in doc.spans[span_key]:
            for ent2 in doc.spans[span_key]:
                if ent1 != ent2:
                    if max_length and abs(ent2.start - ent1.start) <= max_length:
                        instances.append((ent1, ent2))
        return instances

    return get_instances
```

The new function retrieves its candidate spans from doc.spans[span_key] (the span categorizer writes to doc.spans["sc"] by default). You will also need to create a new factory that does not require doc.ents to be set: https://github.com/explosion/projects/blob/v3/tutorials/rel_component/scripts/rel_pipe.py#L23.
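To sanity-check the span-based generator outside of training, one could look it up by its registered name and call it on a doc whose spans overlap. A minimal sketch, assuming the function from the answer (reproduced so the snippet runs standalone, under the hypothetical Python name create_span_instances), a blank English pipeline, the span key "sc", and an arbitrary max_length of 100:

```python
from typing import Callable, List, Tuple

import spacy
from spacy.tokens import Doc, Span


@spacy.registry.misc("rel_span_instance_generator.v1")
def create_span_instances(max_length: int, span_key: str) -> Callable[[Doc], List[Tuple[Span, Span]]]:
    def get_instances(doc: Doc) -> List[Tuple[Span, Span]]:
        instances = []
        for ent1 in doc.spans[span_key]:
            for ent2 in doc.spans[span_key]:
                if ent1 != ent2:
                    # Pair spans whose start tokens are at most max_length apart
                    if max_length and abs(ent2.start - ent1.start) <= max_length:
                        instances.append((ent1, ent2))
        return instances

    return get_instances


nlp = spacy.blank("en")
doc = nlp("Injury to Roger Federer prevented him from winning the French Open.")
doc.spans["sc"] = [
    Span(doc, 2, 4, label="Person"),
    Span(doc, 0, 4, label="Event"),  # overlaps the Person span
    Span(doc, 9, 11, label="Competition"),
]

# Resolve the generator by name, the way a config file would reference it
make_instances = spacy.registry.misc.get("rel_span_instance_generator.v1")
get_instances = make_instances(max_length=100, span_key="sc")
pairs = get_instances(doc)
for head, tail in pairs:
    print(f"{head.text} -> {tail.text}")
```

All ordered pairs of distinct spans are produced, including pairs of overlapping spans such as Event and Person. Note that with max_length=0 (or None) the `if max_length and ...` check is falsy, so no pairs are produced at all; a positive value is needed.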

Apart from that, I don't think any other modifications are required.

This is an interesting use case! Please let us know if you have further questions, and I would also like to hear whether this works.
