Training a relation extraction model with span categorization instead of NER #12725
I want to know if there's any way to train relation extraction models on top of spans predicted by the span categorizer. As far as I understand, training a relation extraction model requires NER entities that are already labeled. But since the NER component doesn't work with overlapping spans, I was wondering if it's possible to train with the span categorizer instead. For example, say I have a sentence: "Injury to Roger Federer prevented him from winning the French Open." Any ideas for this would be great!
Replies: 1 comment
Hey AdirthaBorgohain,

I think what you are suggesting makes sense. The tutorial implementation is currently not flexible enough to work on any `doc.spans`, but requires `doc.ents` to be set: https://github.com/explosion/projects/blob/v3/tutorials/rel_component/scripts/rel_pipe.py#L23. As you mention, `doc.ents` has to be non-overlapping. So for example:

```python
import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("We visited The Bob's Burgers Museum")
print(doc.ents)
```

This prints `(The Bob's Burgers Museum,)`. We can add more entities like:

```python
from spacy.tokens import Span

doc.ents += (Span(doc, 0, 1, "test"), )
print(doc.ents)
```

Now we have `(We, The Bob's Burgers Museum)`. But when we try to add "Bob…

```python
doc.ents += (Span(doc, 3, 6, "SHOW"), )
```

We get the error:

```
ValueError: [E1010] Unable to set entity information for token 3 which is included
in more than one span in entities, blocked, missing or outside.
```

What I would suggest is to try to implement your own instance generator. As you see, the current implementation has:

```python
@spacy.registry.misc("rel_instance_generator.v1")
def create_instances(max_length: int) -> Callable[[Doc], List[Tuple[Span, Span]]]:
    def get_instances(doc: Doc) -> List[Tuple[Span, Span]]:
        instances = []
        for ent1 in doc.ents:
            for ent2 in doc.ents:
                if ent1 != ent2:
                    if max_length and abs(ent2.start - ent1.start) <= max_length:
                        instances.append((ent1, ent2))
        return instances
    return get_instances
```

I think you don't need to change more than:

```python
from typing import Callable, List, Tuple

import spacy
from spacy.tokens import Doc, Span


@spacy.registry.misc("rel_span_instance_generator.v1")
def create_instances(max_length: int, span_key: str) -> Callable[[Doc], List[Tuple[Span, Span]]]:
    def get_instances(doc: Doc) -> List[Tuple[Span, Span]]:
        instances = []
        # Candidate pairs come from the span group instead of doc.ents
        for ent1 in doc.spans[span_key]:
            for ent2 in doc.spans[span_key]:
                if ent1 != ent2:
                    if max_length and abs(ent2.start - ent1.start) <= max_length:
                        instances.append((ent1, ent2))
        return instances
    return get_instances
```

The new function allows you to retrieve spans from `doc.spans[span_key]`. To me it seems like there are no other modifications required.

Kinda interesting what you are up to. Please let us know if you have further questions, but I would also like to know if this worked!
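One thing that may be worth checking before training: once candidates come from a span group, overlapping or nested spans also get paired with each other. Below is a minimal sketch of the same pairing loop in plain Python, so it runs without loading a pipeline. `FakeSpan`, `overlaps` and the `skip_overlapping` flag are hypothetical names made up for this illustration and are not part of spaCy or the tutorial, and the `MUSEUM` label is invented. The token offsets assume the tokenization We / visited / The / Bob / 's / Burgers / Museum.

```python
from typing import List, NamedTuple, Sequence, Tuple


class FakeSpan(NamedTuple):
    """Hypothetical stand-in for spacy.tokens.Span: token offsets plus a label."""
    start: int
    end: int  # exclusive, like Span.end in spaCy
    label: str


def overlaps(a: FakeSpan, b: FakeSpan) -> bool:
    """Return True if the two spans share at least one token."""
    return a.start < b.end and b.start < a.end


def get_instances(
    spans: Sequence[FakeSpan],
    max_length: int,
    skip_overlapping: bool = False,  # assumption: optional filter, not in the tutorial
) -> List[Tuple[FakeSpan, FakeSpan]]:
    """Same pairing loop as the generator in the thread, over a plain list of spans."""
    instances = []
    for span1 in spans:
        for span2 in spans:
            if span1 == span2:
                continue
            if skip_overlapping and overlaps(span1, span2):
                continue
            if max_length and abs(span2.start - span1.start) <= max_length:
                instances.append((span1, span2))
    return instances


# "We visited The Bob's Burgers Museum"
spans = [
    FakeSpan(0, 1, "test"),    # We
    FakeSpan(2, 7, "MUSEUM"),  # The Bob's Burgers Museum (label invented here)
    FakeSpan(3, 6, "SHOW"),    # Bob's Burgers, nested inside the span above
]

all_pairs = get_instances(spans, max_length=100)
disjoint_pairs = get_instances(spans, max_length=100, skip_overlapping=True)
print(len(all_pairs), len(disjoint_pairs))
```

With the unmodified loop, the nested pair (museum span, show span) is a relation candidate in both directions. Whether that is desirable depends on your annotation scheme. If it is not, a filter along the lines of `skip_overlapping` could be added inside the registered function, where `doc.spans[span_key]` takes the place of the plain list used here.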