Train a relation extraction model with spaCy #10930

CarlaMPR · 2022-06-08T17:34:05Z

Hi! :) I'm working on Relation Extraction, specifically the extraction of drug-drug interactions from text documents.
I saw this tutorial, https://www.youtube.com/watch?v=8HL-Ap5_Axo, and I'm trying to apply it to my own case. I'm using this corpus, https://github.com/isegura/DDICorpus, which is a collection of XML files in which the drugs and the relations between them are already labelled. I have two questions:

Do I need a NER model to identify the drugs (my entities) in the first place?
In what format should I have the corpus? I saw this example, https://github.com/explosion/projects/blob/v3/tutorials/rel_component/assets/annotations.jsonl, but I couldn't understand what "token_start" and "token_end" mean.
Thank you!

polm · 2022-06-09T03:42:05Z

> Do I need a NER model to identify the drugs (my entities) in the first place?

Yes, the relex component uses existing entity annotations to find potential relations, so you need something that sets entities. Usually that would be an NER model, though it could be rule-based (an EntityRuler).
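To illustrate the rule-based option, here is a minimal sketch, assuming spaCy v3 and a blank English pipeline. The `DRUG` label and the two patterns are made up for the example; a real setup would use a trained NER model or a much larger pattern list.

```python
import spacy

# Blank pipeline: tokenizer only, no trained components needed.
nlp = spacy.blank("en")

# Rule-based entity recognizer. The patterns below are illustrative
# placeholders, not a real drug lexicon.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(
    [
        {"label": "DRUG", "pattern": [{"LOWER": "aspirin"}]},
        {"label": "DRUG", "pattern": [{"LOWER": "warfarin"}]},
    ]
)

doc = nlp("Aspirin may increase the effect of warfarin.")

# Each entity carries its label and its token offsets, which is what a
# downstream relation extraction component would consume.
for ent in doc.ents:
    print(ent.text, ent.label_, ent.start, ent.end)
```

Anything that fills `doc.ents` before the relation component runs should work the same way.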

> In what format should I have the corpus? I saw this example,

You need to make a .spacy file containing Docs that look the way you want your output to look: they should have the right entities and the right relation data in the doc._.rel attribute. The preprocessing script in the demo project builds a file like that (it's the DocBin). In this data, token_start and token_end are the token indices of the terms, as opposed to the character indices (so token_start 5 is the token at index 5, counting from 0, and so on).
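To make the difference between character offsets and token indices concrete, here is a small plain-Python sketch. The regex tokenizer is a toy stand-in for spaCy's tokenizer, and the helper names are hypothetical. It also assumes that `token_end` is inclusive (the index of the entity's last token), which is worth checking against your own annotations.

```python
import re


def tokenize(text):
    """Toy tokenizer: returns (token_text, start_char, end_char) triples."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def char_to_token_span(tokens, start_char, end_char):
    """Map a character span to (token_start, token_end), both inclusive."""
    inside = [
        i for i, (_, start, end) in enumerate(tokens)
        if start >= start_char and end <= end_char
    ]
    if not inside:
        raise ValueError("character span does not align with token boundaries")
    return inside[0], inside[-1]


text = "Aspirin may increase the effect of warfarin."
tokens = tokenize(text)

# Character offsets of the entity, as a character-based annotation would give them.
start_char = text.index("warfarin")
end_char = start_char + len("warfarin")

# The same entity expressed as token indices.
token_start, token_end = char_to_token_span(tokens, start_char, end_char)
print("characters:", start_char, end_char)
print("tokens:", token_start, token_end)
```

With real data you would let spaCy do the mapping, for example with `doc.char_span(start_char, end_char)`, and read the token offsets from the resulting span.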

2 replies

CarlaMPR Jun 9, 2022
Author

Thank you!!! :) Is there any example of a .spacy file? I'm not understanding very well.

polm Jun 9, 2022

This script creates a .spacy file when it calls docbin.to_disk.

You can read more about training data in general here.
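As a rough sketch of what writing a .spacy file involves, assuming spaCy v3: the `rel` extension name follows the demo project, while the sentence, the `DRUG` and `INTERACTION` labels, and the file name are placeholders. The layout of `doc._.rel`, keyed by the start tokens of the two entities, is my understanding of the demo project and should be checked against its preprocessing script.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import Doc, DocBin, Span

# Custom attribute that holds the relation annotations.
if not Doc.has_extension("rel"):
    Doc.set_extension("rel", default={})

nlp = spacy.blank("en")
doc = nlp("Aspirin may increase the effect of warfarin.")

# Gold entities, given as token offsets (end is exclusive for Span).
doc.ents = [Span(doc, 0, 1, label="DRUG"), Span(doc, 6, 7, label="DRUG")]

# Gold relations: (head start token, child start token) -> {label: 0.0 or 1.0}.
doc._.rel = {
    (0, 6): {"INTERACTION": 1.0},
    (6, 0): {"INTERACTION": 0.0},
}

# store_user_data=True keeps custom attributes such as doc._.rel in the file.
doc_bin = DocBin(store_user_data=True)
doc_bin.add(doc)
out_path = Path(tempfile.mkdtemp()) / "train.spacy"
doc_bin.to_disk(out_path)

# Read the file back to check what was stored.
loaded = list(DocBin(store_user_data=True).from_disk(out_path).get_docs(nlp.vocab))
print([(ent.text, ent.label_) for ent in loaded[0].ents])
```

For the DDI corpus, the same loop would run once per sentence parsed from the XML files, with one DocBin for training and one for evaluation.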

KorneliaBastin · 2022-07-31T20:28:09Z

Is there a way to just train the model for the selected relations or entities that are included in the training set? Or do we always have to train it for all the entities and relations? For example if I annotate documents for category x and y and relationship i and j can I then just train the model for relationship i and check the accuracy just for i? Or can I train the model for i and j and then check the accuracy for each category separately?

1 reply

polm Aug 15, 2022

Sorry for the delayed reply to this.

By default, the relation extraction model learns all the relation labels in the training set, and when predicting will consider all pairs of entities within range of each other. You can make it ignore some entity types by modifying the get_candidates function, and you can customize the scorer to only check certain relationship types or check them separately.
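Both customizations can be sketched in plain Python. This is not the demo project's code: the function names, the `allowed_labels` parameter, and the 0.5 threshold are assumptions for illustration, and the stand-in objects only mimic the `ents`, `label_` and `start` attributes of a spaCy Doc and its entity spans.

```python
from collections import Counter
from types import SimpleNamespace


def make_get_candidates(max_length, allowed_labels):
    """Build a candidate generator that skips entity types we don't care about."""
    def get_candidates(doc):
        candidates = []
        for ent1 in doc.ents:
            for ent2 in doc.ents:
                if ent1 is ent2:
                    continue
                if ent1.label_ not in allowed_labels or ent2.label_ not in allowed_labels:
                    continue
                # Only pair entities that are within range of each other.
                if abs(ent2.start - ent1.start) <= max_length:
                    candidates.append((ent1, ent2))
        return candidates
    return get_candidates


def prf_per_label(gold, pred, threshold=0.5):
    """Score each relation label separately.

    gold and pred map (head_start, child_start) -> {label: score}.
    Returns {label: (precision, recall, f_score)}.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for pair in set(gold) | set(pred):
        gold_labels = gold.get(pair, {})
        pred_labels = pred.get(pair, {})
        for label in set(gold_labels) | set(pred_labels):
            is_gold = gold_labels.get(label, 0.0) >= threshold
            is_pred = pred_labels.get(label, 0.0) >= threshold
            if is_gold and is_pred:
                tp[label] += 1
            elif is_pred:
                fp[label] += 1
            elif is_gold:
                fn[label] += 1
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = (p, r, f)
    return scores


# Stand-in document with three entities; only DRUG entities should be paired.
ents = [
    SimpleNamespace(label_="DRUG", start=0),
    SimpleNamespace(label_="DOSE", start=3),
    SimpleNamespace(label_="DRUG", start=6),
]
fake_doc = SimpleNamespace(ents=ents)
get_candidates = make_get_candidates(max_length=10, allowed_labels={"DRUG"})
pairs = [(a.start, b.start) for a, b in get_candidates(fake_doc)]
print(pairs)

# Per-label scores for two relation types, i and j.
gold = {(0, 6): {"i": 1.0, "j": 0.0}, (6, 0): {"i": 0.0, "j": 1.0}}
pred = {(0, 6): {"i": 0.9, "j": 0.7}, (6, 0): {"i": 0.1, "j": 0.2}}
scores = prf_per_label(gold, pred)
print(scores)
```

In a real pipeline the candidate generator would be registered and referenced from the training config, and the per-label scores would be computed inside the component's scorer.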

That said, I'm not sure why you'd train on some relationship types and then not score them at all. If you're not interested in some of them, it's probably best to filter them out of the training data.
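Filtering the training data could look roughly like this, assuming relation annotations are stored as a dictionary keyed by entity pair with one score per label. The helper name and the labels `i` and `j` are placeholders.

```python
def keep_relation_labels(rel, keep):
    """Drop every relation label that is not in `keep`.

    rel maps (head_start, child_start) -> {label: score}.
    """
    return {
        pair: {label: score for label, score in labels.items() if label in keep}
        for pair, labels in rel.items()
    }


annotations = {
    (0, 6): {"i": 1.0, "j": 0.0},
    (6, 0): {"i": 0.0, "j": 1.0},
}

# Keep only relation type i before writing the training file.
filtered = keep_relation_labels(annotations, keep={"i"})
print(filtered)
```

Applying this to every Doc before adding it to the DocBin means the model never sees the labels you dropped.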
