Coreference Training for Biomedical Text. #13160
-
Training the span resolver is one of the later steps; are you running the …
-
Yeah, I am following the training steps outlined in the guide.
I concur that a dataset of 30 examples is insufficient. However, before we embark on creating a larger training set, it's crucial to first confirm that we can successfully train the model. Creating coreference labels is quite time-consuming, so verifying that the approach works beforehand is important. Here is the full output of the training steps:
-
Well, running coref_clusters …
It sounds like I at least have some values in the COREF_R column of the training output. I assume that to achieve proper training, I need to create a larger dataset.
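For anyone checking the same thing, here is a minimal sketch for inspecting predicted clusters. The model path and example sentence are placeholders; the doc.spans key prefix coref_clusters is the default used by the experimental coref component.

```python
import spacy

# Load the assembled pipeline (path is a placeholder; use your own output dir).
nlp = spacy.load("training/coref/model-best")

doc = nlp("The patient was started on metformin. She tolerated the drug well.")

# The experimental coref component writes each predicted cluster into
# doc.spans under keys like "coref_clusters_1", "coref_clusters_2", ...
for key, spans in doc.spans.items():
    if key.startswith("coref_clusters"):
        print(key, "->", [span.text for span in spans])
```

If nothing prints, the component made no predictions for that text, which matches the behaviour described above.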
-
I am attempting to train a coreference model from scratch for biomedical content. To date, I have created approximately 30 annotated coreference examples using Prodigy for training purposes.
Below is the command sequence I utilized for Prodigy:
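A plausible sketch of such a sequence, assuming Prodigy's experimental coref.manual recipe; the dataset name, base model, and input file below are placeholders:

```sh
# Annotate coreference clusters manually (assumes the experimental
# coref.manual recipe available in recent Prodigy versions).
python -m prodigy coref.manual coref_biomed en_core_web_sm ./abstracts.jsonl --label COREF

# Export the annotations to spaCy's binary training format.
# Passing the coref dataset to data-to-spacy is assumed to be supported
# by the Prodigy version in use, as the workflow described here implies.
python -m prodigy data-to-spacy ./corpus --coref coref_biomed --eval-split 0.2
```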
For training, the only resource I have referenced so far is this guide.
In their example, they used "OntoNotes" for preprocessing and creating the spaCy format. However, I don't require this step, as I can create the spaCy format directly from the Prodigy annotations using "data-to-spacy". Consequently, I have excluded the "preprocess" workflow from my project. Here is the command sequence for training:
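For reference, a sketch of what this training sequence typically looks like, assuming the project follows the layout of spaCy's experimental coref project; the workflow step names are taken from that project's project.yml and may differ in a customized copy:

```sh
# Train the clustering component first.
python -m spacy project run train-cluster

# Prepare data for, then train, the span resolver
# (the later step mentioned in the reply above).
python -m spacy project run prep-span-data
python -m spacy project run train-span-resolver

# Assemble both trained components into a single pipeline.
python -m spacy project run assemble
```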
I am puzzled as to why only one epoch is displayed. Additionally, the model does not seem to produce any predictions. Could you offer any insights or suggestions?