Difference in evaluation metric while evaluating NER + RELATIONAL model #9808
-
By "very little", are you referring to the roughly 85% F-scores being printed for the best threshold cutoff? It looks to me like your model is in fact properly trained (just compare it to the baseline). If there is a discrepancy with the numbers reported during training, could you paste that output log as well?
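For reference, a "best threshold cutoff" F-score is typically found by sweeping candidate cutoffs over the scored predictions and keeping the one that maximizes F1 against the gold labels. A minimal sketch of that idea, with entirely illustrative data (not spaCy's internal implementation):

```python
# Sketch: pick the threshold that maximizes F1 over scored predictions.
# The scores/gold lists below are made-up illustrative data.

def f1_at_threshold(scores, gold, threshold):
    """F1 of the binary decisions obtained by cutting scores at threshold."""
    pred = [s >= threshold for s in scores]
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(not p and g for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

scores = [0.95, 0.80, 0.60, 0.40, 0.20]   # model confidences
gold   = [True, True, False, True, False]  # gold labels

# Best (F1, threshold) over a small grid of candidate cutoffs.
best = max((f1_at_threshold(scores, gold, t), t)
           for t in (0.1, 0.3, 0.5, 0.7, 0.9))
print(best)  # -> (0.8571428571428571, 0.3)
```

If this per-threshold number looks very different from what the training loop reported, the log comparison asked for above is the way to track down where they diverge.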
-
I think this question has been asked and answered for you several times before. In short: either improve the training dataset, or write a custom rule-based component to remove predictions that are nonsensical.
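One way such a rule-based post-processing step could look: filter the predicted relation triples against a hand-written table of plausible entity-type pairs and a confidence floor. Everything here (the `ALLOWED_PAIRS` table, the tuple layout, the function name) is an illustrative assumption, not a spaCy API:

```python
# Hypothetical post-processing filter for relation predictions.
# A relation is kept only if its score clears the floor AND its
# (head_type, tail_type) pair is licensed for that relation label.

# Hand-written plausibility rules: which labels make sense for which
# entity-type pairs. Purely illustrative.
ALLOWED_PAIRS = {
    ("PERSON", "ORG"): {"works_for"},
    ("ORG", "GPE"): {"based_in"},
}

def filter_relations(relations, threshold=0.5):
    """Drop low-confidence or type-implausible (head, tail, label, score) tuples."""
    kept = []
    for head_type, tail_type, label, score in relations:
        if score < threshold:
            continue  # too uncertain
        if label not in ALLOWED_PAIRS.get((head_type, tail_type), set()):
            continue  # nonsensical for these entity types
        kept.append((head_type, tail_type, label, score))
    return kept

preds = [
    ("PERSON", "ORG", "works_for", 0.92),  # plausible, confident -> kept
    ("ORG", "GPE", "works_for", 0.88),     # wrong type pair -> dropped
    ("PERSON", "ORG", "works_for", 0.30),  # below threshold -> dropped
]
print(filter_relations(preds))  # -> [('PERSON', 'ORG', 'works_for', 0.92)]
```

In a real pipeline this logic would run after the relation extractor, e.g. wrapped in a custom spaCy pipeline component; the rules themselves are where domain knowledge about your dataset goes.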