NER Training data with no labels to reduce false positive #11131
-
I am trying to create an NER model to identify one entity, and I am running into a lot of false positives because the training data only contains texts where the entity is present. In production, I will also receive texts where no entity should be predicted, and the model produces many false positives on them. How can I reduce the false positives? Can I add texts with no labels to the training data, so the model learns not to predict anything for them? Will that work? Is there a way to add negative samples?
Replies: 3 comments 2 replies
-
You can pass negatives via a SpanGroup. See the
-
One of the golden rules of models is that your training data should be as much like your real data as possible. For NER you should definitely have sentences with no entities (assuming any of your input data will be like that, which is typical). It is also possible to add negative examples, like @kinghuang mentioned, though sometimes it's hard to get the balance right.
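As a rough illustration of mixing entity-free sentences into the training set, here is a minimal sketch in plain Python. It assumes the common `(text, {"entities": [(start, end, label)]})` annotation layout, where an empty `entities` list means "nothing to predict here". The helper `mix_negatives`, the `ORDER_ID` label, the example sentences, and the `negative_fraction` knob are all hypothetical; the right fraction depends on how often entity-free text shows up in your real input.

```python
import random


def mix_negatives(positives, negative_texts, negative_fraction=0.5, seed=0):
    """Combine annotated examples with entity-free texts.

    Roughly `negative_fraction` of the returned examples carry an empty
    "entities" list, capped by how many negative texts are available.
    """
    if not 0 <= negative_fraction < 1:
        raise ValueError("negative_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n / (len(positives) + n) = negative_fraction for n.
    wanted = round(len(positives) * negative_fraction / (1 - negative_fraction))
    wanted = min(wanted, len(negative_texts))
    sampled = rng.sample(negative_texts, wanted)
    # An empty entity list marks the whole text as containing no entity.
    data = list(positives) + [(text, {"entities": []}) for text in sampled]
    rng.shuffle(data)
    return data


# Hypothetical annotated examples (character offsets into the text).
positives = [
    ("Order ACME-123 shipped today.", {"entities": [(6, 14, "ORDER_ID")]}),
    ("Please refund ACME-987.", {"entities": [(14, 22, "ORDER_ID")]}),
]

# Hypothetical texts that contain no entity at all.
negative_texts = [
    "Thanks for your help!",
    "When does the store open?",
    "I forgot my password.",
]

train_data = mix_negatives(positives, negative_texts, negative_fraction=0.5)
```

Keeping the ratio in one parameter makes it easy to retrain with a few different values and compare false positives on a held-out set that looks like your real traffic.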
-
@kinghuang @polm Thanks for the answer. I have one doubt regarding