Training the model to NOT recognize certain NE #4865
Replies: 1 comment
-
Hi @Erin59. In my opinion, building a stop-word list is really only feasible if you notice a small variety of words that got wrongly tagged, and only if you're sure that these could never really be entities. Like, there could be a company called "Aye", but maybe that's a sacrifice you're willing to make ;-)

However, if you see a larger lexical variety, and if you feel like there's a sort of systematic error, then it might be worth retraining your NER model. Basically, what you want to do is take the sentences where you have mistakes, correct the annotation, and feed the sentences back into the model. You won't specifically have an annotation for "NOT AN ENTITY", but the model will (should) learn not to predict entities for those tokens that are not annotated in the "gold" annotations. What you are doing then is (re)training or updating the Named Entity Recognizer with custom examples to make it fit your dataset better.

While you're doing this, make sure that the model does not forget the predictions it originally got right. If you run into any technical difficulties implementing this retraining loop (if that's the route you choose to take), feel free to open a new issue!
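To make the two options above a bit more concrete, here is a minimal sketch in plain Python (no spaCy calls). It assumes annotations in the character-offset form `(text, {"entities": [(start, end, label)]})`; the names `STOP_ENTITIES`, `drop_stoplisted`, `build_training_set`, and `rehearsal_ratio` are made up for illustration, and the example sentences are invented. The first function is the stop-list filter; the second mixes corrected sentences with sentences the model already handles well, which is one possible way to reduce forgetting.

```python
import random

# Hypothetical stop-list: surface forms that should never be entities in this corpus.
STOP_ENTITIES = {"aye", "no"}


def drop_stoplisted(text, entities, stop=STOP_ENTITIES):
    """Remove (start, end, label) spans whose text is on the stop-list."""
    return [
        (start, end, label)
        for start, end, label in entities
        if text[start:end].lower() not in stop
    ]


def build_training_set(corrected, rehearsal, rehearsal_ratio=1.0, seed=0):
    """Mix corrected examples with examples the model already gets right."""
    rng = random.Random(seed)
    n_rehearsal = min(len(rehearsal), int(len(corrected) * rehearsal_ratio))
    mixed = list(corrected) + rng.sample(list(rehearsal), n_rehearsal)
    rng.shuffle(mixed)
    return mixed


# Model output for one transcript line: "Aye" was wrongly tagged as PERSON.
text = "Senator Smith voted Aye on the bill."
predicted = [(8, 13, "PERSON"), (20, 23, "PERSON")]

# Corrected "gold" annotation: the wrong span is simply left out.
gold = drop_stoplisted(text, predicted)
corrected = [(text, {"entities": gold})]

# A sentence whose original predictions were correct and should stay that way.
rehearsal = [
    ("Angela Merkel visited Paris.",
     {"entities": [(0, 13, "PERSON"), (22, 27, "GPE")]}),
]

train_data = build_training_set(corrected, rehearsal)
```

The resulting `train_data` list would then go into whatever update loop you use for your spaCy version; that part is not shown here. In practice you would correct the annotations by hand rather than with the stop-list, which is only used above to keep the sketch short.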
-
Hi! Please forgive me if this question was already raised somewhere; I tried searching first but couldn't formulate it properly for Google. Also, sorry if this is not formatted correctly.
My question is this: I know that you can train the model to recognize new named entities, but my problem is the opposite. Is there any way to tell the model to "forget" certain entities that are recognized incorrectly, so that it doesn't recognize them anymore? For example, I was going through some law voting transcripts that contain the words "Aye" and "No", and "Aye" gets recognized as a person, even when it's lowercase. Is there any better solution for this than just building a stop-word list for such cases?