Low recall while training NER pipeline #9379
Replies: 1 comment
-
Are you using all internal links in Wikipedia, not a subset? That's going to be hard, because Wikipedia mentions don't always look like normal named entities; often they don't look like anything at all. To take a real example picked at random:
Can you guess which words are links? Answer below.
"rotunda" and "gatehouse" (not "bridge").
Also see the actual article.
It's a small problem. It gives the model less information to work with: if you have multiple labels, the relationships between them can be hints to the model, and with a single label you don't have that. But if one label correctly describes your problem, it's hard to improve on. I think you may just have a hard problem here, especially for recall, if you're actually trying to reproduce all Wikipedia internal links. It might be worth thinking about whether you can narrow your problem down a little bit.
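If you do narrow the problem down, one option is to keep a handful of coarse labels instead of a single catch-all one, so the model can exploit the relationships between them. A minimal stdlib-only sketch of what that relabeling might look like (the type table and all names here are invented for illustration; in practice a mapping like this could come from Wikidata instance-of relations):

```python
# Hypothetical coarse typing of Wikipedia link targets.
COARSE_TYPE = {
    "rotunda": "STRUCTURE",
    "gatehouse": "STRUCTURE",
    "London": "PLACE",
}

def relabel(entities, default="MISC"):
    """Replace a single catch-all label with a coarse type per entity text."""
    return [(text, COARSE_TYPE.get(text, default)) for text in entities]

print(relabel(["rotunda", "London", "algebra"]))
# → [('rotunda', 'STRUCTURE'), ('London', 'PLACE'), ('algebra', 'MISC')]
```

Even a rough split like this gives the model distributional hints (e.g. PLACE entities behave differently from STRUCTURE entities) that a single label cannot.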
-
I am training an NER pipeline. My dataset consists of Wikipedia pages, and the entities are internal links. I use a config (optimised for accuracy) generated here. The train dataset contains 50K articles (split into paragraphs) and the dev dataset has 15K articles. There are almost 78K overlapping entities between train and dev. The train dataset has 290K unique entities and the dev dataset has about 110K. Batch size is between 50 and 100 paragraphs.
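Since recall on entities never seen during training is usually the weak point, it may be worth checking how much of the dev set's entities are actually covered by the train set. A minimal stdlib-only sketch (function and variable names are hypothetical; it assumes entities are collected as plain surface strings):

```python
def overlap_stats(train_entities, dev_entities):
    """Summarise how many dev entities were seen during training."""
    train_set, dev_set = set(train_entities), set(dev_entities)
    shared = train_set & dev_set
    return {
        "train_unique": len(train_set),
        "dev_unique": len(dev_set),
        "shared": len(shared),
        # Fraction of dev entities also present in train; unseen dev
        # entities are where recall typically suffers most.
        "dev_seen_ratio": len(shared) / len(dev_set) if dev_set else 0.0,
    }

stats = overlap_stats(
    ["Paris", "rotunda", "gatehouse"],
    ["Paris", "bridge"],
)
print(stats["shared"], stats["dev_seen_ratio"])  # → 1 0.5
```

With 110K unique dev entities and 78K overlapping, roughly 30% of dev entities are unseen, which would cap recall well below precision if the model mostly memorises surface forms.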
My training is currently on its 13th epoch and I get the following results: Precision 76.68 / Recall 61.92, and it's improving very slowly. So I wonder which parameters I should tweak. Do you have any recommendations? I am thinking about using wandb to search for the right hyperparameters, but it's very resource-costly, so I wanted to ask you first in case you can spot a problem here. 🤔
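For reference, the knobs most often tweaked live in the `[training]` block of the generated config. A sketch of the relevant section, assuming a standard spaCy v3 config (the values shown are illustrative, not recommendations):

```ini
[training]
# Stop early if the dev score hasn't improved for this many steps.
patience = 1600
# 0 = no epoch limit; training stops on max_steps or patience instead.
max_epochs = 0
max_steps = 20000
# How often (in steps) to evaluate on the dev set.
eval_frequency = 200

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
```

Raising `patience` or `max_steps` lets a slowly improving run continue; the batcher schedule controls how batch size grows over training.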
I should also mention that I use only one label for all entities. Is that a big problem? How many epochs do you train your NER pipelines for?
Thanks in advance