Use kb_ids in doc as training data for Entity Linker
#9308
Replies: 2 comments 3 replies
-
Sorry you're having trouble with this. It sounds like the annotations may not be getting picked up for some reason. The Entity Linker code is pure Python, so you should be able to debug it in place. If you look at this loop in the source, you should be able to figure out what's going on; in particular, you should be able to see if
Your model needs a way to generate candidates at inference time, right? Even if you just have a fixed list of candidates you want it to pick from, you need a function to provide that list. If you just use the annotated kb_ids, your model will learn to just pick the ID you gave it, which is not meaningful behavior. I'm a little surprised that changing the candidate generator has no effect on training.
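To make the point about candidate generation concrete, here is a minimal, library-independent sketch. It deliberately does not use spaCy's API: `ToyCandidate`, `ALIAS_TABLE`, `toy_get_candidates`, `link` and the `KB00x` IDs are all hypothetical names made up for illustration. The thing to notice is that the generator only ever sees the mention text, never a gold kb_id, because at inference time there is no annotation to read.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToyCandidate:
    """Hypothetical stand-in for a KB candidate: an entity ID plus a prior."""
    kb_id: str
    prior_prob: float


# Hypothetical fixed alias table: mention text -> KB entries it may refer to.
ALIAS_TABLE: Dict[str, List[ToyCandidate]] = {
    "Emerson": [
        ToyCandidate("KB001", 0.6),
        ToyCandidate("KB002", 0.4),
    ],
}


def toy_get_candidates(mention_text: str) -> List[ToyCandidate]:
    """Return every KB entry the mention could refer to (empty if unknown)."""
    return ALIAS_TABLE.get(mention_text, [])


def link(mention_text: str) -> str:
    """Pick the highest-prior candidate, or "NIL" when none are generated."""
    candidates = toy_get_candidates(mention_text)
    if not candidates:
        return "NIL"
    return max(candidates, key=lambda c: c.prior_prob).kb_id


print(link("Emerson"))   # KB001
print(link("Thoreau"))   # NIL
```

A real linker would score the candidates using the surrounding context rather than the prior alone, but even with a fixed list some function like `toy_get_candidates` has to supply it.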
-
Hi @alfredomg, were you able to find the root cause? I am running into the same issue, with exactly the same results you describe.
-
Hello,
I'm trying to train a custom EL based on a trained custom NER using the new spaCy 3 config file system. The docs in my training DocBin file have been manually annotated with entities and their kb_id_'s as per my KB. However, when attempting to train EL I get the following warning:
And there seems to be no training going on at all, as EL scores stay at 0.
The config file I'm using is somewhat similar to the example one at https://github.com/explosion/projects/blob/v3/tutorials/nel_emerson/configs/nel.cfg
I think the problem boils down to the get_candidates function in the [components.entity_linker] option. I tried using the default get_candidates = {"@misc":"spacy.CandidateGenerator.v1"}, removing it altogether, as well as specifying my own custom registered function. They all resulted in the same behaviour.
Is there a way to force the EL trainer to just use the kb_ids that are annotated in the DocBin docs? Notice that I don't really need to use any "candidate generators" as I already know the correct kb_id for each entity in my training set.
FYI, this is the custom registered function I used:
I don't like this implementation, as it should instead generate a list of Candidates directly. But as I said, I shouldn't even need to do that as I already know the correct kb_id, i.e. there will be one and only one "candidate" kb_id for every entity in my training set.
And this is the config file I used (using the above registered function):
And this is the training command:
Thanks in advance for your help!
Alfredo