Entity Linker - example config #12324
-
The spaCy quickstart config generator doesn't include any examples of entity linking. Does anyone have an example or a recommended way to create entity linking configs for training? I can't find any reference or examples, but I did see spaCy staff mention somewhere that you should use the config for training models instead of training them manually. Any help would be much appreciated!
-
@svlandeg That's very helpful, thank you! After setting up my data, I ended up generating a config for my entity linker model. However, I'm now getting an out-of-memory error: spaCy tries to allocate ~18Gi of memory on my GPU when training. The traceback is reproduced below.

The line causing the GPU OOM is here: https://github.com/explosion/spaCy/blob/master/spacy/pipeline/entity_linker.py#L304 spaCy initializes the entity linker on only 10 examples, and none of the documents passed to it are longer than 5k characters. Later in the initialization step, spacy.extract_spans is called during the forward pass, and this is where the 18Gi allocation occurs. I've reproduced my config below as well.

Other posts on spaCy memory allocation point to batch size. However, the operation that is OOMing seems unrelated to batch size, since it happens while the model itself is being initialized, before training even begins. Note: this also OOMs on CPU on a machine with 32Gi of memory.

I'm in the process of stepping through the spaCy code to debug, but I'm not an expert in the library (yet), so any advice or notes on obvious gotchas are welcome :)

Update: I noticed if I change
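Since initialization only samples the first 10 examples, one cheap sanity check is to print how large those docs actually are. The sketch below uses a hypothetical helper (summarize_docs is not a spaCy function) and assumes the training data is stored in a DocBin; it only verifies the inputs and does not explain or fix the allocation in spacy.extract_spans.

```python
from itertools import islice

import spacy
from spacy.tokens import DocBin, Span


def summarize_docs(docs, n=10):
    """Return size statistics for the first n docs (hypothetical helper).

    n=10 mirrors the small sample the entity linker draws during initialization.
    """
    stats = []
    for doc in islice(docs, n):
        stats.append(
            {
                "chars": len(doc.text),
                "tokens": len(doc),
                "ents": len(doc.ents),
                # Longest entity span in tokens (0 if the doc has no entities).
                "longest_ent": max((len(ent) for ent in doc.ents), default=0),
            }
        )
    return stats


# Toy data so the sketch runs on its own. With real data you would load your
# corpus instead, e.g. DocBin().from_disk("train.spacy") (hypothetical path).
nlp = spacy.blank("en")
doc = nlp("Emerson was a writer in Boston.")
doc.ents = [Span(doc, 0, 1, label="PERSON"), Span(doc, 5, 6, label="GPE")]
doc_bin = DocBin(docs=[doc])

stats = summarize_docs(doc_bin.get_docs(nlp.vocab))
for row in stats:
    print(row)
```

If the numbers match expectations (short docs, a handful of entities), the memory spike is more likely coming from the model or config than from the data itself.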
Config
-
Hi! We should definitely document this better and make it more accessible. What could be useful to you is this example project: https://github.com/explosion/projects/tree/v3/tutorials/nel_emerson. It shows how to train & run an entity_linker on some dummy data, and has an example config in the config subdir. And yes, we definitely recommend using a config for training models, as there are a lot of parameters & settings that will be handled correctly behind the scenes when using spacy train with a config file.
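A minimal sketch of producing an entity linking config programmatically, assuming spaCy v3.x is installed. It is not the nel_emerson project's config: the blank English pipeline, the sentencizer, and the option values are illustrative choices, and only settings the entity_linker factory accepts are overridden. The resulting config can be saved and passed to spacy train.

```python
import spacy

# Assumption: a blank English pipeline stands in for whatever pipeline you
# actually train; nothing here is copied from the nel_emerson project.
nlp = spacy.blank("en")

# The entity linker reads doc.sents, so add a rule-based sentence splitter.
nlp.add_pipe("sentencizer")

# Illustrative values for a few entity_linker settings; everything else
# (model architecture, candidate generator, ...) keeps its default.
nlp.add_pipe(
    "entity_linker",
    config={
        "entity_vector_length": 64,  # dimensionality of the KB's entity vectors
        "incl_prior": False,         # ignore prior probabilities from the KB
        "incl_context": True,        # use the context encoding of the mention
        "use_gold_ents": True,       # use the reference entities when training
    },
)

# nlp.config is the fully filled-in config, including the model sections.
config_str = nlp.config.to_str()
print(config_str)

# To train with it: nlp.config.to_disk("config.cfg"), then
# python -m spacy train config.cfg --paths.train ... --paths.dev ...
```

The dumped config is not trainable as-is: initializing the entity linker also needs a knowledge base, typically supplied as a kb_loader under [initialize.components.entity_linker] (spaCy registers spacy.KBFromFile.v1 for loading a KB saved to disk). The project's config subdir shows one complete setup.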