Entity linker queries of aliases and entity descriptions #6780
Hi: I have established a working entity linking tool using WikiData/Wikipedia. I am interested in adding the known aliases and entity descriptions to the structured output I am generating. However, I can't seem to find a function that takes an entity ID and returns its description or a list of its aliases. Does such a function exist? Thanks so much, in advance. Kindest regards,
Replies: 1 comment
Hi!

Because the Knowledge Base can potentially grow quite big, we've opted to make it as efficient as possible for the retrieval task required by the Entity Linker, which is retrieving possible KB IDs given a certain alias/mention. Unfortunately this also means that querying the other way around (ID -> set of aliases) will be inefficient in the current implementation. To add insult to injury, the description is never stored in its original string form, only in its vectorized form, again for efficiency reasons in the core implementation.

One solution for your use-case could be to create a subclass of `KnowledgeBase` that reimplements the `add_entity`, `set_entities`, `add_alias` and a…

If you used the example Wikipedia/Wikidata scripts that we provided (https://github.com/explosion/projects/tree/master/nel-wikipedia), those scripts actually store some intermediate processing files to disk, which contain the information you need. So you could also query your information directly from these files: the title/name of an entity, the description of an entity, and the aliases of a given entity.

Hope that works for you!
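For what it's worth, here is a rough sketch of the reverse lookup (ID -> description and aliases) discussed in the reply. It is plain Python and does not touch spaCy: the class `EntityMetadataIndex`, its getter methods, and the helper `load_id_value_rows` are hypothetical names made up for illustration. Only `add_entity` and `add_alias` borrow their names from the `KnowledgeBase` methods mentioned above, and their signatures here are simplified. The pipe-delimited `id|value` row layout with a header line is also an assumption, so check it against the intermediate files you actually have.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class EntityMetadataIndex:
    """Hypothetical side index: entity ID -> description and aliases.

    Method names mirror the KnowledgeBase methods mentioned in the thread,
    but this is plain Python and not part of spaCy.
    """

    def __init__(self) -> None:
        self._descriptions: Dict[str, str] = {}
        self._aliases: Dict[str, Set[str]] = defaultdict(set)

    def add_entity(self, entity_id: str, description: str = "") -> None:
        # Keep the original description string alongside the entity ID.
        self._descriptions[entity_id] = description

    def add_alias(self, alias: str, entities: Iterable[str]) -> None:
        # Record the reverse direction: entity ID -> alias.
        for entity_id in entities:
            self._aliases[entity_id].add(alias)

    def get_description(self, entity_id: str) -> str:
        return self._descriptions.get(entity_id, "")

    def get_aliases(self, entity_id: str) -> List[str]:
        return sorted(self._aliases.get(entity_id, set()))


def load_id_value_rows(text: str, has_header: bool = True) -> List[Tuple[str, str]]:
    """Parse 'id|value' lines (assumed layout) into (id, value) pairs."""
    lines = text.splitlines()
    if has_header:
        lines = lines[1:]
    rows = []
    for line in lines:
        if "|" not in line:
            continue  # skip blank or malformed lines
        # Split on the first pipe only, so the value may itself contain '|'.
        entity_id, value = line.split("|", 1)
        rows.append((entity_id, value))
    return rows


# Illustrative sample values, not read from any real file.
index = EntityMetadataIndex()
index.add_entity("Q42", "English writer and humorist")
index.add_alias("Douglas Adams", ["Q42"])
index.add_alias("Douglas Noel Adams", ["Q42"])
print(index.get_description("Q42"))
print(index.get_aliases("Q42"))
```

If you go the subclass route instead, the same bookkeeping could happen inside the overridden methods before delegating to the parent class. If you go the file route, the pairs returned by `load_id_value_rows` can be fed straight into `add_entity` or `add_alias`.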