How are KnowledgeBase entity_vectors used? #9095
When creating a KnowledgeBase, how are the entity vectors used? In the nel_emerson tutorial, the entity vectors are derived from the description of each entity (e.g., "Australian tennis player") rather than from the entity's name (e.g., "Roy Stanley Emerson"). Would it be incorrect to use the name or ID for the vector instead of the description? Or, what if there are two Australian tennis players with the last name "Emerson"? I assume it wouldn't work to use the same description-based entity vector for both?
Replies: 1 comment 1 reply
In a nutshell: the entity vectors are compared to sentence embeddings. The more similar an entity vector is to the embedding of a sentence, the more likely the Entity Linker (EL) is to link the mention in that sentence to the ID corresponding to that entity vector. During training, the Entity Linker learns an embedding model that minimizes the distance between sentence embeddings and the entity vectors of gold-standard links.
If you have two people called "Emerson" and you give them the same entity vector, the EL will effectively not be able to distinguish between the two. It might still make correct predictions based on the aliases that you add for each to the KB, because the EL works in two steps:
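To make that concrete, here is a toy sketch in plain Python. It is not spaCy's actual implementation. It assumes that step one is a lookup of candidate IDs by alias and step two is a ranking of those candidates by cosine similarity to the sentence embedding. The IDs (`E1`, `E2`, `E3`), the 3-dimensional vectors, and the helper names are all made up for illustration; in spaCy the entities and aliases would be registered with `kb.add_entity` and `kb.add_alias`, and the real score also takes other signals into account.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def get_candidates(mention, aliases):
    """Assumed step 1: which entity IDs could this mention text refer to?"""
    return aliases.get(mention, [])

def rank_candidates(sentence_vec, candidates, entity_vectors):
    """Assumed step 2: sort candidates by similarity to the sentence embedding."""
    scored = [(cid, cosine(sentence_vec, entity_vectors[cid])) for cid in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy KB: E1 and E2 are two tennis players who were given
# the SAME description-based vector; E3 is an unrelated "Emerson".
entity_vectors = {
    "E1": [0.9, 0.1, 0.0],
    "E2": [0.9, 0.1, 0.0],
    "E3": [0.0, 0.2, 0.9],
}
aliases = {"Emerson": ["E1", "E2", "E3"]}

# Hypothetical embedding of a sentence that talks about tennis.
sentence_vec = [0.8, 0.2, 0.1]

candidates = get_candidates("Emerson", aliases)
ranked = rank_candidates(sentence_vec, candidates, entity_vectors)
for entity_id, score in ranked:
    print(entity_id, round(score, 3))
# E1 and E2 receive exactly the same score, so the vectors alone cannot
# separate them; E3 ranks last because its vector points elsewhere.
```

In this toy setup the tie between `E1` and `E2` can only be broken by something other than the vector, for example if the two entities are registered under different aliases so that only one of them is a candidate for a given mention.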
Does that clarify things?