How are KnowledgeBase entity_vectors used? #9095

kinghuang · 2021-08-31T01:33:25Z


When creating a KnowledgeBase, how does the entity_vector impact a KnowledgeBase or EntityLinker's performance? It's not clear to me if there's some sort of strategy involved in the choice of entity vectors.

In the nel_emerson tutorial, the entity vectors are derived from the description of the entities (e.g., Australian tennis player) rather than the entity's name (e.g., Roy Stanley Emerson). Would it be incorrect to use the name or ID for the vector instead of the description? And what if there are two Australian tennis players with the last name "Emerson"? I assume it wouldn't work to use the same description-based entity vector for both?

Answered by svlandeg

Aug 31, 2021


svlandeg · 2021-08-31T14:10:49Z


In a nutshell: the entity vectors are compared to sentence embeddings. The more similar an entity vector is to the embedding of a sentence, the more likely the Entity Linker is to link the mention in that sentence to the ID corresponding to that entity vector. During training, the Entity Linker learns an embedding model that minimizes the distance between sentence embeddings and the entity vectors of gold-standard links.

If you have two people called "Emerson" and you give them the same entity vector, the EL will effectively not be able to distinguish between the two. It might still make correct predictions based on the aliases that you add to the KB for each of them, because the EL works in two steps:

1. generate plausible candidates from the KB, given the mention from the text (this depends on the aliases of the entities)
2. pick the candidate with the highest similarity between the sentence encoding and the entity vector, i.e. disambiguate according to the context in which the mention appears
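To make the two steps concrete, here is a minimal toy sketch in plain Python. It is not spaCy's implementation: the entity IDs, the alias table, the 3-dimensional vectors and the `link` helper are all made up for illustration, and cosine similarity is assumed as the similarity measure. The real Entity Linker uses a trained sentence encoder and may take other signals into account.

```python
import math

# Hypothetical toy KB: IDs, aliases and vectors are invented for illustration.
ENTITY_VECTORS = {
    "E_TENNIS": [0.9, 0.1, 0.0],  # e.g. encoded from "Australian tennis player"
    "E_WRITER": [0.1, 0.9, 0.0],  # e.g. encoded from "American essayist"
    "E_OTHER": [0.0, 0.0, 1.0],   # an entity with no "Emerson" alias
}
ALIASES = {"Emerson": ["E_TENNIS", "E_WRITER"]}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link(mention, sentence_vec, aliases=ALIASES, vectors=ENTITY_VECTORS):
    """Return (best_entity_id, scores) for a mention, or (None, {}) if no candidates."""
    # Step 1: candidate generation depends only on the alias table.
    candidates = aliases.get(mention, [])
    if not candidates:
        return None, {}
    # Step 2: rank candidates by similarity between sentence and entity vectors.
    scores = {c: cosine(sentence_vec, vectors[c]) for c in candidates}
    return max(scores, key=scores.get), scores


# A sentence whose (made-up) embedding lies close to the tennis player's vector.
best, scores = link("Emerson", [0.8, 0.2, 0.0])
print(best, scores)

# Two candidates sharing one vector get identical scores, so context cannot
# separate them; only the alias step still restricts which IDs are considered.
shared = {"E_A": [0.5, 0.5, 0.0], "E_B": [0.5, 0.5, 0.0]}
_, tied = link("Emerson", [0.8, 0.2, 0.0], {"Emerson": ["E_A", "E_B"]}, shared)
print(tied)
```

In this sketch, `E_OTHER` can never be returned for "Emerson" regardless of the sentence vector, because it is filtered out in step 1. Two candidates with the same vector always tie in step 2.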

Does that clarify things?


kinghuang Sep 1, 2021
Author

Yes, that helps! I've also been wondering about the sentencizer's role in EntityLinker, and that explains it, too. Thanks!
