Entity linker candidate score calculation #13793
Hi @billziss-gh, the objective scoring function is inspired by the paper Entity Linking via Joint Encoding of Types, Descriptions, and Context, Section 4, Equation 2. I'll quote the authors' explanation and equation below so you don't have to click the link:
*(screenshot of the authors' explanation and equation omitted)*

The authors' explanation supports the idea that the scoring function is probabilistic, or at least can be interpreted as such, because the mention context encoder computes a context-based probability via a softmax. With the original formula in mind, however, cosine similarity is not probabilistic by nature and would require additional work to convert it into a probability. This leads me to believe the scoring function in spaCy is a heuristic influenced by the general addition rule (probabilistic-OR) that happens to work. I quickly put together a test batch to partially support this:

```python
# score = a + b - (a * b), i.e. P(A) + P(B) - P(A)P(B)
event_one   = 0.5 + 0.7 - (0.5 * 0.7)      # 0.85
event_two   = 0.2 + 0.1 - (0.2 * 0.1)      # 0.28
event_three = 0.3 + (-0.7) - (0.3 * -0.7)  # -0.19
event_four  = 0.1 + 0.9 - (0.1 * 0.9)      # 0.91
```

This small test suggests the cosine similarity dominates the final value when it is relatively high, and that the formula produces sensible output for inputs in $[0, 1]$; note, though, that a negative similarity (`event_three`) pushes the score below zero.

[EXTRA] There is also another paper referenced by the one above, titled Robust Disambiguation of Named Entities in Text (see Section 3, Overall Objective Function), which instead uses linear interpolation such that α, β, and γ sum to 1. This variant is shown below:

*(equation image from the paper omitted)*

This variant uses a convex combination (weighted sum) of the prior probability, the cosine similarity, and a coherence measure. That is to say, the variant in spaCy appears to be a heuristic function with probabilistic influences that works as intended. Hope this explanation helps!
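To illustrate the "additional work" mentioned above, one simple option (a hypothetical sketch, not what spaCy actually does; the function names are mine) is to rescale the cosine similarity from $[-1, 1]$ onto $[0, 1]$ before applying the probabilistic-OR combination, so the result stays in $[0, 1]$:

```python
def prob_or(p: float, q: float) -> float:
    # General addition rule for independent events:
    # P(A or B) = P(A) + P(B) - P(A) * P(B)
    return p + q - p * q

def rescale_cosine(sim: float) -> float:
    # Map a cosine similarity from [-1, 1] onto [0, 1]
    # so it can be treated as a pseudo-probability.
    return (sim + 1.0) / 2.0

prior = 0.3
sim = -0.7  # raw cosine similarity; may be negative

raw = prob_or(prior, sim)                       # ≈ -0.19, not a valid probability
adjusted = prob_or(prior, rescale_cosine(sim))  # ≈ 0.405, stays inside [0, 1]
```

Whether such a rescaling would actually improve linking accuracy is a separate empirical question; the point is only that the raw formula is not a probability without it.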
I am trying to understand how the entity linker computes candidate scores in order to determine which candidate to return. I find that first a cosine similarity between the candidate vector and the sentence is computed, and then the similarity is combined with the prior probability using the following formula: `score = prior_prob + sim - (prior_prob * sim)`.

I am trying to understand this formula and the logical justification behind it, because it is not clear to me. On the surface it looks like a probabilistic-OR, $P(A \cup B) = P(A) + P(B) - P(A)P(B)$:
Except that cosine similarities are not probabilities (most notably, they lie in the interval $[-1,+1]$).
Can someone explain the justification behind the formula? Is it a heuristic that just works, a probabilistic-OR, or something else?
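For reference, the probabilistic-OR reading can be sanity-checked numerically: for two independent events with probabilities $p$ and $q$, $P(A \cup B) = p + q - pq$. A small standalone simulation (illustrative only, not spaCy code) confirms the identity:

```python
import random

random.seed(0)

p, q = 0.6, 0.3
trials = 100_000

# Count trials where at least one of two independent events occurs.
hits = sum(
    (random.random() < p) or (random.random() < q)
    for _ in range(trials)
)

empirical = hits / trials
analytic = p + q - p * q  # ≈ 0.72
print(empirical, analytic)
```

The empirical frequency lands close to 0.72, matching the closed form; the open question is whether the identity is meaningful when one operand is a cosine similarity rather than a probability.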