Entity linker candidate score calculation #13793
Hi @billziss-gh, the objective scoring function is inspired by the paper Entity Linking via Joint Encoding of Types, Descriptions, and Context, Section 4, Equation 2. I'll quote the authors' explanation and equation below so you don't have to click the link:
*(screenshot of the authors' explanation and equation omitted)*

The authors' explanation supports the idea that the scoring function is probabilistic, or at least can be interpreted as such, because the mention context encoder computes a context-based probability via a softmax. With the original formula in mind, however, cosine similarity is not probabilistic by nature and would require additional work to convert it into a probability. This leads me to believe the scoring function in spaCy is a heuristic influenced by the general addition rule (probabilistic-OR) that happens to work. I quickly put together a test batch to partially support this:

```python
# score = a + b - (a * b), i.e. P(A) + P(B) - P(A)P(B)
event_one   = 0.5 + 0.7 - (0.5 * 0.7)      # 0.85
event_two   = 0.2 + 0.1 - (0.2 * 0.1)      # 0.28
event_three = 0.3 + (-0.7) - (0.3 * -0.7)  # -0.19
event_four  = 0.1 + 0.9 - (0.1 * 0.9)      # 0.91
```

This small test suggests the cosine similarity dominates the final value when it is relatively high, and that the formula produces sensible output for inputs in $[0, 1]$; note, though, that a negative similarity (`event_three`) pushes the score below zero.

[EXTRA] There is also another paper referenced by the one above, titled Robust Disambiguation of Named Entities in Text (see Section 3, Overall Objective Function), which instead uses linear interpolation such that α, β, and γ sum to 1. This variant is shown below:

*(equation image from the paper omitted)*

This variant uses a convex combination (weighted sum) of the prior probability, the cosine similarity, and a coherence measure. That is to say, the variant in spaCy appears to be a heuristic function with probabilistic influences that works as intended. Hope this explanation helps!
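To illustrate the "additional work" mentioned above, one simple option (a hypothetical sketch, not what spaCy actually does; the function names are mine) is to rescale the cosine similarity from $[-1, 1]$ onto $[0, 1]$ before applying the probabilistic-OR combination, so the result stays in $[0, 1]$:

```python
def prob_or(p: float, q: float) -> float:
    # General addition rule for independent events:
    # P(A or B) = P(A) + P(B) - P(A) * P(B)
    return p + q - p * q

def rescale_cosine(sim: float) -> float:
    # Map a cosine similarity from [-1, 1] onto [0, 1]
    # so it can be treated as a pseudo-probability.
    return (sim + 1.0) / 2.0

prior = 0.3
sim = -0.7  # raw cosine similarity; may be negative

raw = prob_or(prior, sim)                       # ≈ -0.19, not a valid probability
adjusted = prob_or(prior, rescale_cosine(sim))  # ≈ 0.405, stays inside [0, 1]
```

Whether such a rescaling would actually improve linking accuracy is a separate empirical question; the point is only that the raw formula is not a probability without it.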
I am trying to understand how the entity linker computes candidate scores in order to determine which candidate to return. I find that first a cosine similarity between the candidate vector and the sentence is computed, and then the similarity is combined with the prior probability using the following formula: `score = prior_prob + sim - (prior_prob * sim)`.

I am trying to understand this formula and the logical justification behind it, because it is not clear to me. On the surface it looks like a probabilistic-OR, $P(A \cup B) = P(A) + P(B) - P(A)P(B)$:
Except that cosine similarities are not probabilities (most notably, they lie in the interval $[-1,+1]$).
Can someone explain the justification behind the formula? Is it a heuristic that just works, a probabilistic-OR, or something else?
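For reference, the probabilistic-OR reading can be sanity-checked numerically: for two independent events with probabilities $p$ and $q$, $P(A \cup B) = p + q - pq$. A small standalone simulation (illustrative only, not spaCy code) confirms the identity:

```python
import random

random.seed(0)

p, q = 0.6, 0.3
trials = 100_000

# Count trials where at least one of two independent events occurs.
hits = sum(
    (random.random() < p) or (random.random() < q)
    for _ in range(trials)
)

empirical = hits / trials
analytic = p + q - p * q  # ≈ 0.72
print(empirical, analytic)
```

The empirical frequency lands close to 0.72, matching the closed form; the open question is whether the identity is meaningful when one operand is a cosine similarity rather than a probability.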