Would like some explanation on how embedding for words in sentence works #7640
-
I've noticed that the embedding for a word in a sentence is different from the embedding for the standalone word. For example,
if we look at a single word:
But if we look at a sentence, the embedding for the same word becomes different:
And if we change the sentence, the embedding for the same word also changes:
It would be great to have some documentation on why the embedding of a word in a sentence differs from that of the standalone word. Is there an algorithm spaCy applies to adjust the word embedding in a sentence? Thanks!
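The observations above can be reproduced in miniature with a toy context encoder (an illustrative sketch, not spaCy's actual model): averaging each token's static vector with its immediate neighbors, the way a narrow-window encoder mixes in context, makes the same word's vector depend on the sentence it appears in.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy static embeddings: one fixed vector per word (illustrative only).
vocab = {w: rng.normal(size=4) for w in
         ["I", "like", "green", "red", "apples", "today"]}

def encode(sentence, window=1):
    """Mix each token's static vector with its neighbors inside a
    small window -- a stand-in for a narrow-window context encoder."""
    emb = np.stack([vocab[w] for w in sentence])
    out = np.empty_like(emb)
    for i in range(len(sentence)):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        out[i] = emb[lo:hi].mean(axis=0)
    return out

a = encode(["I", "like", "green", "apples", "today"])
b = encode(["I", "like", "red", "apples", "today"])

# "apples" (index 3) differs because a neighbor inside its window changed...
print(np.allclose(a[3], b[3]))  # False
# ...but "I" (index 0) is outside the window, so it is unchanged.
print(np.allclose(a[0], b[0]))  # True
```

Changing one word shifts the vectors of nearby words but leaves distant words alone, which matches the behavior described in the question.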
-
Hi! This is a great question for the discussion forum, so I'll move it there. This specific issue will be closed, but you'll get a link/forward to the open thread.
-
This is admittedly kind of confusing; the way the `.vector` method works could maybe use some more detail in the docs. Vectors can come from three different places, which are checked in this order:

1. a user hook, if one is registered
2. `Doc.tensor` (if available)
3. the vocab's vectors table

You can see this in the source for the method, which is pretty succinct.

What's happening is that in the small model there is no vector table, so the vector representation comes from `Doc.tensor`, which is set by `tok2vec`. This uses a CNN with a small window, so neighboring tokens can affect the representation of an individual token. If you make a long sentence and just change the early words, you can see the later words are unaffected.

Does that answer your question?
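To make the lookup order concrete, here is a small sketch in plain numpy. This is not spaCy's real implementation; the names `user_hooks`, `vectors_table`, and `tensor` are illustrative stand-ins for the three sources described above.

```python
import numpy as np

def token_vector(i, word, user_hooks=None, vectors_table=None, tensor=None):
    """Resolve a token's vector in the order described above:
    1. a user hook, 2. the Doc's tensor (when there is no static
    vector table), 3. the vocab's vectors table.
    Toy sketch only -- not spaCy's actual internals."""
    if user_hooks and "vector" in user_hooks:
        return user_hooks["vector"](i, word)
    if (vectors_table is None or len(vectors_table) == 0) and tensor is not None:
        # Small pipelines ship no vector table, so the context-sensitive
        # tok2vec output (one row per token) is all there is.
        return tensor[i]
    return vectors_table[word]

# A "small model": no vector table, only a per-token tensor.
tensor = np.arange(12.0).reshape(4, 3)   # 4 tokens, 3-dim rows
v = token_vector(2, "apple", vectors_table={}, tensor=tensor)
print(v)  # row 2 of the tensor

# A model with a static table: the table is used instead of the tensor.
table = {"apple": np.ones(3)}
w = token_vector(2, "apple", vectors_table=table, tensor=tensor)
print(w)  # the static vector, the same in every sentence
```

Because the small-model path reads a row of the tensor, the same word gets a different vector whenever its context changes; the static-table path always returns the same vector for a given word, regardless of the sentence.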