What's the difference between a tensor and a vector in spaCy 3.0? #6907
I've been playing around with the API and it seems like the documents/tokens now also have a `.tensor` attribute.

    import numpy as np
    import spacy

    nlp = spacy.load("en_core_web_md")
    doc = nlp("this is a bit of text")
    doc.tensor.shape, doc.vector.shape
    # ((6, 96), (300,))

It seems like the tensor has a representation for each token, but why is the dimension different (96 vs. 300)?

    text = doc[-1]
    text.tensor.shape, text.vector.shape
    # ((96,), (300,))

Looking at the API docs, it seems like a tensor is defined as a "Container for dense vector representations." while a vector is "A real-valued meaning representation. Defaults to an average of the token vectors." So just so I understand, what's the difference between these two? Am I correct to say that spaCy bundles two sets of embeddings in their models?
Replies: 1 comment 9 replies
The explanations get a bit fuzzy here because we can define what the thing is "conceptually", but, also, pipelines are allowed to write data to these attributes, and they might choose to use them with different semantics from how we really expect.
We use the `doc.tensor` attribute to store the contextual token-to-vector encodings computed by the `Tok2Vec` component. These encodings might be used as features by other components, if they have a `Tok2VecListener` layer inside their model. The `doc.tensor` values may or may not be useful to you outside of those modelling decisions, these are learned parameters and all bets are off, really.

The `token.vector` attribute is usually drawn out of the static…
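
To make the distinction above concrete, here is a minimal sketch rather than a description of how any particular trained pipeline behaves. It assumes a blank English pipeline (so no `Tok2Vec` component runs) and a handful of made-up 4-dimensional vectors registered with `Vocab.set_vector`; the words, values, and variable names are hypothetical. Under those assumptions, `token.vector` is a plain lookup in the vocab's vector table, `doc.vector` should match the average of the token vectors as the API docs quoted in the question describe, and `doc.tensor` stays empty because nothing has written to it.

```python
import numpy as np
import spacy

# Blank pipeline: tokenizer only, no Tok2Vec component and no bundled vectors.
nlp = spacy.blank("en")

# Hypothetical 4-dimensional static vectors, invented purely for illustration.
toy_vectors = {
    "a": np.array([1.0, 0.0, 0.0, 0.0], dtype="float32"),
    "bit": np.array([0.0, 1.0, 0.0, 0.0], dtype="float32"),
    "of": np.array([0.0, 0.0, 1.0, 0.0], dtype="float32"),
    "text": np.array([0.0, 0.0, 0.0, 1.0], dtype="float32"),
}
for word, vec in toy_vectors.items():
    nlp.vocab.set_vector(word, vec)

doc = nlp("a bit of text")

# token.vector is looked up in the vocab's vector table, independent of context.
token_vectors = np.stack([token.vector for token in doc])

# doc.vector defaults to the average of the token vectors.
vector_is_mean = np.allclose(doc.vector, token_vectors.mean(axis=0))

# No component has written contextual encodings, so doc.tensor holds no values.
has_tok2vec = "tok2vec" in nlp.pipe_names
tensor_size = doc.tensor.size

print("vector table shape:", nlp.vocab.vectors.shape)
print("doc.vector shape:", doc.vector.shape)
print("doc.vector is mean of token vectors:", vector_is_mean)
print("tok2vec in pipeline:", has_tok2vec, "| doc.tensor size:", tensor_size)
```

With a trained pipeline such as the `en_core_web_md` one from the question, the same attributes can be inspected the same way; the widths simply come from two different places (the vector table versus the `Tok2Vec` output), which is why they need not agree.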