What do static vectors do during training? #8709
I understand that you can add static vectors to a trainable component, e.g. via the training config. So my question is: what do these vectors do, and how are they used during training? Are the static vectors passed as additional input features to the NER model alongside the word? The same question applies to the other trainable components, so I am trying to understand the interplay between static vectors and the rest of the pipeline. An additional example: here we read that similarity scores can still be calculated by using smaller models without static vectors. That raises the question: which component sets the vectors that are responsible for the similarity?
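For context, this is roughly the kind of setup I mean: in spaCy v3 the static vectors can be switched on as an extra feature in the tok2vec embedding layer of the config (a sketch; widths, rows, and the vectors package name are just placeholders here):

```ini
[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = 96
attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]
rows = [5000, 1000, 2500, 2500]
include_static_vectors = true

[initialize]
# Load the static vectors from an installed pipeline package.
vectors = "en_core_web_lg"
```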
Static vectors are used for one of the pretraining objectives, called "LMAO" (language modelling with approximate outputs), where the model attempts to reproduce the vectors. Also, as indicated in the static vectors docs, they can be used as a feature in the tok2vec representation.
As indicated in the docs, similarity uses an average of word vectors by default (the non-default case is a user hook that overrides this). The vector value is just `Doc.vector` (or `Span.vector` / `Token.vector`); that value uses word vectors if available, and if not (as in the small models), falls back to the context-sensitive tensors. The implementations of each of these functions are quite small, so if you're curious I'd recommend looking at the source.
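To make the default behaviour concrete, here is a minimal NumPy sketch of the two operations described above: a `Doc.vector`-style average over per-token vectors, and cosine similarity between two of those averages. The function names and toy vectors are mine, not spaCy's; this only illustrates the arithmetic.

```python
import numpy as np

def doc_vector(token_vectors: np.ndarray) -> np.ndarray:
    # Doc.vector-style default: the mean of the per-token vectors.
    return token_vectors.mean(axis=0)

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two document vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two toy "documents" with 2-dimensional token vectors.
doc1 = np.array([[1.0, 0.0],
                 [0.0, 1.0]])  # averages to [0.5, 0.5]
doc2 = np.array([[1.0, 1.0]])  # averages to [1.0, 1.0]

v1, v2 = doc_vector(doc1), doc_vector(doc2)
print(round(similarity(v1, v2), 4))  # same direction -> 1.0
```

This is also why similarity scores from averaged vectors can look surprisingly high: averaging washes out word order and per-token detail, so two documents with similar overall vocabulary end up pointing in nearly the same direction.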