
Does this mean spaCy is also training the word vectors for these pipelines from scratch? I want to take advantage of the pre-trained embeddings for each model, so this would not be ideal.

There are two kinds of vectors in spaCy:

  • word vectors, with one static vector per distinct word (or vectors built from subword features, as in floret), which are used as input to a tok2vec
  • tok2vec vectors, which are generated by a CNN tok2vec or transformer layer and used as input for statistical components
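To make the distinction concrete, here is a toy sketch in plain Python, not spaCy's implementation. The names `static_vector` and `contextual_vectors` are hypothetical. The "static" table is faked with a hash instead of pretrained values. The "tok2vec" is an untrained window average rather than a trained CNN or transformer layer. What matters is only the contrast: a word vector depends on the word alone, while a tok2vec vector also depends on the surrounding tokens.

```python
import hashlib

DIM = 4  # toy embedding width, chosen for illustration only


def static_vector(word: str) -> list[float]:
    """Toy stand-in for a pretrained word-vector table (hypothetical).

    The vector depends only on the lowercased word, never on its context.
    A real table is learned ahead of time; here it is derived from a hash.
    """
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:DIM]]


def contextual_vectors(words: list[str], window: int = 1) -> list[list[float]]:
    """Toy 'tok2vec' (hypothetical): average static vectors in a window.

    Unlike a trained CNN or transformer layer, this has no learned weights;
    it only shows that each token's output depends on its neighbours.
    """
    statics = [static_vector(w) for w in words]
    result = []
    for i in range(len(words)):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        neighbourhood = statics[lo:hi]
        result.append([sum(col) / len(neighbourhood) for col in zip(*neighbourhood)])
    return result


if __name__ == "__main__":
    river = contextual_vectors(["the", "river", "bank"])
    money = contextual_vectors(["the", "bank", "account"])
    print(static_vector("bank"))  # identical wherever "bank" appears
    print(river[2], money[1])     # differ, because the neighbours differ
```

The word vector for "bank" is the same in both sentences. Its contextual vector changes with its neighbours, which is why the statistical components consume tok2vec output rather than raw word vectors.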

Both the vectors entry in the config and the use_static_vectors setting refer to word vectors. If those are null, spaCy will not use word vectors at all and will instead use other features of the tokens as input to the tok2vec/…
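The fallback behaviour can be illustrated with another toy sketch in plain Python. It does not reproduce spaCy's code or config schema. `tok2vec_inputs` and its helpers are hypothetical. Passing `word_vectors=None` stands in for a null vectors setting. The surface features (lowercase form, first character, last three characters, word shape) are an assumption, loosely modeled on the kind of hashed token features a tok2vec can embed. In this sketch, unknown words get a zero vector.

```python
import zlib
from typing import Optional

NUM_BUCKETS = 1000  # size of the toy hash-embedding table (illustrative)


def word_shape(word: str) -> str:
    """Map characters to X (upper), x (other letters), d (digit), else keep."""
    return "".join(
        "X" if c.isupper() else "x" if c.isalpha() else "d" if c.isdigit() else c
        for c in word
    )


def hashed_features(word: str) -> list[int]:
    """Hash a few surface features of the token into bucket ids (assumed set)."""
    features = {
        "norm": word.lower(),
        "prefix": word[:1],
        "suffix": word[-3:],
        "shape": word_shape(word),
    }
    return [
        zlib.crc32(f"{name}={value}".encode("utf-8")) % NUM_BUCKETS
        for name, value in features.items()
    ]


def tok2vec_inputs(word: str, word_vectors: Optional[dict[str, list[float]]]) -> dict:
    """Collect the inputs a toy tok2vec layer would embed for one token.

    word_vectors=None plays the role of a null vectors setting: only the
    hashed surface features are used. Otherwise a static vector is added too.
    """
    inputs = {"hashed": hashed_features(word)}
    if word_vectors is not None:
        width = len(next(iter(word_vectors.values()), []))
        # Unknown words fall back to a zero vector in this sketch.
        inputs["static"] = word_vectors.get(word.lower(), [0.0] * width)
    return inputs


if __name__ == "__main__":
    table = {"cat": [0.1, 0.2]}  # hypothetical pretrained vectors
    print(tok2vec_inputs("Cat", None))   # hashed features only
    print(tok2vec_inputs("Cat", table))  # hashed features plus static vector
```

With no vectors configured, the model still receives token features, so it trains fine. It simply cannot benefit from pretrained embeddings, which is why the vectors need to be set if you want to use them.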

@adrianeboyd
Answer selected by adrianeboyd