This is admittedly kind of confusing; the way the .vector method works could use some more detail in the docs.

Vectors can come from three different places, which are checked in this order:

  1. User hooks
  2. (if no vectors) Doc.tensor (if available)
  3. A vector lookup table

You can see this in the source for the method, which is pretty succinct.
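
For reference, here's a minimal sketch of the first source in that list: a registered user hook wins over both Doc.tensor and the vector table. The blank pipeline and the 300-dimensional arrays are arbitrary choices for illustration.

```python
import numpy
import spacy

nlp = spacy.blank("en")
doc = nlp("hello world")

# Doc-level and token-level hooks are checked before Doc.tensor or the vector table.
doc.user_hooks["vector"] = lambda doc: numpy.ones((300,), dtype="float32")
doc.user_token_hooks["vector"] = lambda token: numpy.zeros((300,), dtype="float32")

print(doc.vector[:3])     # from the doc-level hook -> [1. 1. 1.]
print(doc[0].vector[:3])  # from the token-level hook -> [0. 0. 0.]
```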

What's happening is that in the small model there is no vector table, so the vector representation comes from Doc.tensor, which is set by the tok2vec component. This uses a CNN with a small window, so neighboring tokens can affect the representation of an individual token. If you make a long sentence and just change the early words, you can see the later words are unaffected.
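
A quick way to see that fallback, assuming en_core_web_sm is installed (it ships without a static vector table, so Token.vector comes from Doc.tensor), is to compare the same token position across two documents that differ only at the start:

```python
import numpy
import spacy

nlp = spacy.load("en_core_web_sm")

# Two sentences that differ only in their first three tokens.
doc_a = nlp("The cat sat quietly on the old wooden bench near the river")
doc_b = nlp("A dog ran quietly on the old wooden bench near the river")

# Tokens near the edit fall inside the CNN's receptive field and change;
# tokens far from it keep the same context-sensitive vector. The exact
# cutoff depends on the pipeline's tok2vec window and depth settings.
for i in (3, 11):  # "quietly" (close to the change) and "river" (far away)
    same = numpy.allclose(doc_a[i].vector, doc_b[i].vector)
    print(f"{doc_a[i].text!r} identical across the two docs: {same}")
```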
