Custom word2vec static impact on tok2vec settings #12884
-
Hello, how should the respective settings relate to each other? Is one of the word2vec algorithms (CBOW, skip-gram) better suited to spaCy training? My initial word2vec models have a vector_size of 300 and a window of 4. I believe a vector_size of 300 is consistent with en_core_web_lg, but where does window come into play? If my documents tend to be verbose, would widening the window size potentially improve performance? Is there a way to express this in spaCy configs? My base tok2vec settings include the below:

[components.tok2vec.model.encode]
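For context, here is a minimal sketch (assuming the vectors are trained with gensim, which isn't stated above) of where vector_size, window, and the CBOW/skip-gram choice are set, and how the result can be converted for use as spaCy static vectors; the corpus and paths are placeholders:

```python
# Minimal sketch: training custom static vectors with gensim (an assumption)
# and converting them for a spaCy pipeline. Toy corpus and paths are placeholders.
from gensim.models import Word2Vec

corpus = [
    ["this", "is", "a", "short", "example", "sentence"],
    ["another", "example", "sentence", "for", "the", "sketch"],
]

w2v = Word2Vec(
    corpus,
    vector_size=300,  # 300 dimensions, same as the en_core_web_lg vectors
    window=4,         # word2vec context window (affects training only)
    sg=0,             # 0 = CBOW, 1 = skip-gram
    min_count=1,
    workers=4,
)

# Export in word2vec text format so it can be converted for spaCy:
w2v.wv.save_word2vec_format("custom_vectors.txt")

# Then, on the command line (hypothetical paths):
#   python -m spacy init vectors en custom_vectors.txt ./my_vectors
# and point [initialize.vectors] in the training config at ./my_vectors.
```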
-
Some of these settings have similar names (the underlying concepts are similar), but the word2vec settings for the static word vectors are completely separate from the `tok2vec` settings. For `tok2vec`, see: https://spacy.io/api/architectures/#tok2vec-arch

There are a large number of hyperparameters for word2vec and most of them influence each other, so it's hard to give simple advice. We can mainly recommend evaluating with your downstream task. (There are some similarity-related measures that can be used for intrinsic evaluation of word vectors, but they often don't correlate well with the downstream performance on other types of tasks.)
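As a rough orientation, here is a sketch of a config excerpt based on the default `spacy.Tok2Vec.v2` architecture from that page; the numeric values are illustrative defaults, not recommendations. Note that the encoder's `window_size` here is a separate concept from the word2vec `window`:

```ini
# Illustrative excerpt based on the default spacy.Tok2Vec.v2 architecture;
# numeric values are examples only, not tuned recommendations.
[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = 96
attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]
rows = [5000, 1000, 2500, 2500]
# Set to true to use the custom static word2vec vectors as extra features
# (the vectors themselves are provided via [initialize.vectors]).
include_static_vectors = true

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 96
depth = 4
# Convolutional window of the encoder over neighboring tokens; unrelated to
# the `window` hyperparameter used when the word2vec vectors were trained.
window_size = 1
maxout_pieces = 3
```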