If you use the en_core_web_lg vectors, I think you should get the same (or extremely similar) results across spaCy v2.2 to v3.3.

In the en_core_web_md models, the vectors are pruned so that multiple words get clustered together with the same vector. The pruning step isn't deterministic, so each version of en_core_web_md may have slightly different clusters and vectors.
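To illustrate why pruning makes distinct words share a vector, here is a toy sketch in plain Python. It is not spaCy's actual pruning code: the 2-d vectors are made up, and `prune` is a hypothetical helper that keeps the first `keep` entries and remaps every other word to its most similar kept row, which is roughly the behavior spaCy documents for `Vocab.prune_vectors`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prune(vectors, keep):
    """Hypothetical helper: keep the first `keep` words' vectors and remap
    every remaining word to the row of its most similar kept vector.
    Returns (rows, key2row)."""
    words = list(vectors)
    kept = words[:keep]
    rows = [vectors[w] for w in kept]
    key2row = {w: i for i, w in enumerate(kept)}
    for w in words[keep:]:
        key2row[w] = max(range(len(rows)), key=lambda i: cosine(vectors[w], rows[i]))
    return rows, key2row


# Made-up vectors, listed in an assumed priority (e.g. frequency) order.
toy = {
    "above": [1.0, 0.1],
    "cat": [0.0, 1.0],
    "front": [0.9, 0.3],
    "dog": [0.1, 0.9],
}

rows, key2row = prune(toy, keep=2)
same_row = key2row["above"] == key2row["front"]
sim = cosine(rows[key2row["above"]], rows[key2row["front"]])
print(same_row, sim)
```

Once "front" is remapped to the row kept for "above", the two words are indistinguishable by vector similarity. Which words end up sharing a row depends on which rows survive pruning, so a different pruning run can produce different pairs.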

From your results, it looks like "above" and "front" ended up in the same cluster in v3.3.0 but not in v2.2.5. (These are the model versions, which you can see with pip freeze or spacy validate, not the spaCy version.) There were also some minor changes related to vector deduplication in v3.3.0 that affected the English vectors in particular (#10551).
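One way to check this on your own install is to compare the vector-table rows that two words map to. The sketch below assumes the default (non-floret) vectors, where `vocab.vectors.key2row` is a dict from string hash to row. `vector_row`, `share_vector` and `demo` are hypothetical helper names, and `demo` assumes the model package is installed.

```python
def vector_row(vocab, word):
    """Return the vector-table row that `word` maps to, or None if it has no entry."""
    key = vocab.strings[word]  # look up the hash for the string
    return vocab.vectors.key2row.get(key)


def share_vector(vocab, w1, w2):
    """True if both words have an entry and point at the same row."""
    r1, r2 = vector_row(vocab, w1), vector_row(vocab, w2)
    return r1 is not None and r1 == r2


def demo(model_name="en_core_web_md"):
    # Imported here so the helpers above can be used without loading spaCy.
    import spacy

    nlp = spacy.load(model_name)
    # Report the spaCy version and the model version separately.
    print(spacy.__version__, nlp.meta["name"], nlp.meta["version"])
    print(share_vector(nlp.vocab, "above", "front"))
```

Calling `demo()` in each environment shows whether "above" and "front" share a row for that model version. If they do, their similarity will be 1.0 regardless of the spaCy version.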

Answer selected by zkytony
Labels: feat / vectors (Feature: Word vectors and similarity), reproducibility (Consistency, reproducibility, determinism, and randomness)
This discussion was converted from issue #10903 on June 03, 2022 07:15.