That's a feature, not a bug 😉.

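spaCy lets you prune the vectors in a vocabulary to keep models lightweight. When vectors are pruned, each word whose vector was discarded is remapped to the closest remaining vector. Many keys therefore share the same row in the vectors table, and different words can return identical vectors. You can confirm this from the model card: the medium English model (en_core_web_md) has 685K keys in its vocabulary but only 20K unique vectors.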

If you check the en_core_web_lg model instead, these vectors should be different.
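To illustrate why two words can come back with the same vector, here is a minimal toy sketch of the pruning idea in plain Python. It is not spaCy's implementation (spaCy's own method is `Vocab.prune_vectors`). The names `prune_table`, `get_vector` and the tiny 3-dimensional vectors are hypothetical. The sketch assumes the table is ordered by priority (e.g. word frequency), keeps the first `keep` vectors, and remaps every other word to its most similar kept vector by cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def prune_table(table, keep):
    """Hypothetical toy pruner.

    table: dict word -> vector, assumed ordered by priority (e.g. frequency).
    keep: number of unique vectors to retain (assumed >= 1).
    Returns (rows, key2row): the kept vectors, and a mapping from
    every word (kept or discarded) to a row index in `rows`.
    """
    words = list(table)
    kept = words[:keep]
    rows = [table[w] for w in kept]
    key2row = {w: i for i, w in enumerate(kept)}
    for w in words[keep:]:
        # A discarded word now points at the most similar surviving row.
        key2row[w] = max(range(len(rows)), key=lambda i: cosine(table[w], rows[i]))
    return rows, key2row

def get_vector(word, rows, key2row):
    """Look up a word's vector through the key -> row mapping."""
    return rows[key2row[word]]

# Four keys, but only two unique vectors survive pruning.
table = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.0, 1.0, 0.1],
    "kitten": [0.9, 0.2, 0.0],
    "puppy": [0.1, 0.9, 0.2],
}
rows, key2row = prune_table(table, keep=2)
print(len(key2row), "keys,", len(rows), "unique vectors")  # 4 keys, 2 unique vectors
print(get_vector("kitten", rows, key2row) == get_vector("cat", rows, key2row))  # True
```

In this toy run "kitten" shares a row with "cat" and "puppy" shares one with "dog". This is the same effect you see when different words return identical vectors from a pruned model.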

Answer selected by adrianeboyd
Labels: models (issues related to the statistical models); feat / vectors (word vectors and similarity)
2 participants

This discussion was converted from issue #10985 on June 20, 2022 07:16.