The vocab is actually more of a cache than a store of known tokens, so it's mostly empty until you start processing texts.

In this case, since you're doing similarity calculations that only make sense for words with vectors, iterate over the words with vectors instead:

lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
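
For anyone who wants to try this without downloading a model, here is a minimal sketch, assuming spaCy v3 and NumPy are installed. The blank pipeline and the three toy vectors are made up purely for illustration; with a real pipeline that ships vectors (for example `en_core_web_md`) you would load that instead and skip the `set_vector` calls.

```python
import numpy as np
import spacy

# Assumption: a blank English pipeline stands in for a model with vectors.
nlp = spacy.blank("en")

# Hypothetical toy vectors, only here so the example is self-contained.
toy_vectors = {
    "apple": [1.0, 0.0, 0.0],
    "banana": [0.9, 0.1, 0.0],
    "cherry": [0.0, 0.0, 1.0],
}
for word, values in toy_vectors.items():
    nlp.vocab.set_vector(word, np.asarray(values, dtype="float32"))

# Iterating over the vectors table yields its keys (hash values),
# and looking each key up in the vocab gives the matching Lexeme.
lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
words = sorted(lex.text for lex in lexemes)

# Rank the other words by similarity to "apple".
apple = nlp.vocab["apple"]
others = [lex for lex in lexemes if lex.orth != apple.orth]
ranked = sorted(others, key=apple.similarity, reverse=True)

print(words)
print([lex.text for lex in ranked])
```

Since every lexeme collected this way has a vector, the similarity calls don't run into the "empty vectors" warning you would get from arbitrary vocab entries.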

Answer selected by Pandalei97
Labels
feat / pipeline Feature: Processing pipeline and components