spacy.load doesn't update vocab #8542
Hello,

Given the test code:

import spacy

nlp1 = spacy.blank("fr")
nlp2 = spacy.load("fr_core_news_md")
nlp3 = spacy.load("fr_core_news_md", vocab=True)
print("Size of nlp1.vocab: ", len(nlp1.vocab))
print("Size of nlp2.vocab: ", len(nlp2.vocab))
print("Size of nlp3.vocab: ", len(nlp3.vocab))

We can see that the vocabs of the nlp objects are not really updated. The update only appears in

My code to find the most similar words:

def sort_by_similarity(word):
    # Keep only lexemes with a non-zero vector, then rank them by similarity to `word`.
    by_similarity = sorted(
        filter(lambda t: t.vector_norm > 0.0, nlp.vocab),
        key=lambda w: nlp.vocab[word].similarity(w),
        reverse=True,
    )
    return by_similarity

Is this a bug, or am I doing something wrong?

[Environment information]

Thanks for your response.
Replies: 1 comment
The vocab is actually more of a cache than a store of known tokens, so it's mostly empty until you start processing texts.

In this case, since you're doing similarity calculations that only make sense for words with vectors, iterate over the words with vectors instead:

lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
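
To make the two points above concrete, here is a minimal sketch rather than a definitive implementation. It assumes spaCy v3 and NumPy are installed, and it uses a blank French pipeline with three made-up 3-dimensional vectors so that it runs without downloading fr_core_news_md. The toy vectors, and passing nlp as an argument to sort_by_similarity, are assumptions for illustration and do not come from the original post.

```python
import numpy as np
import spacy

# Assumption: a blank pipeline stands in for fr_core_news_md so no download is needed.
nlp = spacy.blank("fr")

# The vocab behaves like a cache: it grows as text is processed.
size_before = len(nlp.vocab)
nlp("le chat dort")  # tokenizing creates lexemes for the words it sees
size_after = len(nlp.vocab)
print("vocab size before/after processing:", size_before, size_after)

# Hypothetical toy vectors; a real model ships its own vector table.
toy_vectors = {
    "chat": [1.0, 0.0, 0.0],
    "chien": [0.9, 0.1, 0.0],
    "voiture": [0.0, 0.0, 1.0],
}
for word, values in toy_vectors.items():
    nlp.vocab.set_vector(word, np.asarray(values, dtype="float32"))


def sort_by_similarity(nlp, word):
    """Rank every word that has a vector by similarity to `word`."""
    target = nlp.vocab[word]
    # Iterate over the keys of the vector table, not over the vocab cache.
    lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
    return sorted(lexemes, key=lambda lex: target.similarity(lex), reverse=True)


ranked = [lex.text for lex in sort_by_similarity(nlp, "chat")]
print(ranked)
```

With a real model loaded through spacy.load, the same function should work unchanged, because the candidate words come from nlp.vocab.vectors rather than from whatever happens to be cached in nlp.vocab at that moment.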