spacy.load doesn't update vocab #8542

Pandalei97 · 2021-06-29T14:47:34Z

Hello,

Given the test code:

import spacy

nlp1 = spacy.blank("fr")
nlp2 = spacy.load("fr_core_news_md")
nlp3 = spacy.load("fr_core_news_md", vocab=True)
print("Size of nlp1.vocab: ", len(nlp1.vocab))
print("Size of nlp2.vocab: ", len(nlp2.vocab))
print("Size of nlp3.vocab: ", len(nlp3.vocab))

We can see that the vocab of the loaded nlp objects is not really populated: the loaded words only show up in nlp.vocab.strings and nlp.vocab.vectors. I want to find the words most similar to a given word by iterating over nlp.vocab, but the iteration only covers the 424 'default' lexemes.

My code to find the most similar words:

def sort_by_similarity(word):
    target = nlp.vocab[word]
    # Keep only lexemes that have a vector, then rank them against the target
    return sorted((lex for lex in nlp.vocab if lex.vector_norm > 0.0), key=target.similarity, reverse=True)

Is this a bug, or am I doing something wrong?

[Environment information]
spaCy version: 3.0.6
Python version: 3.6

Thanks for your response.

Answered by adrianeboyd

adrianeboyd · 2021-06-29T15:25:27Z

The vocab is actually more of a cache than a store of known tokens, so it's mostly empty until you start processing texts.

In this case, since you're doing similarity calculations that only make sense for words with vectors, iterate over the words with vectors instead:

lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
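
For anyone landing here later, the suggestion above can be folded into the original ranking function. This is only a minimal sketch, not an official spaCy recipe: the function name `most_similar` and the parameter `n` are made up for illustration, and it assumes `vocab` behaves like a spaCy `Vocab` (lookup by string or key, and `vocab.vectors` iterating over the keys of words that have vectors).

```python
def most_similar(vocab, word, n=10):
    """Return the texts of the n lexemes most similar to `word`.

    Hypothetical helper: `vocab` is assumed to be a spaCy Vocab
    (for example nlp.vocab) with word vectors loaded.
    """
    target = vocab[word]
    if target.vector_norm == 0.0:
        # The query word has no vector, so similarity is meaningless
        return []
    # Iterate over the keys in the vectors table instead of the vocab cache
    lexemes = (vocab[orth] for orth in vocab.vectors)
    # Skip entries without a usable vector and the query word itself
    candidates = [
        lex for lex in lexemes
        if lex.vector_norm > 0.0 and lex.orth != target.orth
    ]
    candidates.sort(key=target.similarity, reverse=True)
    return [lex.text for lex in candidates[:n]]
```

With a loaded pipeline this could be called as most_similar(nlp.vocab, "chien", n=5). It scores every word in the vectors table one by one, so it may be slow on large tables; spaCy's Vectors.most_similar method is an alternative that works on the vectors table directly.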
