
I'd recommend this section of the docs: https://spacy.io/usage/spacy-101#vocab

And there's a graphical overview here: https://spacy.io/api

Here's an explanation I wrote on Stack Overflow for a similar question (https://stackoverflow.com/a/68889010):

There's no real "vocab" count in spaCy v2.3 or v3. You should mainly think of nlp.vocab and nlp.vocab.strings as caches where the total count isn't a meaningful value. The nlp.vocab Vocab is not static and grows as you process texts with the pipeline.

The vocab is a cache of Lexeme objects and the nlp.vocab.strings StringStore is a cache of string hashes. The vocab contains lexemes for tokens that have been seen before in some text that has be…
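To make the caching behaviour concrete, here is a minimal sketch. It assumes spaCy v3 is installed and uses a blank English pipeline (tokenizer only), so no trained model is needed. The example word and sentence are arbitrary and not taken from the original answer.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")

word = "zebrafish"  # arbitrary example word (an assumption for illustration)

# Membership checks do not create new entries in the vocab or string store.
in_vocab_before = word in nlp.vocab
size_before = len(nlp.vocab)

# Tokenizing a text adds a Lexeme (and its string) for each token seen.
doc = nlp(f"The {word} swims.")

in_vocab_after = word in nlp.vocab
in_strings_after = word in nlp.vocab.strings
size_after = len(nlp.vocab)

# The StringStore maps in both directions: string -> hash and hash -> string.
word_hash = nlp.vocab.strings[word]
round_trip = nlp.vocab.strings[word_hash]

print("in vocab before:", in_vocab_before)
print("in vocab after:", in_vocab_after)
print("vocab size before/after:", size_before, size_after)
print("hash round trip:", word_hash, round_trip)
```

The exact sizes depend on the spaCy version and language data, which is why the length of `nlp.vocab` is better treated as the current size of a cache than as a vocabulary count.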

Answer selected by sonynavdeep81