This line is the problem:

vocab = Vocab().from_bytes(jsonpickle.decode(vocab_bytes))

This does not create the same English vocab as the "en" pipeline. It creates a vocab for an unspecified language, with no defaults for lexical attributes such as is_left_punct. To restore the same vocab, you want:

vocab = spacy.blank("en").vocab.from_bytes(...) # or English().vocab ...
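The difference can be seen in a minimal sketch like the one below. It assumes spaCy v3 is installed. The `vocab_bytes` here are produced from a blank English pipeline as a stand-in for the original poster's data, and the jsonpickle encoding step from the question is left out. The `"("` lexeme is only an illustrative choice.

```python
import spacy
from spacy.vocab import Vocab

# Stand-in for the saved data: serialize the vocab of a blank English pipeline.
# (Assumption: the original vocab_bytes came from an English pipeline.)
vocab_bytes = spacy.blank("en").vocab.to_bytes()

# Problematic: a bare Vocab() carries no language-specific defaults,
# so lexical attribute getters such as is_left_punct are missing.
bare_vocab = Vocab().from_bytes(vocab_bytes)

# Suggested fix: start from the English vocab, then load the bytes into it.
en_vocab = spacy.blank("en").vocab.from_bytes(vocab_bytes)

# Compare the same lexical attribute on both vocabs.
print("bare Vocab():", bare_vocab["("].is_left_punct)
print("English vocab:", en_vocab["("].is_left_punct)
```

Only the vocab restored on top of `spacy.blank("en").vocab` should report the opening parenthesis as left punctuation, since the English language defaults supply that attribute.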

If you're using en_core_web_trf and not customizing any lexical features (cluster, norm, prob, and sentiment are the main ones that would be saved with the vocab), you don't really need to save the vocab at all. spacy.blank("en").vocab will have the same lexical attributes as en_core_web_trf.vocab. Do be aware that the lexical attributes can …

Answer selected by polm
Labels: feat / vectors (Feature: Word vectors and similarity), feat / serialize (Feature: Serialization, saving and loading)
3 participants
This discussion was converted from issue #9523 on October 22, 2021 06:13.