After to_bytes without vocab, and from_bytes, lang_ is None in Doc objects #4390
How to reproduce the behaviour

The following was written to not have to wait 15 seconds on each (where

That works great, speeds up test startup a lot ... but then it turns out that the

This is because

But the language is a property of the language model itself, not just of its vocabulary (though of course they ought to match). So I don't think

Info about spaCy
Replies: 2 comments
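That's correct, yes. In your code, you'll still have a `Vocab` btw – the `Language` class initializes this automatically. It's just that your `Vocab` is blank and doesn't have a language assigned.

The `meta["lang"]` setting exists so that you can create an instance of the same `Language` subclass, e.g. via `util.get_lang_class(meta["lang"])`. This is also how spaCy does it under the hood when you load a model.

I'm assuming you don't want to load the `vocab` because of the word vectors? The following shouldn't be slower than what you currently have:

    lang_cls = spacy.util.get_lang_class(lang)
    lang_model = lang_cls().from_bytes(tokenizer_file.read())

You might also want to choose a different name and not call it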
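
For anyone landing here later, the round trip described above can be sketched end to end. This is only an illustration under a few assumptions: a blank English pipeline stands in for the real model, the bytes are kept in memory rather than in a file, and the names `data`, `lang` and `restored` are made up for the example.

```python
import spacy
from spacy.util import get_lang_class

# Assumption: a blank English pipeline stands in for the real (slow-to-load) model.
nlp = spacy.blank("en")

# Serialize everything except the vocab, which is where word vectors would live.
data = nlp.to_bytes(exclude=["vocab"])

# The language code needs to be kept next to the bytes; meta["lang"] holds it.
lang = nlp.meta["lang"]

# Restore through the matching Language subclass instead of the base class,
# so the freshly created Vocab already has the language assigned.
lang_cls = get_lang_class(lang)
restored = lang_cls().from_bytes(data, exclude=["vocab"])

doc = restored("This is a sentence.")
print(doc.lang_)
```

Because the `Vocab` here is created by the language-specific subclass, `doc.lang_` should be populated even though the serialized vocab was never loaded.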
I think this has been addressed by Ines' explanations and suggestions? If not - feel free to open a new issue!