How to add models to a blank nlp pipeline? #11133
spaCy: 3.3.0

I defined a custom language and added my own syntax_iterator for phrase extraction, as below:

Then I tried to use CustomChinese this way:

This seems to work, but it prints this warning message, which is concerning:

I also tried to load the models from disk, but that only loads the tokenizer:

So how can I load models into my blank nlp pipeline object?
Replies: 4 comments 13 replies
This warning is serious: it means the sourced components won't work, because the vectors they depend on are missing.

You also need to copy the vectors from zh_core_web_lg along with the components. In a config (also with assemble) you'd specify this in [initialize] as vectors = "zh_core_web_lg".

If you're doing this in a script, you should be able to copy the vectors directly (with v3.2+): nlp.vocab.vectors = source_nlp.vocab.vectors
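As a rough sketch of the script approach described in this reply: the helper below copies the vectors first and then sources each component. The function name `source_components` and its parameters are made up for illustration, and the component names in the usage comment are an assumption about what the source pipeline contains.

```python
import spacy
from spacy.language import Language


def source_components(nlp: Language, source_nlp: Language, names: list) -> Language:
    """Copy vectors from source_nlp, then source the named components from it.

    Hypothetical helper, not part of spaCy. Assigning vocab.vectors
    relies on spaCy v3.2+, as noted in the reply.
    """
    # Copy the vectors before sourcing, so the sourced components see
    # the same vectors they had in the source pipeline.
    nlp.vocab.vectors = source_nlp.vocab.vectors
    for name in names:
        nlp.add_pipe(name, source=source_nlp)
    return nlp


# Usage sketch, assuming zh_core_web_lg is installed and contains
# these components (check source_nlp.pipe_names first):
# source_nlp = spacy.load("zh_core_web_lg")
# nlp = source_components(spacy.blank("zh"), source_nlp, ["tok2vec", "tagger", "parser"])
```

Components that listen to a shared tok2vec generally need that tok2vec sourced as well, which is why it appears first in the usage comment.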
I got another error: It seems my CustomChinese inheritance didn't work. Previously I added a syntax_iterator.py directly to spaCy's source code for the Chinese language, and that worked. Now I'm not modifying the spaCy source, which should be a more elegant way to add things to the language I want. Can you take a look at my custom_chinese.py definition? It looks right to me.
If I save my CustomChinese pipeline to a location: Now save it:
Then try to load it back: This doesn't load any models, however. The pipeline is empty except for the tokenizer itself. nlp.components is an empty list, so no components are loaded from the model path.
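A minimal sketch of a save-and-load round trip, to show what gets written to disk. It uses the stock blank Chinese pipeline and a sentencizer as stand-ins (both are assumptions, not the poster's CustomChinese setup), and the `roundtrip` helper is hypothetical. The directory only contains the components that are in the pipeline at the moment `to_disk` is called, so a pipeline saved with no components loads back with none.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.language import Language


def roundtrip(nlp: Language) -> Language:
    """Save a pipeline to a temporary directory and load it back.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        nlp.to_disk(tmp)
        # Load while the directory still exists; the returned object
        # is fully in memory afterwards.
        return spacy.load(Path(tmp))


# A blank pipeline has a tokenizer but no components.
empty = roundtrip(spacy.blank("zh"))
print(empty.pipe_names)

# Components added before saving are restored on load.
nlp = spacy.blank("zh")
nlp.add_pipe("sentencizer")
print(roundtrip(nlp).pipe_names)
```

Checking `nlp.pipe_names` right before saving is a quick way to see whether the components were ever added to the pipeline being saved.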
I executed this command at the project root: Then I did: Then I got this error message: I tested the CustomChinese class, and it seems to be working well. So what am I missing when packaging the model?
The warning is a serious warning. It means that the sourced components won't work because the vectors are missing.

You need to also copy the vectors from zh_core_web_lg along with the components. In a config (also with assemble) you'd specify this in [initialize] as vectors = "zh_core_web_lg".

If you're doing this in a script then you should be able to just copy the vectors (with v3.2+): nlp.vocab.vectors = source_nlp.vocab.vectors