Loading off-the-shelf morphologizer (fr_core_news_md) from config file in custom pipeline does not work #10298
-
How to reproduce the behaviour

I'm trying to build a custom French NLP pipeline based on the components of `fr_core_news_md`. Sourcing the components directly from the loaded pipeline works, but building the same pipeline from a config file does not:

repro.py

```python
import spacy
from spacy.lang.fr import French
from thinc.config import Config

# Works: copy the trained components from the loaded pipeline
fr_core_news_md = spacy.load('fr_core_news_md')
valid_nlp = French()
valid_nlp.add_pipe('tok2vec', source=fr_core_news_md)
valid_nlp.add_pipe('morphologizer', source=fr_core_news_md)
doc = valid_nlp('Bonjour, comment-allez vous ?')
print([t.pos_ for t in doc])  # ['NOUN', 'PUNCT', 'ADV', 'AUX', 'NOUN', 'VERB', 'PUNCT']

# Does not work: build the same pipeline from a config file
config = Config().from_disk("foo.config")
nlp_invalid = French.from_config(config)
nlp_invalid.initialize()
doc_invalid = nlp_invalid('Bonjour, comment-allez vous ?')
print([t.pos_ for t in doc_invalid])  # ['PROPN', 'PROPN', 'PROPN', 'PROPN', 'PROPN', 'PROPN', 'PROPN']
```
foo.config
Your Environment
-
Hi @AlexandreRozier, the code above doesn't work because `nlp.initialize()` clears all the weights, so the components you sourced from `fr_core_news_md` lose their trained parameters. In relation to this, you can check out the `spacy assemble` command and pass it your config file. Under the hood, `assemble` does not initialize the components you've sourced.

You can also do this programmatically by disabling all the sourced components before calling the `nlp.initialize()` method:

```python
# `sourced` holds the names of the components loaded with "source" in the config
with nlp.select_pipes(disable=[*sourced]):
    nlp.initialize()
```

Again, it may be easier to use the `spacy assemble` command, as it does this step for you. You can also check out its implementation to see how it creates the pipeline.
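The programmatic approach in the reply assumes a `sourced` list holding the names of the components copied from another pipeline. Below is a minimal sketch of how such a list could be built from the config, assuming a component block counts as "sourced" when it has a `source` key and no `factory` key. The helper `sourced_component_names` and the example config are hypothetical illustrations, not part of spaCy's API.

```python
from typing import Any, Dict, List


def sourced_component_names(config: Dict[str, Any]) -> List[str]:
    """Return the names of components declared with a "source" entry.

    Assumption: a block like {"source": "fr_core_news_md"} copies an
    already-trained component, while a block with a "factory" key creates
    a fresh one that still needs initializing.
    """
    components = config.get("components", {})
    return [
        name
        for name, block in components.items()
        if "source" in block and "factory" not in block
    ]


# Hypothetical config written as a plain dict; a thinc Config loaded with
# Config().from_disk(...) is a dict subclass, so it can be passed the same way.
example_config = {
    "nlp": {"lang": "fr", "pipeline": ["tok2vec", "morphologizer", "sentencizer"]},
    "components": {
        "tok2vec": {"source": "fr_core_news_md"},
        "morphologizer": {"source": "fr_core_news_md"},
        "sentencizer": {"factory": "sentencizer"},
    },
}

print(sourced_component_names(example_config))  # ['tok2vec', 'morphologizer']

# With spaCy installed, the list would then be passed to select_pipes so only
# the freshly created components are initialized:
# with nlp.select_pipes(disable=sourced_component_names(config)):
#     nlp.initialize()
```

Keeping the helper free of spaCy imports makes the filtering logic easy to check on its own; the final call is left as a comment because it needs a real pipeline and the sourced model to be installed.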