Loading off-the-shelf morphologizer (fr_core_news_md) from config file in custom pipeline does not work #10298
How to reproduce the behaviour
I'm trying to build a custom French NLP pipeline, based on the …
repro.py
import spacy
from spacy.lang.fr import French
from thinc.config import Config
# Works: components sourced directly from the loaded fr_core_news_md pipeline
fr_core_news_md = spacy.load('fr_core_news_md')
valid_nlp = French()
valid_nlp.add_pipe('tok2vec', source=fr_core_news_md)
valid_nlp.add_pipe('morphologizer', source=fr_core_news_md)
doc = valid_nlp('Bonjour, comment-allez vous ?')
print([t.pos_ for t in doc]) # ['NOUN', 'PUNCT', 'ADV', 'AUX', 'NOUN', 'VERB', 'PUNCT']
# Does not work: the same pipeline built from foo.config, then initialized
config = Config().from_disk("foo.config")
nlp_invalid = French.from_config(config)
nlp_invalid.initialize()
doc_invalid = nlp_invalid('Bonjour, comment-allez vous ?')
print([t.pos_ for t in doc_invalid]) # ['PROPN', 'PROPN', 'PROPN', 'PROPN', 'PROPN', 'PROPN', 'PROPN']
foo.config
Your Environment
Hi @AlexandreRozier, the reason the code above doesn't work is that nlp.initialize() clears all the weights. In relation to this, you can check out the spacy assemble command and pass it your config file. Under the hood, assemble does not initialize the components you've sourced.

You can also do this programmatically by disabling all the sourced components before calling the nlp.initialize() method:

# sourced: names of the components taken from another pipeline via "source" in the config
with nlp.select_pipes(disable=[*sourced]):
    nlp.initialize()

Again, it may be easier to use the spacy assemble command, as it does this step for you. You can also check out its implementation to see how it creates the pipeline.
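The select_pipes snippet in the reply uses a sourced list without defining it. Below is a minimal sketch of one way to compute it, assuming the config was loaded from disk (for example with Config().from_disk("foo.config") as in repro.py) and behaves like a nested dict. The helper names sourced_component_names and initialize_without_sourced are hypothetical, not spaCy API; the filter (a "source" key and no "factory" key) mirrors how I understand spacy assemble to pick out sourced components.

```python
from typing import Any, List, Mapping


def sourced_component_names(config: Mapping[str, Any]) -> List[str]:
    """Hypothetical helper: return the names of components that are sourced
    from another pipeline ("source" key) rather than built from a factory."""
    components = config.get("components", {})
    return [
        name
        for name, cfg in components.items()
        if "source" in cfg and "factory" not in cfg
    ]


def initialize_without_sourced(nlp: Any, config: Mapping[str, Any]) -> None:
    """Hypothetical helper: initialize only the non-sourced components.

    select_pipes(disable=...) temporarily removes the listed components from
    nlp.pipeline, so nlp.initialize() does not touch their trained weights.
    """
    sourced = sourced_component_names(config)
    with nlp.select_pipes(disable=sourced):
        nlp.initialize()
```

In repro.py this would replace the plain nlp_invalid.initialize() call with initialize_without_sourced(nlp_invalid, config). Pass the config loaded from disk rather than nlp_invalid.config: once a component has been sourced, the pipeline's own config may record the source pipeline's factory settings instead of the source key. Recent spaCy v3 releases also provide spacy.util.get_sourced_components(config), which returns a dict keyed by component name, so [*get_sourced_components(config)] should give the same list if your version has it.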