Multiple textcat_multilabel components in same spacy v3 pipeline #7498
It is possible to do this with two configs, where the second one sources all the components from the first config and freezes them while training on the second dataset. However, it can be easier to train two separate models without frozen components and then use a collate script to combine the two models. This is what we do for all the pretrained models. For example:
import spacy
nlp1 = spacy.load("model1")  # ["tok2vec", "ner", "textcat_multilabel"]
nlp2 = spacy.load("model2")  # ["textcat_multilabel"]
nlp1.add_pipe("textcat_multilabel", name="textcat_multilabel2", source=nlp2)
nlp1.to_disk("combined_model")
If your second textcat_multilabel listens to a shared tok2vec, replace the listener with a standalone copy of the tok2vec before sourcing the component:
nlp2.replace_listeners("tok2vec", "textcat_multilabel", ["model.tok2vec"])
nlp1.add_pipe("textcat_multilabel", name="textcat_multilabel2", source=nlp2)
(Just like you can have …
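To try the collate step without trained models on disk, here is a minimal sketch that builds two tiny, untrained pipelines from `spacy.blank("en")` and combines them the same way. The helper `build_textcat`, the example texts, and the label names are all made up for illustration, and the first pipeline leaves out `tok2vec` and `ner` to keep things short. It assumes both pipelines use the default `textcat_multilabel` settings, which do not rely on a tok2vec listener.

```python
import spacy
from spacy.training import Example


def build_textcat(train_data):
    """Hypothetical helper: blank English pipeline with one textcat_multilabel."""
    nlp = spacy.blank("en")
    nlp.add_pipe("textcat_multilabel")
    examples = [
        Example.from_dict(nlp.make_doc(text), {"cats": cats})
        for text, cats in train_data
    ]
    # initialize() picks up the labels from the example annotations
    nlp.initialize(get_examples=lambda: examples)
    return nlp


# Two made-up datasets with disjoint label sets
nlp1 = build_textcat([
    ("The invoice is overdue", {"BILLING": 1.0, "SUPPORT": 0.0}),
    ("My app keeps crashing", {"BILLING": 0.0, "SUPPORT": 1.0}),
])
nlp2 = build_textcat([
    ("Please respond today", {"URGENT": 1.0, "SPAM": 0.0}),
    ("Win a free prize now", {"URGENT": 0.0, "SPAM": 1.0}),
])

# Copy the component from nlp2 into nlp1 under a new, unique name
nlp1.add_pipe("textcat_multilabel", name="textcat_multilabel2", source=nlp2)

doc = nlp1("My invoice is wrong, please respond today")
print(nlp1.pipe_names)
print(sorted(doc.cats))  # labels from both components
```

Both components write their scores into the same `doc.cats` dictionary, so it is probably safest to keep the two label sets distinct. Otherwise the later component's score replaces the earlier one for any shared label.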
Hey, I am trying to migrate an existing spaCy 2.0 NLP pipeline to 3.0. Our current pipeline consists of a trained NER component and two multi-label TextCategorizer components. We originally had three custom training scripts. The scripts for training the NER component and the TextCat-1 component used the same dataset, while the TextCat-2 component was trained on a different dataset. Each training script would disable all but the current pipeline component for training.
I was able to convert both existing datasets to the new DocBin format. I have successfully trained the NER and TextCat-1 components using:
python -m spacy train --paths.dev <> --paths.train <>
My current pipeline config is
pipeline = ["tok2vec","ner","textcat_multilabel"]
I am not totally sure what my pipeline structure should be for the second TextCat component. Should I create custom components that use the "textcat_multilabel" factory and then train both separately, freezing the other components (frozen_components) and specifying the data paths through the CLI options? If so, is there a best practice for what my training config should look like?
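For reference, the two-config route described in the reply could look roughly like the fragment below. This is a partial sketch, not a complete training config. It assumes the first trained pipeline was saved to a directory called "model1" (the placeholder path from the reply) and that the language is English. The remaining sections (corpora, optimizer, and so on) would carry over from the existing config. The fragment is parsed with `thinc.api.Config` only to check that it is well-formed.

```python
from thinc.api import Config

# Partial second-stage config (sketch). "model1" and lang = "en" are assumptions.
CONFIG_SKETCH = """
[nlp]
lang = "en"
pipeline = ["tok2vec","ner","textcat_multilabel","textcat_multilabel2"]

[components]

# Components reused from the first trained pipeline
[components.tok2vec]
source = "model1"

[components.ner]
source = "model1"

[components.textcat_multilabel]
source = "model1"

# New component, trained on the second dataset
[components.textcat_multilabel2]
factory = "textcat_multilabel"

[training]
frozen_components = ["tok2vec","ner","textcat_multilabel"]
"""

config = Config().from_str(CONFIG_SKETCH)
frozen = config["training"]["frozen_components"]
print(config["nlp"]["pipeline"])
print(frozen)
```

Training with a config like this would then point `--paths.train` and `--paths.dev` at the second dataset, so that only textcat_multilabel2 is updated.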