Setting listener from sourced transformer #11411
Hello, I'm trying to train a text categorization model. I would like to train it using the transformer from en_core_web_trf, since this pipeline also contains other components that I find useful for my current project.

I'm currently using this config (referenced from a comment originally posted by @polm in #11187), but I'm not sure this is the right way to implement it. My intuition tells me that I should somehow reference the "sourced" component instead of the generic one.

Any help is appreciated! The components section in config.cfg looks like this:
If you want to use a custom component in addition to the pretrained pipelines, I would recommend you train your component in isolation, and then source the components you want from the pretrained pipeline afterwards, whether in code or using `spacy assemble`.

The one downside of this approach is you'll need two copies of the Transformer (/tok2vec), which will take up more disk and memory. But the alternative is training your model with a frozen Transformer, which will limit the accuracy you can achieve. (Also freezing Transformers isn't straightforward at the moment - you can't use `frozen_components`, you have to set `grad_factor = 0`.)
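
As a rough illustration of the recommended approach, a config for `spacy assemble` might look something like the sketch below. It is not taken from the thread. The path `./textcat_output/model-best` is a hypothetical location for the separately trained textcat pipeline, and the listener path `model.tok2vec` is an assumption that depends on the textcat architecture you trained with. The `replace_listeners` setting is what should give the textcat component its own copy of the Transformer, which is the extra disk and memory cost mentioned above.

```ini
[paths]
# Hypothetical path to the pipeline in which textcat was trained in isolation
textcat_model = "./textcat_output/model-best"

[nlp]
lang = "en"
pipeline = ["transformer","tagger","parser","attribute_ruler","lemmatizer","ner","textcat"]

[components]

# Components sourced unchanged from the pretrained pipeline
[components.transformer]
source = "en_core_web_trf"

[components.tagger]
source = "en_core_web_trf"

[components.parser]
source = "en_core_web_trf"

[components.attribute_ruler]
source = "en_core_web_trf"

[components.lemmatizer]
source = "en_core_web_trf"

[components.ner]
source = "en_core_web_trf"

# The separately trained textcat; its listener is replaced by a private
# copy of the Transformer it was trained with (assumed listener path)
[components.textcat]
source = ${paths.textcat_model}
replace_listeners = ["model.tok2vec"]
```

For the frozen-Transformer alternative, the listener block of the new component is where `grad_factor` would go. Again this is only a sketch: the section name assumes the listener sits at `model.tok2vec`, and the pooling layer is just a common choice, not something stated in the thread.

```ini
[components.transformer]
source = "en_core_web_trf"

[components.textcat.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
# A gradient factor of 0 means textcat sends no updates back to the Transformer
grad_factor = 0.0
upstream = "transformer"

[components.textcat.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
```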