A Hydranet Arch to Solve "Catastrophic Forgetting" Problem for NER #9968
-
I'm aware of the catastrophic forgetting problem when training NER, and I've also read through the following solution: https://explosion.ai/blog/pseudo-rehearsal-catastrophic-forgetting. But I'm not fully convinced that pseudo-rehearsal is a robust solution: in an environment where new (possibly distinct) labels are constantly being added, it would throw off the weights with respect to the older labels and pose stability and reliability issues. It is also not possible to discard the labels.

Would it be possible to inject layers inspired by the hydranet architecture? Essentially, a specialized NN per label, so that the labels are disjoint and training can be done in a mutually exclusive fashion. Can custom TensorFlow heads be added on top of "en_core_web_lg", for instance (utilising the pre-trained weights)? Similar to what is described here (chaining custom models using the Thinc wrapper): https://spacy.io/usage/layers-architectures#frameworks

The documentation above does not provide any info on adding custom TensorFlow heads on top of en_core_web_lg or any other pre-trained model. I also went through the source code, but had no luck extending the layers used by the NER models. This would solve the "Catastrophic Forgetting" problem altogether and enhance stability and reliability.
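For concreteness, here is roughly what I have in mind, assuming Thinc's `TensorFlowWrapper` can be used this way. The widths, label count and dummy data are just placeholders, not anything taken from the pretrained pipeline:

```python
import numpy
import tensorflow as tf
from thinc.api import TensorFlowWrapper, chain, Linear

width = 300     # assumed width of the upstream feature layer
n_labels = 4    # assumed number of labels for the extra head

# A plain Keras head: one hidden layer plus a softmax over the label set.
tf_head = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(width,)),
    tf.keras.layers.Dense(n_labels, activation="softmax"),
])

# Wrapping makes the Keras model behave like any other Thinc Model, so it can
# be chained with further Thinc layers (and, ideally, fed from a tok2vec).
model = chain(TensorFlowWrapper(tf_head), Linear(n_labels))

X = numpy.zeros((8, width), dtype="f")
Y = numpy.zeros((8, n_labels), dtype="f")
model.initialize(X=X, Y=Y)
predictions, backprop = model.begin_update(X)
print(predictions.shape)  # (8, n_labels)
```

What I can't find documented is how to attach something like this to the layers inside the pretrained NER component itself.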
-
Found the solution: #9940
-
Just to address this point about bringing in task-specific heads from other frameworks:
If you wrap the heads in Thinc somehow this should be possible, but I don't think anyone has done this yet. Even for Transformers/Torch, task-specific heads are one of the things we don't support yet. Our typical advice is to use the external model as a feature source (tok2vec) and train native spaCy components on top of it.
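As a rough sketch of the "train native spaCy components on top" part of that advice (assuming spaCy v3; the component name, label and toy example below are made up): you can add a second NER component with its own disjoint label set and train only that component, keeping the original weights frozen.

```python
import spacy
from spacy.training import Example

nlp = spacy.load("en_core_web_lg")

# Hypothetical label and toy training example, purely for illustration.
TRAIN_DATA = [
    ("I ordered a Flurbotron 9000 yesterday.", {"entities": [(12, 27, "PRODUCT")]}),
]

# A second NER component with its own label set; the stock "ner" component
# and its weights are left untouched.
extra_ner = nlp.add_pipe("ner", name="ner_custom", last=True)
extra_ner.add_label("PRODUCT")

examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]

# Initialize and train only the new component; everything else stays frozen.
extra_ner.initialize(lambda: examples, nlp=nlp)
frozen = [name for name in nlp.pipe_names if name != "ner_custom"]
optimizer = nlp.create_optimizer()
with nlp.select_pipes(disable=frozen):
    for _ in range(20):
        losses = {}
        nlp.update(examples, sgd=optimizer, losses=losses)
print(losses)
```

If you want the new component to share a transformer or tok2vec as its feature source rather than embed its own features, that wiring is normally done through the training config with a listener architecture, as described in the layers-and-architectures docs linked above.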