A Hydranet Arch to Solve "Catastrophic Forgetting" Problem for NER #9968
-
I'm aware of the catastrophic forgetting problem when training NER, and I've also read through the following solution: https://explosion.ai/blog/pseudo-rehearsal-catastrophic-forgetting. But I'm not fully convinced that pseudo-rehearsal is a robust solution: in an environment where new (possibly distinct) labels are constantly being added, it would throw off the weights with respect to the older labels and pose stability and reliability issues. It is also not possible to discard the labels.

Would it be possible to inject layers inspired by the hydranet architecture? Essentially, a specialized NN per label, so that the labels are disjoint and training can be done in a mutually exclusive fashion. Can custom TensorFlow heads be added on top of "en_core_web_lg", for instance (utilising the pre-trained weights)? Similar to what is described here (chaining custom models using the Thinc wrapper): https://spacy.io/usage/layers-architectures#frameworks

The documentation above does not provide any info on adding custom TensorFlow heads on top of en_core_web_lg or any other pre-trained model. I also went through the source code, but had no luck extending the layers used by the NER models. This would solve the "Catastrophic Forgetting" problem altogether and enhance stability and reliability.
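For concreteness, here is roughly what I have in mind, assuming Thinc's `TensorFlowWrapper` can be used this way. The widths, label count and dummy data are just placeholders, not anything taken from the pretrained pipeline:

```python
import numpy
import tensorflow as tf
from thinc.api import TensorFlowWrapper, chain, Linear

width = 300     # assumed width of the upstream feature layer
n_labels = 4    # assumed number of labels for the extra head

# A plain Keras head: one hidden layer plus a softmax over the label set.
tf_head = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(width,)),
    tf.keras.layers.Dense(n_labels, activation="softmax"),
])

# Wrapping makes the Keras model behave like any other Thinc Model, so it can
# be chained with further Thinc layers (and, ideally, fed from a tok2vec).
model = chain(TensorFlowWrapper(tf_head), Linear(n_labels))

X = numpy.zeros((8, width), dtype="f")
Y = numpy.zeros((8, n_labels), dtype="f")
model.initialize(X=X, Y=Y)
predictions, backprop = model.begin_update(X)
print(predictions.shape)  # (8, n_labels)
```

What I can't find documented is how to attach something like this to the layers inside the pretrained NER component itself.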
-
Found the solution: #9940
-
Just to address this point about bringing in task-specific heads from other frameworks:
If you wrap the heads in Thinc somehow this should be possible, but I don't think anyone has done this yet. Even for Transformers/Torch, task-specific heads are one of the things we don't support yet. Our typical advice is to use the external model as a feature source (tok2vec) and train native spaCy components on top of it.
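As a rough sketch of the "train native spaCy components on top" part of that advice (assuming spaCy v3; the component name, label and toy example below are made up): you can add a second NER component with its own disjoint label set and train only that component, keeping the original weights frozen.

```python
import spacy
from spacy.training import Example

nlp = spacy.load("en_core_web_lg")

# Hypothetical label and toy training example, purely for illustration.
TRAIN_DATA = [
    ("I ordered a Flurbotron 9000 yesterday.", {"entities": [(12, 27, "PRODUCT")]}),
]

# A second NER component with its own label set; the stock "ner" component
# and its weights are left untouched.
extra_ner = nlp.add_pipe("ner", name="ner_custom", last=True)
extra_ner.add_label("PRODUCT")

examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]

# Initialize and train only the new component; everything else stays frozen.
extra_ner.initialize(lambda: examples, nlp=nlp)
frozen = [name for name in nlp.pipe_names if name != "ner_custom"]
optimizer = nlp.create_optimizer()
with nlp.select_pipes(disable=frozen):
    for _ in range(20):
        losses = {}
        nlp.update(examples, sgd=optimizer, losses=losses)
print(losses)
```

If you want the new component to share a transformer or tok2vec as its feature source rather than embed its own features, that wiring is normally done through the training config with a listener architecture, as described in the layers-and-architectures docs linked above.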