spancat shared architecture listeners not working as expected #11859
Replies: 2 comments 1 reply
-
The first option won't work because your […]. The second option with […]
-
Thank you @adrianeboyd. I am curious whether there is a way to let spancat use the transformer from en_core_web_trf to reduce the overall pipeline size. Also, is there a specific reason why spancat needs its own transformer? Below is my spancat model config used for training.
Let me know if my understanding is correct: the transformer (embeddings/context vectors) is fine-tuned as part of spancat training, which means spancat not only learns to classify spans but also fine-tunes the base "roberta-base" transformer layer for the spancat task. In other words, it may not be possible to use the transformer from en_core_web_trf, since spancat needs its fine-tuned transformer rather than the original "roberta-base" one. If this is the case, is it wise to use a transformer fine-tuned by spancat for other components (say NER or sentence classification)? In general, what are the guidelines for sharing a transformer when it is fine-tuned for a specific task (spancat), and is there documentation on which tasks fine-tune the transformer and which don't?

```ini
[components.spancat.model.tok2vec]

[components.transformer]

[components.transformer.model]
```
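For reference, a typical shared-transformer setup fills in those truncated sections roughly like this. This is a minimal sketch based on the standard spacy-transformers listener pattern, not the exact config from this thread:

```ini
[components.transformer]
factory = "transformer"

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"

# spancat receives its token vectors from the shared transformer
# via a listener instead of carrying its own copy of the weights.
[components.spancat.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0

[components.spancat.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
```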
-
Hello, for some reason I see that the memory/disk footprint of my nlp pipeline without listeners is smaller than with listeners (assuming I am doing it correctly :))
Without listeners (spaCy nlp pipeline):
```python
import spacy

nlp_spancat = spacy.load('./Source/Python/span_cat_myown/training/model-last')
snlp = spacy.load("en_core_web_trf")
snlp.add_pipe('spancat', source=nlp_spancat, name="span_cat_myown")
snlp.to_disk("./snlp")
```
The model file of span_cat_myown on disk is just 4 KB, and the overall pipeline is ~0.5 GB.
With listeners (spaCy nlp pipeline):
I want the span_cat_myown component to listen to the en_core_web_trf transformer, since span_cat_myown was trained using the en_core_web_trf transformer.

```python
import spacy

nlp_spancat = spacy.load('./Source/Python/span_cat_myown/training/model-last')
nlp_spancat.replace_listeners("transformer", "spancat", ["model.tok2vec"])
snlp = spacy.load("en_core_web_trf")
snlp.add_pipe('spancat', source=nlp_spancat, name="span_cat_myown")
snlp.to_disk("./snlp")
```

The model file of span_cat_myown on disk is 483 MB (I think it is writing a duplicate transformer layer), and the overall pipeline is ~1 GB.
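One way to see what the sourced component actually ended up with is to walk its Thinc model graph. A quick sketch, assuming the snlp pipeline from the snippet above; the exact layer names you'll see are illustrative, not guaranteed:

```python
# Inspect the layers inside the sourced spancat component. If
# replace_listeners embedded a full transformer, a transformer layer
# should show up here; if the listener was kept, a listener layer
# appears instead.
spancat = snlp.get_pipe("span_cat_myown")
for layer in spancat.model.walk():
    print(layer.name)
```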
Any suggestions on why the span_cat_myown model file on disk is so large with listeners and so small without? Also, I'm not sure whether I'm implementing the listeners correctly.
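To pin down where the bytes go, comparing the serialized component directories can help. A minimal sketch: dir_size_mb is a hypothetical helper, and the subdirectory names assume spaCy's usual to_disk layout of one folder per pipe component:

```python
import os

def dir_size_mb(path):
    # Sum all file sizes under a directory, in megabytes.
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / 1e6

# After snlp.to_disk("./snlp"):
print(dir_size_mb("./snlp/span_cat_myown"))  # large if the component carries its own transformer
print(dir_size_mb("./snlp/transformer"))     # the shared transformer weights
```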