Freezing tok2vec vs unfreezing tok2vec while updating pre-trained model #12518
-
I took a blank model and trained that (…)
-
Hi!

If you have a `tok2vec` component in the pipeline that you want to freeze but you also want to continue training other components on top of it, then you should add this `tok2vec` component to the `annotating_components` section in your config.

This will basically ensure that the component runs during training but isn't backpropagated into - i.e. it won't update as long as it's (also) in the frozen components list.
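For example, assuming the shared embedding component is named `tok2vec` and you're training an `ner` component on top of it, the relevant `config.cfg` settings would look roughly like this (a sketch, not a complete config):

```ini
[nlp]
pipeline = ["tok2vec","ner"]

[training]
# Don't update the tok2vec weights during training...
frozen_components = ["tok2vec"]
# ...but still run it on each batch so downstream listeners get its output.
annotating_components = ["tok2vec"]
```

With both settings in place, `tok2vec` produces annotations for the NER's listener on every training step, but its own weights stay fixed.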
That's a good question with a somewhat complex answer, but one that might help you better understand what's going on. You can see the configuration of the models in the config file of the pretrained pipeline.
But not all components follow the same pattern. In those cases (like in your original case) where a component uses the listener mechanism, the `tok2vec` is a separate, shared component and the downstream models merely listen to it. The NER model in the pretrained pipelines, however, has its own internal tok2vec layer embedded in its model.

This means that if you freeze the whole pipeline except for the NER, you're actually still (re)training that internal tok2vec layer. More background information on those two different patterns can be found here: https://spacy.io/usage/embeddings-transformers; it's basically the difference between having a "shared" Tok2Vec layer or an independent one - the first is a separate component by itself, the second is an independent layer within the model of another component.
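You can tell the two patterns apart by looking at each component's `model.tok2vec` architecture in the config: a listener uses `spacy.Tok2VecListener.v1`, while an independent embedding layer uses a regular tok2vec architecture such as `spacy.Tok2Vec.v2`. Here is a minimal sketch of that check - the config fragment is hand-written for illustration, not copied from any real pipeline:

```python
# Sketch: classify components as using a shared Tok2Vec (listener pattern)
# or their own internal embedding layer, based on the architecture name
# registered under model.tok2vec in the pipeline config.

# Illustrative config fragment (hypothetical values, mirroring the shape
# of nlp.config["components"] in a spaCy v3 pipeline).
config = {
    "components": {
        "tok2vec": {"factory": "tok2vec"},
        "tagger": {
            "factory": "tagger",
            "model": {"tok2vec": {"@architectures": "spacy.Tok2VecListener.v1"}},
        },
        "ner": {
            "factory": "ner",
            "model": {"tok2vec": {"@architectures": "spacy.Tok2Vec.v2"}},
        },
    }
}

def embedding_pattern(name: str) -> str:
    """Return 'shared (listener)' or 'internal' for a trainable component."""
    arch = config["components"][name]["model"]["tok2vec"]["@architectures"]
    return "shared (listener)" if "Listener" in arch else "internal"

print(embedding_pattern("tagger"))  # shared (listener)
print(embedding_pattern("ner"))     # internal
```

With a loaded pipeline you would inspect `nlp.config["components"]` the same way; a component reporting `internal` keeps training its embedding layer even when every other component is frozen.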