My best guess is that one of the evaluation corpora has longer texts than the others, but the details would depend on your `span_getters` setting.
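
If you want to check that quickly, something like this should work (untested sketch; the `.spacy` corpus paths and the blank `en` pipeline are assumptions):

```python
# Untested sketch: compare document lengths across corpora.
# The corpus paths and the "en" blank pipeline are assumptions.
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")  # only the vocab is needed to deserialize

for path in ["corpus/train.spacy", "corpus/dev.spacy"]:
    lengths = [len(doc) for doc in DocBin().from_disk(path).get_docs(nlp.vocab)]
    print(f"{path}: max={max(lengths)} mean={sum(lengths) / len(lengths):.1f} tokens")
```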

You probably need to add a `model_max_length` to your transformer model as described here: #7393 (comment)

You can probably achieve this by adding a `tokenizer_config.json` with the right settings to the saved `transformers/model` directory without retraining. I haven't tested this, but it will probably look something like this:

{"model_max_length": 512}
