Size mismatch error during training #942
-
We are using Whisper to train on an Arabic dataset. All of our steps are in this notebook: https://colab.research.google.com/drive/1msFlRKDXnZsaAZhvFlYVyxPsUA6qGQFo?usp=sharing

But when we try to train, we get the following error:

```
/usr/local/lib/python3.8/dist-packages/transformers/optimization.py:346: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set
```
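As an aside, the `FutureWarning` quoted above is only a deprecation notice about the optimizer, not the size-mismatch error itself. A hedged way to switch to the PyTorch `AdamW` and silence it, assuming the `Seq2SeqTrainingArguments` setup from the notebook (the `output_dir` below is just a placeholder), would be:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ar",  # hypothetical output path; keep whatever the notebook uses
    optim="adamw_torch",              # use torch.optim.AdamW instead of the deprecated transformers AdamW
)
```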
-
Hi, it appears that the text data is somehow not padded correctly to match the text decoder's context size (448). I'm not too familiar with the preprocessing pipelines in Hugging Face Transformers, so this might be better answered by @sanchit-gandhi, who wrote the notebook. (Thanks!)
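To confirm that over-long transcriptions are the cause, one quick check (a sketch; `tokenizer`, `common_voice`, and the `"sentence"` column are the names used in the linked notebook) is to count how many targets tokenize to more than 448 label ids:

```python
# count target transcriptions that exceed the decoder's context size / model.config.max_length
max_label_length = 448
n_too_long = sum(
    len(tokenizer(sentence).input_ids) > max_label_length
    for sentence in common_voice["train"]["sentence"]
)
print(f"{n_too_long} training examples exceed {max_label_length} labels")
```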
-
Hey @marthafikry! Cool to see that you're fine-tuning Whisper for Arabic!

The issue is with your target label sequences: some of them exceed the model's maximum generation length. These must be very long sequences, as the maximum generation length is 448; this is the longest sequence the model is configured to handle (`model.config.max_length`). We've got two options here: filter out the label sequences that are too long, or increase the model's max length.
What we can do is compute the label length of each target sequence:

```python
def prepare_dataset(batch):
    # load and resample audio data from 48 to 16kHz
    audio = batch["audio"]

    # compute input length in samples (used below to filter by audio duration)
    batch["input_length"] = len(audio["array"])

    # compute log-Mel input features from input audio array
    batch["input_features"] = feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]

    # encode target text to label ids
    batch["labels"] = tokenizer(batch["sentence"]).input_ids

    # compute labels length
    batch["labels_length"] = len(batch["labels"])
    return batch
```

And then filter out those that exceed the model's maximum length:

```python
MAX_DURATION_IN_SECONDS = 30.0
max_input_length = MAX_DURATION_IN_SECONDS * 16000

def filter_inputs(input_length):
    """Filter inputs with zero input length or longer than 30s"""
    return 0 < input_length < max_input_length

max_label_length = model.config.max_length

def filter_labels(labels_length):
    """Filter label sequences longer than max length (448)"""
    return labels_length < max_label_length
```

You can then apply these functions to the dataset:

```python
# pre-process
common_voice = common_voice.map(prepare_dataset, remove_columns=common_voice.column_names["train"])

# filter by audio length
common_voice = common_voice.filter(filter_inputs, input_columns=["input_length"], remove_columns=["input_length"])

# filter by label length
common_voice = common_voice.filter(filter_labels, input_columns=["labels_length"], remove_columns=["labels_length"])
```

That should pre-process the dataset and remove any label sequences that are too long for the model.

Alternatively, we can change the model's max length to any value we want:

```python
model.config.max_length = 500
```

This will update the max length to 500 tokens. Make sure to do this before you filter for it to take effect:

```python
max_label_length = model.config.max_length = 500

def filter_labels(labels_length):
    """Filter label sequences longer than the new max length (500)"""
    return labels_length < max_label_length
```
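As a quick sanity check (a sketch reusing the objects above; the `labels` column is kept by the filtering steps, so it can be inspected directly), you can confirm that no remaining label sequence exceeds the limit:

```python
longest_label = max(len(labels) for labels in common_voice["train"]["labels"])
print(f"longest label sequence: {longest_label} (limit: {model.config.max_length})")
assert longest_label < model.config.max_length
```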