T5 FineTuning freezes on 0% Validation Sanity Check #8543
Unanswered
prikmm asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment, 1 reply
-
Dear @prikmm, TPUs have a weird behaviour with shared parameters: tying only works after the weights have been moved to the TPU; if you tie them before the move, it won't work. This warning is telling you that it detected that some parameters are being tied. You need to add the logic to re-tie them after the model is moved to the device. Best,
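A minimal sketch of that fix, assuming a Hugging Face T5 model wrapped in a LightningModule and the `on_post_move_to_device` hook that PyTorch Lightning exposed for TPU weight tying around the time of this discussion; the class name and `model_name` argument are illustrative:

```python
import pytorch_lightning as pl
from transformers import T5ForConditionalGeneration


class T5SummarizationModule(pl.LightningModule):
    def __init__(self, model_name: str = "t5-small"):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained(model_name)

    def on_post_move_to_device(self):
        # Weight tying done on CPU (e.g. inside from_pretrained) does not
        # survive the per-tensor copy to the XLA device, so the shared input
        # embeddings and LM head are re-tied here, after the move to TPU.
        self.model.tie_weights()
```

On GPU the hook is harmless, which is consistent with the same code training fine there and only misbehaving on TPU.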
-
I am currently trying to fine-tune a T5 model for a summarization task. I refactored the code from this T5 Fine Tuning Notebook to the latest API and my data, but training freezes at the start of the validation sanity check.
I get this warning:
I have created a basic reproducible notebook on the XSum dataset.
Training starts perfectly on GPU, on both Kaggle and Colab. The freeze occurs only on TPU, again on both Kaggle and Colab.