It looks like validation_step runs simultaneously with training at the end of each epoch.

Validation is considered part of the fitting procedure, but it never runs concurrently with training.

As soon as validation_step starts, the percentage of allocated GPU memory shoots up and a RuntimeError: CUDA out of memory occurs. How can I fix it?

It's definitely caused by a bug, either on your end or ours. Can you try to reproduce it?

You can adapt https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pl_examples/bug_report_model.py to do it
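For context, a frequent cause of memory shooting up exactly when validation begins is collecting per-batch tensors (losses, predictions) that still carry their autograd graphs, so the graphs of every batch stay alive on the GPU. A minimal sketch in plain PyTorch of the safe pattern, assuming a hypothetical model and loop (TinyModel, validation_epoch, and the shapes are illustrative, not from this thread):

```python
# Sketch of why validation can OOM and how detaching / no_grad avoids it.
# All names here (TinyModel, validation_epoch) are illustrative assumptions.
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

def validation_epoch(model, batches):
    model.eval()
    losses = []
    # Under torch.no_grad() no autograd graph is built, so appending the
    # loss of every batch does not accumulate graph memory on the GPU.
    # (Equivalently, one could append loss.detach() in a grad-enabled loop.)
    with torch.no_grad():
        for x, y in batches:
            loss = nn.functional.cross_entropy(model(x), y)
            losses.append(loss)  # safe: already graph-free
    return losses

batches = [(torch.randn(8, 32), torch.randint(0, 2, (8,))) for _ in range(4)]
model = TinyModel()
losses = validation_epoch(model, batches)
# None of the collected losses retains a computation graph.
assert all(not l.requires_grad for l in losses)
```

If validation_step returns or logs tensors, the same idea applies: detach them (or log scalars) so Lightning's bookkeeping does not keep the whole graph of each batch alive.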

Answer selected by JongbinWoo