How to extend training from an already finished checkpoint #8418
Unanswered
saedrna
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment
-
Dear @saedrna, you could write a Python script that reloads the weights and restarts training from scratch. Trainer(resume_from_checkpoint=..) is really meant to resume the training as it was, so restoring the scheduler state is the expected behavior.
There are other options, but they involve more hacking around, such as dumping ...
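The "reload the weights and restart from scratch" idea can be sketched in plain PyTorch. This is a minimal illustration, not the exact Lightning workflow: the nn.Linear model, the learning rate of 0.1, the cosine schedule, and the file name last.ckpt are all assumptions made for the example. It relies only on the fact that a Lightning checkpoint stores the model weights under the "state_dict" key. In a Lightning script the equivalent would be to build the module with LightningModule.load_from_checkpoint and pass it to a new Trainer without resume_from_checkpoint.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical stand-in for the real model; any nn.Module behaves the same way.
model = nn.Linear(4, 2)

# Simulate the finished 16-epoch job with an assumed cosine schedule.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=16, eta_min=1e-8
)
for _ in range(16):
    optimizer.step()   # no gradients here, so the weights stay unchanged
    scheduler.step()
finished_lr = optimizer.param_groups[0]["lr"]  # decayed to roughly eta_min

# Save only what the new job needs: the weights, under the "state_dict" key.
ckpt_path = os.path.join(tempfile.mkdtemp(), "last.ckpt")
torch.save({"state_dict": model.state_dict()}, ckpt_path)

# New job: restore the weights only, then build a fresh optimizer and scheduler.
new_model = nn.Linear(4, 2)
checkpoint = torch.load(ckpt_path, map_location="cpu")
new_model.load_state_dict(checkpoint["state_dict"])

new_optimizer = torch.optim.SGD(new_model.parameters(), lr=0.1)
new_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    new_optimizer, T_max=8, eta_min=1e-8  # 8 extra epochs, assumed
)
restarted_lr = new_optimizer.param_groups[0]["lr"]  # back at the initial value

print(f"lr at end of first job: {finished_lr:.2e}")
print(f"lr at start of new job: {restarted_lr:.2e}")
```

Because the optimizer and scheduler are created anew, the epoch counter also starts from zero, so the new job should be configured with the number of additional epochs rather than the total.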
-
Hi,
I want to extend the training of an already finished checkpoint.
Say I set the number of epochs of a job to 16 and it finished all 16 epochs.
I then found that the model is not sufficiently trained, because the validation accuracy is still increasing.
So I started another job with the number of epochs set to 24 and resume_from_checkpoint set to the last checkpoint. However, an annoying issue is that my scheduler is also restored, which always sets the learning rate to 1e-8 (eta_min). Is there a trick to work around this?
Thanks,
Han
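For context, here is a small sketch of why the restored learning rate sits at eta_min. It assumes the scheduler is a cosine annealing schedule (the post only mentions eta_min, so this is a guess) and uses a hypothetical initial learning rate of 0.1. The helper cosine_lr is written for this example and is not a library function.

```python
import math


def cosine_lr(epoch: int, t_max: int, eta_max: float, eta_min: float) -> float:
    """Closed-form cosine annealing: eta_max at epoch 0, eta_min at epoch t_max."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + cosine)


ETA_MAX = 0.1   # hypothetical initial learning rate
ETA_MIN = 1e-8  # value reported in the question

# Schedule built for 16 epochs: by epoch 16 it has fully decayed.
lr_end_of_16 = cosine_lr(16, t_max=16, eta_max=ETA_MAX, eta_min=ETA_MIN)

# Schedule built for 24 epochs from the start: at epoch 16 it is still
# well above eta_min.
lr_16_of_24 = cosine_lr(16, t_max=24, eta_max=ETA_MAX, eta_min=ETA_MIN)

print(f"T_max=16, epoch 16: {lr_end_of_16:.2e}")
print(f"T_max=24, epoch 16: {lr_16_of_24:.2e}")
```

Under these assumptions, the checkpoint carries a schedule that has already reached its minimum, so raising the epoch count alone does not bring the learning rate back up.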