How do you checkpoint retrospectively? #13068
Unanswered
shehzaidi
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment
Do you mean checkpointing every 100 steps?
Hi, I have a training script in which the model's loss spikes suddenly around 100k steps into training. I would like to create checkpoints so that I can reproduce the ~10-100 steps before the spike and debug it.
What's the easiest way to checkpoint "retrospectively"? That is, when a certain condition is met, I would like to save the training state from the last N steps (e.g. 100), or just the checkpoint from 100 steps ago.
Thanks! 🙂
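One way to picture what is being asked for is a rolling buffer of recent snapshots that is only written to disk once the condition fires. The sketch below is framework-agnostic and uses only the standard library. `RollingSnapshots`, `run_demo`, the toy `state` dict, and the fixed loss threshold are all hypothetical names and values chosen for illustration, not part of the Lightning API. In a real run the snapshot would presumably need the model and optimizer state (and likely RNG state and data position) to make the steps reproducible, and `record` could be called from a callback hook such as `on_train_batch_end`.

```python
import copy
import pickle
import tempfile
from collections import deque
from pathlib import Path


class RollingSnapshots:
    """Hypothetical helper: hold the last `keep` snapshots in memory."""

    def __init__(self, keep=3, every=1):
        self.every = every
        # A bounded deque drops the oldest snapshot automatically.
        self.buffer = deque(maxlen=keep)

    def record(self, step, state):
        # Snapshot only every `every` steps; deep-copy so that later
        # in-place updates do not alter the stored state.
        if step % self.every == 0:
            self.buffer.append((step, copy.deepcopy(state)))

    def dump(self, out_dir):
        # Write every buffered snapshot to disk, oldest first.
        out_dir = Path(out_dir)
        out_dir.mkdir(parents=True, exist_ok=True)
        paths = []
        for step, state in self.buffer:
            path = out_dir / f"step_{step}.pkl"
            with open(path, "wb") as f:
                pickle.dump(state, f)
            paths.append(path)
        return paths


def run_demo(out_dir):
    """Toy loop: the loss 'spikes' at step 95 (assumed condition)."""
    snapshots = RollingSnapshots(keep=3, every=10)
    state = {"weights": [0.0], "step": 0}  # stand-in for real training state
    for step in range(1, 101):
        state["weights"][0] += 0.1  # stand-in for an optimizer update
        state["step"] = step
        snapshots.record(step, state)
        loss = 100.0 if step == 95 else 1.0
        if loss > 10.0:  # the trigger condition; threshold is arbitrary
            return [p.name for p in snapshots.dump(out_dir)]
    return []
```

Calling `run_demo(tempfile.mkdtemp())` writes the three snapshots taken before the spike. Keeping deep copies in memory can be expensive for large models; a variant of the same idea would write each snapshot to disk immediately and delete the oldest file once the buffer is full.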