The trainer's global step and current epoch don't change #15970
Unanswered
Dee-Ma asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment · 3 replies
-
Hi, PyTorch Lightning is not intended to be used like this currently. The counters for steps and epochs are updated inside our loop API, so if you don't call fit, you bypass those updates. Why are you writing the loops yourself?
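For context, a minimal sketch of the intended usage (the toy model and data here are assumptions, not from the thread): when trainer.fit() drives training, the loop API advances the counters on its own.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    """Toy module; stands in for whatever the question's model does."""
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

loader = DataLoader(TensorDataset(torch.randn(64, 4), torch.randn(64, 1)), batch_size=8)
trainer = pl.Trainer(max_epochs=2)
trainer.fit(LitModel(), loader)

# The fit loop has advanced the counters (exact values depend on the PL
# version; with one optimizer step per batch this is 8 x 2 = 16 steps).
print(trainer.global_step, trainer.current_epoch)
```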
-
Hi,

I am using PyTorch Lightning to train a model; the code follows the pattern sketched below. Instead of calling trainer.fit() to run the training automatically, we run our train_step manually and separately invoke the callbacks' inner functions (for example, on_train_batch_end(), on_train_epoch_end(), etc.) to make the callbacks work by hand (something like a 'debug' mode).

After running the code this way, trainer.global_step always equals 0 and trainer.current_epoch doesn't change either during training. As a result, the callbacks (for example, ModelCheckpoint) do not work properly.

Could I know whether trainer.global_step will never change if we don't run trainer.fit()? Is there a way to set the value of trainer.global_step manually? I am also wondering whether the callbacks can only be used with trainer.fit(), meaning the code should never be written like this when using PyTorch Lightning?

Many thanks for the help!
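Here is a minimal, self-contained sketch of the manual pattern described above (all names are placeholders, not the question's actual code): the steps run and the loss decreases, but the Trainer's counters never move because fit() is never called.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

model = LitModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = DataLoader(TensorDataset(torch.randn(32, 4), torch.randn(32, 1)), batch_size=8)
trainer = pl.Trainer(max_epochs=2)  # created, but fit() is never called

for epoch in range(2):
    for batch_idx, batch in enumerate(loader):
        loss = model.training_step(batch, batch_idx)  # manual "train step"
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # ...here the callbacks' hooks (on_train_batch_end(), etc.) would
        # also be invoked by hand...

# Only fit()'s loops update the counters, so both remain at their defaults:
print(trainer.global_step, trainer.current_epoch)  # 0 0
```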