train_time_interval checkpoint and metric value #13900
Unanswered
DA-L3 asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 0 comments
Hi,

I am currently trying to save a checkpoint after some period of time using `train_time_interval` (PT Lightning version: 1.6.4). I will call this checkpoint `time`. Further, I have another checkpoint callback monitoring my metric named `val_ca`; I call it `monitor`. But I want to resume from my `time` checkpoint.

In the `validation_step` loop, I use `self.log('val_ca', value, on_step=False, on_epoch=True, logger=True)`. I only compute this value at the end of an epoch. But since the `time` checkpoint might save in between epochs, I would still like to carry that value into the checkpoint, because resuming from `time` yields:

`ModelCheckpoint(monitor='val_ca') could not find the monitored key in the returned metrics: ['epoch', 'step']. HINT: Did you call log('val_ca', value) in the LightningModule?`

when the `monitor` checkpoint is considered. Even saving at a time after at least one epoch had finished (hence the `self.log` for `val_ca` had been executed at least once) did not resolve the issue.

Does anyone have any ideas on how I can carry this `val_ca` over?

Thanks!