Logging a running max val_acc_epoch in TensorBoard hparams tab #15759
Unanswered
daniel-a-diaz asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Currently I am using `TensorBoardLogger` with `default_hp_metric=False`, and logging hyperparameters and metrics to the hparams tab before I call `trainer.fit()`, like this:
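In outline (the hyperparameter names and values here are placeholders for my real ones):

```python
from pytorch_lightning.loggers import TensorBoardLogger

# default_hp_metric=False so Lightning doesn't create its own hp_metric entry
logger = TensorBoardLogger("tb_logs", name="my_model", default_hp_metric=False)

# Register the hparams together with the metrics I want in the hparams tab,
# before training, so TensorBoard associates them with this run.
hparams = {"learning_rate": 1e-3, "batch_size": 64}  # placeholders
logger.log_hyperparams(hparams, {"val_accuracy": 0.0})
```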
And then in the `LightningModule`, with `torchmetrics.Accuracy`, I keep a running log of val_accuracy like this:
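In outline (the forward pass and the rest of the module are elided; `num_classes` is a placeholder):

```python
import pytorch_lightning as pl
import torchmetrics


class LitModel(pl.LightningModule):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.val_acc = torchmetrics.Accuracy()

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        # Passing the Metric object to self.log lets Lightning accumulate
        # across batches and log the epoch value under "val_accuracy".
        self.val_acc(logits, y)
        self.log("val_accuracy", self.val_acc, on_step=False, on_epoch=True)
```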
What I want to do is keep a running log of the max epoch val_accuracy. It seemed like I could do this with `reduce_fx='max'`, except it doesn't apply when logging a `torchmetrics.Metric`. I have tried experimenting with `torchmetrics.MinMaxMetric`, but I couldn't get it to work.
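For reference, this is roughly the shape of what I tried (`MinMaxMetric.compute()` returns a dict with "raw"/"min"/"max" entries, so I logged the pieces separately); it's a sketch of the attempt, not working code:

```python
import pytorch_lightning as pl
import torchmetrics


class LitModelMinMax(pl.LightningModule):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Wrap Accuracy so the running min/max of its values are tracked
        self.val_acc = torchmetrics.MinMaxMetric(torchmetrics.Accuracy())

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        self.val_acc.update(logits, y)

    def validation_epoch_end(self, outputs):
        # compute() returns {"raw": current_value, "min": ..., "max": ...}
        vals = self.val_acc.compute()
        self.log("val_accuracy", vals["raw"])
        self.log("val_accuracy_max", vals["max"])
```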
Is there anything like `ModelCheckpoint.best_model_score` you can call within `validation_step`? I have been simply calling `log_hyperparams` a second time with the `best_model_score` after running `trainer.fit()`, which works fine using `DDPStrategy` and Lightning's `EarlyStopping`.
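Concretely, something like this (the callback settings and device counts are placeholders; `model`, `dm`, `hparams`, and `logger` are the ones from above):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

checkpoint_cb = ModelCheckpoint(monitor="val_accuracy", mode="max", save_top_k=1)
early_stop_cb = EarlyStopping(monitor="val_accuracy", mode="max", patience=10)

trainer = Trainer(
    logger=logger,
    callbacks=[checkpoint_cb, early_stop_cb],
    strategy="ddp",
    gpus=2,  # placeholder
    max_epochs=100,
)
trainer.fit(model, datamodule=dm)

# Overwrite the hparams-tab metric with the best epoch score after the fact.
logger.log_hyperparams(hparams, {"val_accuracy": checkpoint_cb.best_model_score.item()})
```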
But when I was running Optuna studies in parallel with their pruner, using the `PyTorchLightningPruningCallback` integration, I was having issues with updating the val_accuracy to the best score correctly after training, or actually even running any code at all after `trainer.fit()`. It looks like pruned trials return from `trainer.fit()` so that any code afterwards doesn't run.
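A stripped-down version of the objective, to show what I mean (the search space, epoch count, and the `LitModel`/`dm` pieces are placeholders carried over from the snippets above):

```python
import optuna
from optuna.integration import PyTorchLightningPruningCallback
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger


def objective(trial: optuna.Trial) -> float:
    hparams = {"learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)}

    logger = TensorBoardLogger("tb_logs", name="optuna", default_hp_metric=False)
    logger.log_hyperparams(hparams, {"val_accuracy": 0.0})

    model = LitModel()  # placeholder for my real module
    checkpoint_cb = ModelCheckpoint(monitor="val_accuracy", mode="max", save_top_k=1)
    pruning_cb = PyTorchLightningPruningCallback(trial, monitor="val_accuracy")

    trainer = Trainer(logger=logger, callbacks=[checkpoint_cb, pruning_cb], max_epochs=100)
    trainer.fit(model, datamodule=dm)  # dm: my datamodule

    # For pruned trials, execution never seems to reach this point, so the
    # hparams-tab metric is never updated with the best score.
    best = checkpoint_cb.best_model_score.item()
    logger.log_hyperparams(hparams, {"val_accuracy": best})
    return best
```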
So my question is: what would be the most elegant Lightning solution for getting this to work? It would be nice to easily sort the hparams tab based on the val_accuracy of the `save_top_k=1` checkpoint.

Also, I am using pytorch-lightning=1.5.10 in order to run Optuna in parallel and use their `PyTorchLightningPruningCallback` integration. Thanks.