@bibinwils — this was likely caused by version incompatibility in Colab circa 2021. A lot has improved since. Here's a clean setup that works today:

Install:

!pip install -q lightning torchmetrics

Common causes of GPU memory / slowness with metrics:

1. Calling `metric(preds, target)` every step. `forward()` triggers `compute()` internally each time, and for metrics that store all predictions as state (e.g. AUROC, PrecisionRecallCurve) recomputing over everything seen so far is expensive. Use `.update()` per step and a single epoch-level `.compute()`:
```python
def training_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self(x)
    loss = F.cross_entropy(y_hat, y)  # assumes a classification model; F = torch.nn.functional
    self.train_metrics.update(y_hat, y)  # cheap: only accumulates state, no compute
    return loss

def on_train_epoch_end(self):
    self.log_dict(self.train_metrics.compute())  # compute once per epoch
    self.train_metrics.reset()  # clear accumulated state for the next epoch
```
2. Not calling .…
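To see why `forward()` every step gets expensive for list-state metrics, here is a toy stand-in (illustrative only, not the real torchmetrics API) whose state is the full list of predictions, like AUROC. Calling the metric directly runs `compute()` once per batch over all accumulated data; `update()` + one epoch-end `compute()` does that work exactly once:

```python
class ToyListStateMetric:
    """Illustrative stand-in for a list-state metric (e.g. AUROC).

    Not the torchmetrics API -- just demonstrates the cost pattern:
    state is every prediction seen so far, so each compute() scans it all.
    """

    def __init__(self):
        self.preds, self.targets = [], []
        self.compute_calls = 0  # instrumentation: how often compute() ran

    def update(self, preds, targets):
        # Cheap: just append to state.
        self.preds.extend(preds)
        self.targets.extend(targets)

    def compute(self):
        # Expensive part: iterates over the *entire* accumulated state.
        self.compute_calls += 1
        correct = sum(int(round(p) == t) for p, t in zip(self.preds, self.targets))
        return correct / len(self.preds)

    def __call__(self, preds, targets):
        # Mimics metric(preds, target): forward() = update() + compute().
        self.update(preds, targets)
        return self.compute()

    def reset(self):
        self.preds, self.targets = [], []


# Pattern 1: calling the metric every step -> compute() runs once per batch.
m = ToyListStateMetric()
for batch in range(100):
    m([0.9, 0.2], [1, 0])
print(m.compute_calls)  # -> 100

# Pattern 2: update() per step, one compute() at epoch end.
m2 = ToyListStateMetric()
for batch in range(100):
    m2.update([0.9, 0.2], [1, 0])
epoch_value = m2.compute()
print(m2.compute_calls)  # -> 1
m2.reset()
```

With real torchmetrics the gap is worse, since metrics like AUROC sort and threshold the full prediction tensor on every `compute()`.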

Answer selected by Borda