TorchMetrics Not working in google colab #562
I'm using torchmetrics for a regression task in Google Colab, and it takes a lot of time and GPU memory to run. So I removed the metrics from the training and validation steps and put them only in `test_step`. But even in the test stage the run never completes; it just keeps going. What should I do?
Replies: 2 comments 1 reply
Hi, could you please provide a link to your Colab notebook, or a code example, so we can better understand the problem?
@bibinwils — this was likely caused by version incompatibility in Colab circa 2021. A lot has improved since. Here's a clean setup that works today.

Install:

```
!pip install -q lightning torchmetrics
```

Common causes of GPU memory / slowness with metrics:

- Calling `metric(preds, target)` every step — `.forward()` triggers `compute()` each time. For metrics that store all predictions (AUROC, PRC), this is expensive. Use `.update()` + epoch-level `.compute()`:

  ```python
  def training_step(self, batch, batch_idx):
      # y_hat and y computed earlier in the step
      self.train_metrics.update(y_hat, y)  # cheap: accumulates state only
      return loss

  def on_train_epoch_end(self):
      self.log_dict(self.train_metrics.compute())  # compute once per epoch
      self.train_metrics.reset()
  ```

- Not calling `.…`

- Unbounded state in ranking metrics can be capped with `thresholds`:

  ```python
  from torchmetrics.classification import BinaryAUROC

  auroc = BinaryAUROC(thresholds=200)  # O(200) memory instead of O(N)
  ```

- Cloning a `MetricCollection` keeps separate state (and log prefixes) for each stage:

  ```python
  from torchmetrics import MetricCollection

  metrics = MetricCollection({...})
  self.train_metrics = metrics.clone(prefix="train/")
  self.val_metrics = metrics.clone(prefix="val/")
  ```

If you're still hitting issues on the current version, a minimal reproducible example would help debug further. Docs: Overview
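For intuition on why `.update()` per step plus one `.compute()` per epoch is cheap, here is a minimal pure-Python sketch of the metric-state pattern. The `MeanError` class is a toy illustration, not the real torchmetrics implementation:

```python
class MeanError:
    """Toy metric mimicking the update/compute/reset cycle.

    update() only accumulates two scalars per batch (cheap, O(1) memory);
    compute() turns the accumulated state into the final value once.
    """

    def __init__(self):
        self.reset()

    def update(self, preds, targets):
        # Running sums only -- no per-sample storage
        self.total_error += sum(abs(p - t) for p, t in zip(preds, targets))
        self.count += len(preds)

    def compute(self):
        return self.total_error / self.count

    def reset(self):
        # Forgetting this leaks state into the next epoch
        self.total_error = 0.0
        self.count = 0


metric = MeanError()
metric.update([1.0, 2.0], [1.5, 2.0])  # batch 1: errors 0.5, 0.0
metric.update([3.0], [2.0])            # batch 2: error 1.0
print(metric.compute())                # -> 0.5, computed once per "epoch"
metric.reset()
```

Metrics that need every prediction to produce their result (like AUROC without fixed thresholds) cannot reduce to running sums like this, which is exactly why they grow with dataset size.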
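To see why fixing `thresholds` bounds memory, here is a hypothetical pure-Python sketch of the binning idea: instead of storing every prediction, keep true/false-positive counts at a fixed set of thresholds. The function name and details are illustrative, not torchmetrics internals:

```python
def binned_roc_counts(preds, targets, thresholds):
    """Accumulate O(len(thresholds)) state instead of storing all N predictions."""
    tp = [0] * len(thresholds)  # true positives per threshold
    fp = [0] * len(thresholds)  # false positives per threshold
    for p, t in zip(preds, targets):
        for i, thr in enumerate(thresholds):
            if p >= thr:  # predicted positive at this threshold
                if t == 1:
                    tp[i] += 1
                else:
                    fp[i] += 1
    return tp, fp


thresholds = [i / 4 for i in range(5)]  # 5 fixed bins -> fixed memory
tp, fp = binned_roc_counts([0.9, 0.2, 0.6], [1, 0, 1], thresholds)
print(tp, fp)  # -> [2, 2, 2, 1, 0] [1, 0, 0, 0, 0]
```

The counts are enough to trace an approximate ROC curve, so memory stays constant no matter how many batches you feed in; that is the trade-off `BinaryAUROC(thresholds=200)` makes.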