Summary
When logging metrics that return a tensor (e.g., per-class accuracy) at each step using self.log, the values logged for each class are incorrect. This happens when logging each tensor element separately inside a loop.
Environment
torchmetrics version: 1.8.0
pytorch-lightning version: 2.5.2
Python version: 3.12.3
OS: WSL/Ubuntu 24.04 (reproduced on CPU)
Expected behavior (intuitive)
If MulticlassAccuracy(..., average=None) returns, for example, tensor([0.8, 0.7]) at each step, logging each class value separately with self.log and on_epoch=True should produce the same results as updating the metric each step and computing at the end of the epoch.
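To make the intended baseline concrete, here is a minimal sketch in plain Python (no torchmetrics) of what "update each step, compute at the end of the epoch" means for per-class accuracy with average=None. The batches and class labels are hypothetical.

```python
from collections import defaultdict

correct = defaultdict(int)
total = defaultdict(int)

batches = [
    # (target, prediction) pairs per step
    [(0, 0), (1, 1)],          # step 1: both classes fully correct
    [(0, 1), (0, 0), (1, 1)],  # step 2: class 0 gets 1 of 2 correct
]

for batch in batches:
    for target, pred in batch:   # update(): accumulate counts, no division yet
        total[target] += 1
        correct[target] += int(target == pred)

# compute(): divide once, over the whole epoch's accumulated counts
per_class_acc = {c: correct[c] / total[c] for c in total}
print(per_class_acc)  # class 0: 2/3, class 1: 2/2
```

The key point is that the division happens once per epoch, over the pooled counts, not once per step.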
Actual behavior (counterintuitive)
Per-class metrics logged in a loop using self.log do not match the values obtained by computing the metric at the end of the epoch. This leads to incorrect values in the logs.
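A two-step toy example (hypothetical numbers) shows why an unweighted mean of per-step accuracies, which is what an on_epoch=True reduction of logged step values amounts to, can differ from the accuracy computed over the whole epoch whenever class counts vary between steps:

```python
# Step 1: class 0 appears once, 1 correct  -> per-step accuracy 1/1
# Step 2: class 0 appears 3 times, 1 correct -> per-step accuracy 1/3
step_accuracies = [1 / 1, 1 / 3]

# Averaging the logged step values (mean of ratios):
logged_value = sum(step_accuracies) / len(step_accuracies)  # 2/3

# Aggregating the metric state first (ratio of sums):
epoch_value = (1 + 1) / (1 + 3)  # 1/2

print(logged_value, epoch_value)  # the two disagree
```

The two only coincide when every step contributes the same number of samples per class.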
This might be due to how the log method interacts with the internal state of the metric. I assume using self.log like this is proscribed, but it is very intuitive...
Minimal reproduction
Here is a MWE that reproduces the problem. Targets and predictions are hardcoded so we know what the correct result is.
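As a stand-in for the full MWE, here is a minimal sketch of the element-wise logging loop, with self.log replaced by a hypothetical log function so the snippet runs standalone. It also simulates the unweighted step-mean reduction that on_epoch=True applies per logged key, which is one plausible source of the discrepancy; the per-class tensors are made-up values, not real metric output.

```python
logged = {}

def log(name, value, on_step=True, on_epoch=True):
    # stand-in for LightningModule's self.log: just record step values
    logged.setdefault(name, []).append(float(value))

# pretend MulticlassAccuracy(average=None) returned these on two steps
step_outputs = [[0.8, 0.7], [0.4, 0.9]]

for acc_per_class in step_outputs:
    for i, acc in enumerate(acc_per_class):  # the element-wise loop
        log(f"acc_class_{i}", acc, on_step=True, on_epoch=True)

# an on_epoch reduction that takes the mean of the logged step values:
epoch_logged = {k: sum(v) / len(v) for k, v in logged.items()}
print(epoch_logged)
```

Contrast this with letting the metric object aggregate its own state and calling compute() once at epoch end, which pools the counts instead of averaging per-step ratios.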
Here is the output:
Steps to reproduce
Run the MWE and compare the logged values: the STEP values differ from the EPOCH values.
Questions
Is logging the elements of a metric tensor one by one with self.log (with on_epoch=True) proscribed?