Summary
When logging metrics that return a tensor (e.g., per-class accuracy) at each step using self.log, the values logged for each class are incorrect. This happens when logging each tensor element separately inside a loop.
Environment
torchmetrics version: 1.8.0
pytorch-lightning version: 2.5.2
Python version: 3.12.3
OS: WSL/Ubuntu 24.04 (reproduced on CPU)
Expected behavior (intuitive)
If MulticlassAccuracy(..., average=None) returns, for example, tensor([0.8, 0.7]) at each step, logging each class value separately with self.log and on_epoch=True should produce the same results as updating the metric each step and computing at the end of the epoch.
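To make the intended baseline concrete, here is a minimal sketch in plain Python (no torchmetrics) of what "update each step, compute at the end of the epoch" means for per-class accuracy with average=None. The batches and class labels are hypothetical.

```python
from collections import defaultdict

correct = defaultdict(int)
total = defaultdict(int)

batches = [
    # (target, prediction) pairs per step
    [(0, 0), (1, 1)],          # step 1: both classes fully correct
    [(0, 1), (0, 0), (1, 1)],  # step 2: class 0 gets 1 of 2 correct
]

for batch in batches:
    for target, pred in batch:   # update(): accumulate counts, no division yet
        total[target] += 1
        correct[target] += int(target == pred)

# compute(): divide once, over the whole epoch's accumulated counts
per_class_acc = {c: correct[c] / total[c] for c in total}
print(per_class_acc)  # class 0: 2/3, class 1: 2/2
```

The key point is that the division happens once per epoch, over the pooled counts, not once per step.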
Actual behavior (counterintuitive)
Per-class metrics logged in a loop using self.log do not match the values obtained by computing the metric at the end of the epoch. This leads to incorrect values in the logs.
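A two-step toy example (hypothetical numbers) shows why an unweighted mean of per-step accuracies, which is what an on_epoch=True reduction of logged step values amounts to, can differ from the accuracy computed over the whole epoch whenever class counts vary between steps:

```python
# Step 1: class 0 appears once, 1 correct  -> per-step accuracy 1/1
# Step 2: class 0 appears 3 times, 1 correct -> per-step accuracy 1/3
step_accuracies = [1 / 1, 1 / 3]

# Averaging the logged step values (mean of ratios):
logged_value = sum(step_accuracies) / len(step_accuracies)  # 2/3

# Aggregating the metric state first (ratio of sums):
epoch_value = (1 + 1) / (1 + 3)  # 1/2

print(logged_value, epoch_value)  # the two disagree
```

The two only coincide when every step contributes the same number of samples per class.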
This might be due to how the log method interacts with the internal state of the metric. I assume using self.log like this is proscribed, but it is very intuitive...
Minimal reproduction
Here is a MWE that reproduces the problem. Targets and predictions are hardcoded so we know what the correct result is.
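As a stand-in for the full MWE, here is a minimal sketch of the element-wise logging loop, with self.log replaced by a hypothetical log function so the snippet runs standalone. It also simulates the unweighted step-mean reduction that on_epoch=True applies per logged key, which is one plausible source of the discrepancy; the per-class tensors are made-up values, not real metric output.

```python
logged = {}

def log(name, value, on_step=True, on_epoch=True):
    # stand-in for LightningModule's self.log: just record step values
    logged.setdefault(name, []).append(float(value))

# pretend MulticlassAccuracy(average=None) returned these on two steps
step_outputs = [[0.8, 0.7], [0.4, 0.9]]

for acc_per_class in step_outputs:
    for i, acc in enumerate(acc_per_class):  # the element-wise loop
        log(f"acc_class_{i}", acc, on_step=True, on_epoch=True)

# an on_epoch reduction that takes the mean of the logged step values:
epoch_logged = {k: sum(v) / len(v) for k, v in logged.items()}
print(epoch_logged)
```

Contrast this with letting the metric object aggregate its own state and calling compute() once at epoch end, which pools the counts instead of averaging per-step ratios.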
Here is the output:
Steps to reproduce
Run the MWE and compare the logged values: the STEP values differ from the EPOCH values.
Questions
Is logging the elements of a metric tensor one by one with self.log (with on_epoch=True) proscribed?