CometLogger can modify logged metrics in-place  #7021

@neighthan

Description

When logger.log_metrics(metrics) is called with a CometLogger, metrics may be modified in place. This can lead to confusing errors. For example, if the user does

def training_step(self, batch, batch_idx):
    losses = self._get_losses(batch)
    self.logger.log_metrics(losses)
    return losses

then losses will have all of its tensors moved to the CPU and detached from the autograd graph, so the subsequent backward pass fails with an error like RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
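
A minimal reproduction with plain torch, simulating the logger's conversion loop, shows the failure mode:

import torch

loss = (torch.ones(1, requires_grad=True) * 2).sum()
metrics = {"loss": loss}

# Simulate CometLogger's in-place conversion:
for key, val in metrics.items():
    if torch.is_tensor(val):
        metrics[key] = val.cpu().detach()

# The caller's dict now holds the detached tensor, so backprop fails:
metrics["loss"].backward()
# RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn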

None of the other loggers change metrics in place when log_metrics is called. All of them except neptune annotate the argument as metrics: Dict[str, float], though some (e.g. the tensorboard logger) have code to handle torch.Tensors or other types as well.

The CSVLogger uses the following for handling tensors:

def _handle_value(value):
    if isinstance(value, torch.Tensor):
        return value.item()
    return value
...
metrics = {k: _handle_value(v) for k, v in metrics_dict.items()}

The TensorBoardLogger similarly has

for k, v in metrics.items():
    if isinstance(v, torch.Tensor):
        v = v.item()
    ...
    self.experiment.add_scalar(k, v, step)

In the CometLogger, the current tensor conversion code is

for key, val in metrics.items():
    if is_tensor(val):
        metrics[key] = val.cpu().detach()

but then the entire metrics dictionary is copied later in the same function anyway, so mutating the caller's dict first and then copying everything doesn't really make sense. A non-mutating version is sketched below.
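
One possible fix (a sketch only, not the actual patch) is to build a new dictionary instead of mutating the one passed in, reusing the is_tensor already imported there:

# Build a fresh dict; the caller's metrics dictionary is left untouched.
metrics = {key: (val.cpu().detach() if is_tensor(val) else val) for key, val in metrics.items()}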

I'm happy to submit a PR to fix this so that the CometLogger doesn't modify the original metrics dictionary. I just wanted to ask for a couple of opinions before changing things:

  1. Should I keep the current tensor conversion behavior for CometLogger (val.cpu().detach()) or switch to using val.item()? My preference is the latter, though this does change the behavior (see the comparison at the end).
  2. Should I update the other loggers so they all accept metrics: Dict[str, Union[float, torch.Tensor]] and use the same method (probably imported from loggers/base.py) to convert to a Dict[str, float]? A sketch of such a helper follows this list.
    • I don't know the other loggers well, so I'm not sure whether tensors are actually unsupported or the type annotation is just imprecise and the conversion happens in third-party code.
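
For concreteness, the shared helper could look something like the following (the name _sanitize_metrics and its placement in loggers/base.py are my suggestion, not existing API):

from typing import Dict, Union

import torch

def _sanitize_metrics(metrics: Dict[str, Union[float, torch.Tensor]]) -> Dict[str, float]:
    # Return a new dict with tensors converted to Python floats; the input is never mutated.
    return {k: v.item() if isinstance(v, torch.Tensor) else v for k, v in metrics.items()}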

val.cpu().detach() vs val.item()

  • Comet has partial support for tensors with more than one element, so the first method keeps logging such tensors "valid" while the second would raise an error (see the snippet below). However, I don't think anybody relies on this behavior on purpose. If you do logger.log_metrics({"test": torch.tensor([1.0, 10.0])}), you get COMET WARNING: Cannot safely convert array([ 1., 10.], dtype=float32) object to a scalar value, using its string representation for logging. The metric doesn't even appear in the CometML web interface, so I assume you can only access it by querying for it directly through their API.
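
To illustrate the difference (assuming a recent torch; the exact exception wording may vary):

t = torch.tensor([1.0, 10.0])
t.cpu().detach()  # works: returns a detached CPU copy, which Comet then stringifies
t.item()          # raises ValueError: only one element tensors can be converted to Python scalars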

Labels: bug (Something isn't working), good first issue (Good for newcomers), help wanted (Open to be worked on), logger (Related to the Loggers), priority: 2 (Low priority task)
