Description
When `logger.log_metrics(metrics)` is called with a `CometLogger`, `metrics` may be modified in-place. This can lead to confusing errors. E.g. if the user does

```python
def training_step(self, batch, batch_idx):
    losses = self._get_losses(batch)
    self.logger.log_metrics(losses)
    return losses
```

then `losses` will have all the tensors moved to the CPU and their gradients detached, leading to an error like `RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn` when backprop is attempted.
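Until this is fixed, one workaround (my suggestion, not official guidance) is to pass a shallow copy, since the in-place change only rebinds keys of the dict:

```python
def training_step(self, batch, batch_idx):
    losses = self._get_losses(batch)
    # dict(losses) is a shallow copy: CometLogger mutates the copy,
    # so the tensors in `losses` keep their grad_fn for backprop
    self.logger.log_metrics(dict(losses))
    return losses
```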
None of the other loggers modify `metrics` in-place when `log_metrics` is called. All of them except the Neptune logger annotate the argument as just `metrics: Dict[str, float]`, though some (e.g. the TensorBoard logger) have code to handle `torch.Tensor` values or other types as well.
The `CSVLogger` uses the following for handling tensors:

```python
def _handle_value(value):
    if isinstance(value, torch.Tensor):
        return value.item()
    return value

...

metrics = {k: _handle_value(v) for k, v in metrics_dict.items()}
```

The `TensorBoardLogger` similarly has
```python
for k, v in metrics.items():
    if isinstance(v, torch.Tensor):
        v = v.item()
    ...
    self.experiment.add_scalar(k, v, step)
```

In the `CometLogger`, the current tensor conversion code is
```python
for key, val in metrics.items():
    if is_tensor(val):
        metrics[key] = val.cpu().detach()
```

but the entire metrics dictionary is then copied later in the function anyway, so it doesn't really make sense to modify it in-place and then copy everything.
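A non-mutating variant of that loop could be a simple dict comprehension; a minimal sketch, keeping the current `.cpu().detach()` behavior (my illustration, not the actual fix):

```python
# Build a fresh dict instead of rebinding keys in the caller's dict
metrics = {
    key: val.cpu().detach() if is_tensor(val) else val
    for key, val in metrics.items()
}
```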
I'm happy to submit a PR to fix this so that the `CometLogger` doesn't modify the original metrics dictionary. I just wanted to ask for a couple of opinions before changing things:
- Should I keep the current tensor conversion behavior for `CometLogger` (`val.cpu().detach()`) or switch to using `val.item()`? My preference would be the latter, though this does change the behavior (see the comparison at the end).
- Should I update the other loggers to all accept `metrics: Dict[str, Union[float, torch.Tensor]]` and have them all use the same method (probably imported from `loggers/base.py`) to convert to a `Dict[str, float]`? A sketch of such a helper follows this list.
  - I don't know the other loggers, so I'm not sure whether tensors are actually unsupported or whether the type annotation just isn't precise and the conversion happens in third-party code.
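A minimal sketch of what such a shared conversion method could look like (the name `_handle_metrics` and its exact location are my assumptions, not existing code):

```python
from typing import Dict, Union

import torch


def _handle_metrics(metrics: Dict[str, Union[float, torch.Tensor]]) -> Dict[str, float]:
    """Return a new dict with all tensor values converted to Python floats."""
    # .item() implicitly moves the value to the CPU and drops the graph
    return {k: v.item() if isinstance(v, torch.Tensor) else v for k, v in metrics.items()}
```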
**`val.cpu().detach()` vs `val.item()`**
- Comet sort of has support for tensors with more than one element, so the first method would keep logging such tensors valid, while the second would raise an error. However, I don't think anybody uses this behavior on purpose. If you do `logger.log_metrics({"test": torch.tensor([1.0, 10.0])})`, you get `COMET WARNING: Cannot safely convert array([ 1., 10.], dtype=float32) object to a scalar value, using its string representation for logging.` The metric doesn't even appear in the CometML web interface, so I assume you can only access it by querying for it directly through their API.
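To make the difference concrete, here is what each conversion does to a multi-element tensor (the exact wording of the error depends on the PyTorch version):

```python
import torch

t = torch.tensor([1.0, 10.0])

t.cpu().detach()  # fine: a detached 2-element tensor, which Comet then
                  # stringifies and logs with the warning above
t.item()          # raises: only one-element tensors can be converted
                  # to Python scalars
```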