Redundant Host & Device Synchronizations #21232

@alpha0422

Outline & Motivation

While optimizing the performance of some important DL workloads, we found that PyTorch Lightning can introduce redundant host & device synchronizations. These syncs hurt performance and block the use of CUDA graphs.

Progress Bar Metric

As shown in logger_connector/result.py#L491-L493:

# populate progress_bar metrics. convert tensors to numbers
if result_metric.meta.prog_bar:
    metrics["pbar"][forked_name] = convert_tensors_to_scalars(value)

If a metric is logged with prog_bar=True, e.g. pl_module.log('lr', lr, prog_bar=True), the metric tensor is always converted to a scalar, whether or not the user has enabled the progress bar. This conversion introduces a host & device sync.

It would be better to skip this conversion when the user does not want to show the progress bar, i.e. when trainer.enable_progress_bar = False, so that these synchronizations are avoided.
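The guard proposed above could look roughly like the following. This is a dependency-free sketch, not Lightning's actual code: FakeTensor, collect_pbar_metrics and the progress_bar_enabled argument are hypothetical names introduced for illustration, and FakeTensor.item() only counts calls to stand in for the blocking device-to-host copy that a real tensor conversion would trigger.

```python
class FakeTensor:
    """Hypothetical stand-in for a GPU tensor.

    item() models the blocking device-to-host copy by counting calls.
    """

    sync_count = 0

    def __init__(self, value):
        self._value = value

    def item(self):
        FakeTensor.sync_count += 1
        return self._value


def collect_pbar_metrics(logged, progress_bar_enabled):
    """Return scalar progress bar metrics.

    `logged` maps a metric name to a (value, prog_bar) pair. When the
    progress bar is disabled, nothing is converted, so no sync happens.
    """
    pbar = {}
    if not progress_bar_enabled:
        return pbar
    for name, (value, prog_bar) in logged.items():
        if prog_bar:
            # Only metrics flagged for the progress bar are converted.
            pbar[name] = value.item() if isinstance(value, FakeTensor) else value
    return pbar


logged = {"lr": (FakeTensor(0.001), True), "loss": (FakeTensor(0.5), False)}

off = collect_pbar_metrics(logged, progress_bar_enabled=False)
syncs_when_off = FakeTensor.sync_count

on = collect_pbar_metrics(logged, progress_bar_enabled=True)
syncs_when_on = FakeTensor.sync_count
```

With the bar disabled, the sketch performs no conversion at all. With it enabled, only the metric logged with prog_bar=True is converted.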

Best Metric Device

The metric tensor is always put on the GPU when training with a CUDA device, so every retrieval of the metric causes a device-to-host synchronization.

Instead, I'd propose placing the metric on the same device as the value or tensor that updates it. For example:

  • The user is likely to update the global_step metric using a scalar on the CPU, so the metric for global_step should be on the CPU;
  • The user is likely to update the loss metric using a tensor on the GPU, so the metric for loss should be on the GPU.

This way, updating the global_step metric won't introduce a host & device synchronization, since the metric now lives on the CPU instead of the GPU. If the user later retrieves the global_step metric, there is no sync either.
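The placement rule described above can be sketched without any framework dependency. FakeTensor and metric_device_for are hypothetical names used only for illustration. The sketch assumes tensor-like values expose a device attribute, as torch.Tensor does, and that plain Python scalars should stay on the host.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor that only carries a device."""

    device: str


def metric_device_for(value, scalar_device="cpu"):
    """Pick the device for a metric from the value that updates it."""
    device = getattr(value, "device", None)
    if device is not None:
        # Tensor-like value: keep the metric where the data already is.
        return str(device)
    # Plain Python scalar: it lives on the host, so keep the metric there.
    return scalar_device


global_step_device = metric_device_for(100)
loss_device = metric_device_for(FakeTensor(device="cuda:0"))
```

Under this rule a scalar such as global_step never has to be copied to the GPU, while a loss tensor stays on the device that produced it.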

The original logic is at core/module.py#L657-L661:

value = (
    value.clone().detach()
    if isinstance(value, Tensor)
    else torch.tensor(value, device=self.device, dtype=_get_default_dtype())
)

cc @lantiga @justusschock
