1 file changed: +17 -0 lines changed

src/lightning/pytorch/callbacks
@@ -45,6 +45,23 @@ class DeviceStatsMonitor(Callback):
         ModuleNotFoundError:
             If ``psutil`` is not installed and CPU stats are monitored.

+    Logged Metrics:
+        Device statistics are logged under keys of the form
+        ``DeviceStatsMonitor.{hook_name}/{base_metric_name}`` (e.g.,
+        ``DeviceStatsMonitor.on_train_batch_start/cpu_percent``). Which metrics are
+        logged depends on the active :class:`~lightning.pytorch.accelerators.accelerator.Accelerator`
+        and the ``cpu_stats`` flag.
+
+        CPU (via ``psutil``): Logs ``cpu_percent``, ``cpu_vm_percent``, ``cpu_swap_percent``.
+        All are percentages (%).
+        CUDA GPU (via :func:`torch.cuda.memory_stats`): Logs detailed memory statistics from
+        PyTorch's allocator (e.g., ``allocated_bytes.all.current`` in bytes, ``num_ooms`` as a count).
+        GPU compute utilization is not logged by default.
+        Other Accelerators (e.g., TPU, MPS): Logs device-specific stats.
+        - TPU example: ``avg. free memory (MB)``.
+        - MPS example: ``mps.current_allocated_bytes``.
+        Inspect the logged output or the accelerator documentation for details.
+
     Example::

         from lightning import Trainer
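
The key naming scheme described in the added docstring can be illustrated with a small standalone sketch. It does not import Lightning and is not the callback's actual implementation: `prefix_metric_keys` is a hypothetical helper, the `/` separator is taken from the docstring's example key, and the metric values are made up.

```python
from typing import Dict


def prefix_metric_keys(
    stats: Dict[str, float], hook_name: str, separator: str = "/"
) -> Dict[str, float]:
    """Hypothetical helper: build ``DeviceStatsMonitor.{hook_name}/{base_metric_name}`` keys."""
    prefix = f"DeviceStatsMonitor.{hook_name}"
    return {f"{prefix}{separator}{name}": value for name, value in stats.items()}


# Made-up CPU readings, for illustration only (all percentages).
cpu_stats = {"cpu_percent": 12.5, "cpu_vm_percent": 40.0, "cpu_swap_percent": 0.0}

logged = prefix_metric_keys(cpu_stats, "on_train_batch_start")
for key, value in logged.items():
    print(f"{key} = {value}")
```

Each base metric name keeps its original value and only gains the hook-specific prefix, so the same statistic logged from different hooks stays distinguishable in the logger.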