Cuda DeviceStatsMonitor #16072
Unanswered
peterbjorgensen
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 0 comments
I am using DeviceStatsMonitor on a CUDA device and would like to track GPU memory usage to visualize a potential memory leak, but I don't know what the different metrics mean or what their units are when I look at them in TensorBoard. Is there a single number similar to the GPU memory usage reported by `nvidia-smi`, and another for GPU utilisation (in percent)? A simplified view like the one in wandb would be perfect: https://lambdalabs.com/blog/weights-and-bias-gpu-cpu-utilization
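To illustrate the kind of single-number reading meant here, the following is a minimal sketch that queries `nvidia-smi` directly. It is not part of DeviceStatsMonitor and says nothing about how that callback computes its metrics. The helper names `parse_gpu_stats` and `query_gpu_stats` are hypothetical, and the sketch assumes `nvidia-smi` is on the PATH and reports numeric values for every field (with `nounits`, memory is in MiB and utilisation in percent).

```python
import subprocess

# Fields requested from nvidia-smi; with "nounits", memory is reported in MiB
# and utilization in percent.
QUERY_FIELDS = ("memory.used", "memory.total", "utilization.gpu")


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output (one GPU per line).

    Hypothetical helper: assumes every field is numeric (no "[N/A]" entries).
    """
    stats = []
    for line in csv_text.strip().splitlines():
        used, total, util = (float(value) for value in line.split(","))
        stats.append(
            {
                "memory_used_mib": used,
                "memory_total_mib": total,
                "utilization_percent": util,
            }
        )
    return stats


def query_gpu_stats():
    """Run nvidia-smi once and return one dict per visible GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Values like these could be sampled periodically during training and plotted over time; a steadily growing `memory_used_mib` would be the pattern to look for when suspecting a leak.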