Commit 0412bf7

Specify nodes for gpu metrics collection and split data to each rank
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
1 parent 5ff96f7 commit 0412bf7

1 file changed, +1 -1 lines changed

nemo_run/core/execution/slurm.py

Lines changed: 1 addition & 1 deletion
@@ -560,7 +560,7 @@ def get_nsys_entrypoint(self) -> str:
         launcher = self.get_launcher()
         entrypoint, postfix = "nsys", ""
         if launcher.nsys_gpu_metrics:
-            entrypoint = 'bash -c \'GPU_METRICS_FLAG=""; if [ "$SLURM_PROCID" -eq 0 ]; then GPU_METRICS_FLAG="--gpu-metrics-devices=all"; fi; nsys'
+            entrypoint = 'bash -c \'GPU_METRICS_FLAG=""; if echo "${GPU_METRICS_NODES}" | grep -q -w "${SLURM_NODEID}"; then GPU_METRICS_FLAG="--gpu-metrics-devices=${SLURM_LOCALID}"; fi; nsys'
             postfix = "'"
         return (entrypoint, postfix)
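
The per-rank selection performed by the new shell conditional can be sketched in Python for illustration. This is not part of the commit: `gpu_metrics_flag` is a hypothetical helper, and the comma-separated format of `GPU_METRICS_NODES` is an assumption (the diff only shows that the value is searched with `grep -w`). The sketch also returns no flag when either value is empty, which may differ from how `grep` treats an empty pattern.

```python
import re


def gpu_metrics_flag(metrics_nodes: str, node_id: str, local_id: str) -> str:
    """Hypothetical helper mirroring the shell conditional in the new entrypoint.

    metrics_nodes: value of GPU_METRICS_NODES (format assumed, e.g. "0,2")
    node_id:       value of SLURM_NODEID for this task
    local_id:      value of SLURM_LOCALID for this task
    """
    # Simplification: treat unset/empty values as "do not collect".
    if not metrics_nodes or not node_id:
        return ""
    # Approximates `grep -w`: node_id must match as a whole word, so
    # node 1 does not match inside "10" or "11".
    pattern = r"(?<!\w)" + re.escape(node_id) + r"(?!\w)"
    if re.search(pattern, metrics_nodes):
        # Each rank on a selected node samples only the GPU matching its local ID.
        return f"--gpu-metrics-devices={local_id}"
    return ""


if __name__ == "__main__":
    # Task with local ID 3 on node 2, with nodes 0 and 2 selected.
    print(gpu_metrics_flag("0,2", "2", "3"))
    # Node 1 is not selected even though "1" appears inside "10" and "11".
    print(repr(gpu_metrics_flag("10,11", "1", "0")))
```

Compared with the previous behavior, where only the task with `SLURM_PROCID` 0 passed `--gpu-metrics-devices=all`, every task on a listed node now passes a flag naming the single device index equal to its `SLURM_LOCALID`.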