Signed-off-by: Stanislav Kirdey <stan@inflection.ai>
Signed-off-by: Stan Kirdey <stan@inflection.ai>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
ray.sub: 40 additions & 4 deletions
@@ -31,8 +31,8 @@ maybe_gres_arg() {
 # Check if any nodes in the partition have GRES configured
 # Assumes a homogeneous allocation (not a heterogeneous job)
 if sinfo -p $SLURM_JOB_PARTITION -h -o "%G" | grep -q "gpu:"; then
-# Do a quick assert here that gpus:8 == gpus:$GPUS_PER_NODE. It is probably a user error if someone isn't using GPUS_PER_NODE=8 on our clusters if it supports --gres=gpu:8.
+# Do a quick assert here that gpus:8 == gpus:$GPUS_PER_NODE. It is probably a user error if someone isn't using GPUS_PER_NODE=8 on our clusters if it supports --gres=gpu:8 or gpu:a100:8
 echo "Error: GPUS_PER_NODE=$GPUS_PER_NODE but GRES detected is $(sinfo -p $SLURM_JOB_PARTITION -h -o "%G" | grep "gpu:") meaning GPUS_PER_NODE is not set to fully claim the GPUs on the nodes." >&2
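The changed comment covers both untyped (`gpu:8`) and typed (`gpu:a100:8`) GRES strings. As a minimal sketch of how a count could be pulled out of either form, assuming the GRES value has already been captured from `sinfo`: the helper name `gres_gpu_count` is hypothetical and is not part of `ray.sub`, and the handling of a trailing `(S:0-1)` socket annotation is an assumption about what some clusters report.

```shell
# Hypothetical helper (not part of ray.sub): print the GPU count from a
# GRES string such as "gpu:8", "gpu:a100:8", or "gpu:a100:8(S:0-1)".
gres_gpu_count() {
  local entry
  # Keep the first comma-separated entry that starts with "gpu:";
  # fail if the string has no GPU entry at all (e.g. "(null)").
  entry=$(printf '%s\n' "$1" | tr ',' '\n' | grep -m1 '^gpu:') || return 1
  # Drop a trailing "(...)" annotation if one is present (assumed format).
  entry=${entry%%"("*}
  # The count is the last colon-separated field, with or without a type name.
  printf '%s\n' "${entry##*:}"
}

gres_gpu_count "gpu:8"        # prints 8
gres_gpu_count "gpu:a100:8"   # prints 8
```

Comparing this number against `$GPUS_PER_NODE` would give the same answer for both GRES forms, which is the case the updated comment describes.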