Commit 59364f4

[release-2.0] fix(runtimes): Set numProcPerNode: 1 in DeepSpeed Runtime (#2863)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
1 parent 5e344ad commit 59364f4

1 file changed, +1 -3 lines changed

manifests/base/runtimes/deepspeed_distributed.yaml

Lines changed: 1 addition & 3 deletions
@@ -8,9 +8,7 @@ spec:
   mlPolicy:
     numNodes: 1
     mpi:
-      # TODO (andreyvelich): Change num proc to 1 and remove container resources after we
-      # allow to override it via TrainJob APIs.
-      numProcPerNode: 4
+      numProcPerNode: 1
       mpiImplementation: OpenMPI
       sshAuthMountPath: /home/mpiuser/.ssh
       runLauncherAsNode: true
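
The removed TODO tied the lower default to being able to override the process count from the TrainJob API. A minimal sketch of such an override follows. It assumes that the `trainer.kubeflow.org/v1alpha1` TrainJob schema exposes `spec.trainer.numProcPerNode` and `spec.trainer.resourcesPerNode`, and that this runtime is registered under the name `deepspeed-distributed`. The job name and GPU counts are hypothetical and are not taken from this commit.

```yaml
# Hypothetical TrainJob that raises the per-node process count above the
# runtime default of 1 set by this commit. Field names are assumptions.
apiVersion: trainer.kubeflow.org/v1alpha1
kind: TrainJob
metadata:
  name: deepspeed-example          # hypothetical job name
spec:
  runtimeRef:
    name: deepspeed-distributed    # assumed name of the runtime in this manifest
  trainer:
    numNodes: 1
    numProcPerNode: 4              # overrides the runtime's numProcPerNode: 1
    resourcesPerNode:
      limits:
        nvidia.com/gpu: 4          # hypothetical: one GPU per process
```

With a default of 1, jobs that do not set `numProcPerNode` start a single process per node, so multi-process jobs would need to request it explicitly, as sketched above.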
