Commit 572edd8

Hanyu Cui authored and jkbradley committed
Adds runtime version info for single-node distributed training (#201)
1 parent 329507f commit 572edd8

1 file changed: +2 -1 lines changed

python/sparkdl/horovod/runner_base.py

Lines changed: 2 additions & 1 deletion
@@ -47,11 +47,12 @@ def __init__(self, np):
  which maps to a GPU on a GPU cluster or a CPU core on a CPU cluster.
  Accepted values are:

- - If <0, this will spawn -np subprocesses on the driver node to run Horovod locally.
+ - If <0, this will spawn `-np` subprocesses on the driver node to run Horovod locally.
  Training stdout and stderr messages go to the notebook cell output, and are also
  available in driver logs in case the cell output is truncated. This is useful for
  debugging and we recommend testing your code under this mode first. However, be
  careful of heavy use of the Spark driver on a shared Databricks cluster.
+ Note that `np < -1` is only supported on Databricks Runtime 5.5 ML and above.
  - If >0, this will launch a Spark job with `np` tasks starting all together and run the
  Horovod job on the task nodes.
  It will wait until `np` task slots are available to launch the job.
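
As a rough illustration of how the docstring reads after this change, the sketch below maps an `np` value to the mode the text describes. `NpPlan` and `describe_np` are hypothetical names invented for this example and are not part of sparkdl. The sketch covers only the two cases visible in this hunk (`np < 0` and `np > 0`) and treats `np == 0` as unspecified because the hunk does not describe it.

```python
from typing import NamedTuple


class NpPlan(NamedTuple):
    """Hypothetical summary of what the docstring says about a given np."""
    mode: str                   # "driver-local", "spark-job", or "unspecified"
    processes: int              # number of Horovod processes implied by np
    needs_runtime_55_ml: bool   # True when the new version note applies


def describe_np(np: int) -> NpPlan:
    """Illustrative helper only; sparkdl does not expose this function."""
    if np < 0:
        # Negative np: -np subprocesses on the driver node.
        # Per the added note, np < -1 needs Databricks Runtime 5.5 ML or above.
        return NpPlan("driver-local", -np, np < -1)
    if np > 0:
        # Positive np: a Spark job with np tasks that start together.
        return NpPlan("spark-job", np, False)
    # np == 0 is not covered by the lines shown in this diff.
    return NpPlan("unspecified", 0, False)


if __name__ == "__main__":
    for value in (-4, -1, 2):
        print(value, describe_np(value))
```

Under this reading, `np=-1` stays outside the new version note, while any value below `-1` (more than one local subprocess) falls under it.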

0 commit comments
