@@ -48,16 +48,18 @@ def __init__(self, np):
         Accepted values are:

         - If -1, this will spawn a subprocess on the driver node to run the Horovod job locally.
-          Training stdout and stderr messages go to the notebook cell output.
-          This is useful for debugging and we recommend testing your code under this mode first.
-          However, be careful of heavy use of the Spark driver on a shared Databricks cluster.
+          Training stdout and stderr messages go to the notebook cell output, and are also
+          available in the driver logs in case the cell output is truncated. This is useful for
+          debugging and we recommend testing your code under this mode first. However, be
+          careful of heavy use of the Spark driver on a shared Databricks cluster.
         - If >0, this will launch a Spark job with `np` tasks starting all together and run the
           Horovod job on the task nodes.
           It will wait until `np` task slots are available to launch the job.
           If `np` is greater than the total number of task slots on the cluster,
-          the job will fail.
-          Training stdout and stderr messages are redirected to the stderr stream of the first
-          task, which you can find in the Spark UI.
+          the job will fail. As of Databricks Runtime 5.4 ML, training stdout and stderr
+          messages go to the notebook cell output. If the cell output is truncated, full
+          logs are available in the stderr stream of task 0 under the second Spark job
+          started by HorovodRunner, which you can find in the Spark UI.
         - If 0, this will use all task slots on the cluster to launch the job.
         """
         self.num_processor = np
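
For context, here is a minimal usage sketch of the API this docstring describes, assuming the standard `sparkdl.HorovodRunner` entry point on Databricks Runtime ML; the `train` function and its `learning_rate` argument are hypothetical placeholders, not part of this change:

    from sparkdl import HorovodRunner

    def train(learning_rate=0.1):
        # Runs in each Horovod process; hvd.init() sets up rank and size.
        import horovod.tensorflow as hvd
        hvd.init()
        print("rank %d of %d, lr=%g" % (hvd.rank(), hvd.size(), learning_rate))

    # np=-1: run locally in a subprocess on the driver node (good for debugging);
    # output goes to the notebook cell and, per this change, to the driver logs.
    hr = HorovodRunner(np=-1)
    hr.run(train, learning_rate=0.1)

    # np=2: launch a Spark job with 2 tasks and run the Horovod job on the task nodes.
    hr = HorovodRunner(np=2)
    hr.run(train, learning_rate=0.1)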