Commit 565bd24

[ML-7192] Update documentation for log streaming feature (#192)
Update documentation for HorovodRunner class to reflect changes to horovod task logging.
1 parent 61517f3 commit 565bd24

1 file changed, +8 -6 lines changed

python/sparkdl/horovod/runner_base.py

Lines changed: 8 additions & 6 deletions
@@ -48,16 +48,18 @@ def __init__(self, np):
             Accepted values are:
 
             - If -1, this will spawn a subprocess on the driver node to run the Horovod job locally.
-              Training stdout and stderr messages go to the notebook cell output.
-              This is useful for debugging and we recommend testing your code under this mode first.
-              However, be careful of heavy use of the Spark driver on a shared Databricks cluster.
+              Training stdout and stderr messages go to the notebook cell output, and are also
+              available in driver logs in case the cell output is truncated. This is useful for
+              debugging and we recommend testing your code under this mode first. However, be
+              careful of heavy use of the Spark driver on a shared Databricks cluster.
             - If >0, this will launch a Spark job with `np` tasks starting all together and run the
               Horovod job on the task nodes.
               It will wait until `np` task slots are available to launch the job.
               If `np` is greater than the total number of task slots on the cluster,
-              the job will fail.
-              Training stdout and stderr messages are redirected to the stderr stream of the first
-              task, which you can find in the Spark UI.
+              the job will fail. As of Databricks Runtime 5.4 ML, training stdout and stderr
+              messages go to the notebook cell output. In the event that the cell output is
+              truncated, full logs are available in stderr stream of task 0 under the 2nd spark
+              job started by HorovodRunner, which you can find in the Spark UI.
             - If 0, this will use all task slots on the cluster to launch the job.
         """
         self.num_processor = np
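
The three `np` cases described in the updated docstring can be summarized as a small lookup. This is only an illustrative sketch: `describe_np` is a hypothetical helper that is not part of sparkdl, the dictionary keys are made up for the example, and rejecting values below -1 is an assumption, since the docstring does not say how such values are handled.

```python
def describe_np(np):
    """Hypothetical helper (not part of sparkdl): summarize the documented
    behavior of HorovodRunner for a given `np` value."""
    if isinstance(np, bool) or not isinstance(np, int):
        raise TypeError("np must be an int")

    if np == -1:
        # Local mode: a subprocess on the driver node.
        return {
            "runs_on": "driver",
            "task_slots": None,
            "logs": "notebook cell output; also in driver logs",
        }
    if np == 0:
        # Uses every task slot on the cluster. The docstring does not
        # state where logs go in this case, so it is left as None here.
        return {"runs_on": "tasks", "task_slots": "all", "logs": None}
    if np > 0:
        # Distributed mode: waits for `np` task slots, fails if the
        # cluster has fewer than `np` slots in total.
        return {
            "runs_on": "tasks",
            "task_slots": np,
            "logs": "notebook cell output; full logs in stderr of task 0 (Spark UI)",
        }
    # Assumption of this sketch: values below -1 are treated as invalid.
    raise ValueError("np must be -1, 0, or a positive integer")


if __name__ == "__main__":
    for value in (-1, 0, 4):
        print(value, describe_np(value))
```

In line with the docstring's recommendation, one would typically try a training function with `HorovodRunner(np=-1)` first and move to a positive `np` once it runs cleanly on the driver.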
