Commit 871fabe

authored
[doc][dask] Update notes about k8s. (dmlc#10271)
1 parent 75fe2ff commit 871fabe

File tree: 1 file changed, doc/tutorials/dask.rst (55 additions, 17 deletions)
Working with other clusters
***************************

Using Dask's ``LocalCluster`` is convenient for getting started quickly on a local machine. Once you're ready to scale your work, though, there are a number of ways to deploy Dask on a distributed cluster. You can use `Dask-CUDA <https://docs.rapids.ai/api/dask-cuda/stable/quickstart.html>`_, for example, for GPUs, and you can use Dask Cloud Provider to `deploy Dask clusters in the cloud <https://docs.dask.org/en/stable/deploying.html#cloud>`_. See the `Dask documentation for a more comprehensive list <https://docs.dask.org/en/stable/deploying.html#distributed-computing>`_.

In the example below, a ``KubeCluster`` is used for `deploying Dask on Kubernetes <https://docs.dask.org/en/stable/deploying-kubernetes.html>`_:

.. code-block:: python

  from dask_kubernetes.operator import KubeCluster  # Need to install the ``dask-kubernetes`` package
  from dask_kubernetes.operator.kubecluster.kubecluster import CreateMode

  from dask.distributed import Client
  from xgboost import dask as dxgb
  import dask.array as da


  def main():
      '''Connect to a remote kube cluster with GPU nodes and run training on it.'''
      m = 1000
      n = 10
      kWorkers = 2  # assuming you have 2 GPU nodes on that cluster.
      # You need to work out the worker-spec yourself. See the document in
      # dask_kubernetes for its usage. Here we just want to show that XGBoost
      # works on various clusters.
      # See the notes below for why we use a pre-allocated cluster.
      with KubeCluster(
          name="xgboost-test",
          image="my-image-name:latest",
          n_workers=kWorkers,
          create_mode=CreateMode.CONNECT_ONLY,
          shutdown_on_close=False,
      ) as cluster:
          with Client(cluster) as client:
              X = da.random.random(size=(m, n), chunks=100)
              y = X.sum(axis=1)

              regressor = dxgb.DaskXGBRegressor(n_estimators=10, missing=0.0)
              regressor.client = client
              regressor.set_params(tree_method='hist', device="cuda")
              regressor.fit(X, y, eval_set=[(X, y)])


  if __name__ == '__main__':
      # Launch the kube cluster somewhere like GKE, then run this script as the
      # client process. The main function will connect to that cluster and start
      # training an XGBoost model.
      main()

Different cluster classes might have subtle differences, like network configuration, and a specific cluster implementation might contain bugs that we are not aware of. Please open an issue if such a case is found and there is no documentation on how to resolve it for that cluster implementation.


An interesting aspect of Kubernetes clusters is that the pods may become available after the Dask workflow has begun, which can cause issues with distributed XGBoost, since XGBoost expects the nodes holding the input data to remain unchanged during training. To use Kubernetes clusters, it is necessary to wait for all the pods to be online before submitting XGBoost tasks. One can either create a wait function in Python or simply pre-allocate a cluster with k8s tools (like ``kubectl``) before running Dask workflows. To pre-allocate a cluster, we can first generate the cluster spec using dask-kubernetes:
.. code-block:: python

  import json

  from dask_kubernetes.operator import make_cluster_spec

  spec = make_cluster_spec(name="xgboost-test", image="my-image-name:latest", n_workers=16)
  with open("cluster-spec.json", "w") as fd:
      json.dump(spec, fd, indent=2)

.. code-block:: sh

  kubectl apply -f ./cluster-spec.json

Check whether the pods are available:

.. code-block:: sh

  kubectl get pods
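As a sketch, ``kubectl wait`` can block until the pods are ready instead of polling ``kubectl get pods`` by hand. The label selector below is an assumption about the labels the operator attaches to the pods, so verify it against your own cluster:

```shell
# Block until all pods of the cluster are Ready, or fail after 5 minutes.
# The ``dask.org/cluster-name`` label is an assumed example; check the actual
# labels with ``kubectl get pods --show-labels``.
kubectl wait pod --for=condition=Ready \
    --selector dask.org/cluster-name=xgboost-test --timeout=300s
```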

Once all pods have been initialized, the Dask XGBoost workflow can be run as in the previous example. It is important to ensure that the cluster sets the parameter ``create_mode=CreateMode.CONNECT_ONLY`` and, optionally, ``shutdown_on_close=False`` if you do not want to shut down the cluster after a single job.
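If the wait-function route mentioned above is preferred over pre-allocation, a generic polling helper can be sketched as follows. The helper itself is plain Python; the Dask-specific predicate shown in the comment is a hypothetical usage, and note that ``distributed`` also provides ``Client.wait_for_workers`` for exactly this purpose:

```python
import time


def wait_for(predicate, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll ``predicate`` until it returns True or ``timeout`` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage with a connected client, before calling regressor.fit:
#   wait_for(lambda: len(client.scheduler_info()["workers"]) >= kWorkers)
```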
*******
Threads
*******
