Skip to content
This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Commit cb645ca

Browse files
authored
Update external shuffle service docs
1 parent 982760c commit cb645ca

File tree

1 file changed

+23
-0
lines changed

1 file changed

+23
-0
lines changed

resource-managers/kubernetes/architecture-docs/external-shuffle-service.md

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,4 +3,27 @@ layout: global
33
title: Kubernetes Implementation of the External Shuffle Service
44
---
55

6+
# External Shuffle Service
67

8+
The `KubernetesExternalShuffleService` was added to allow Spark to use Dynamic Allocation Mode when
9+
running in Kubernetes. The shuffle service is responsible for persisting shuffle files beyond the
10+
lifetime of the executors, allowing the number of executors to scale up and down without losing
11+
computation.
12+
13+
The implementation of choice is as a DaemonSet that runs a shuffle-service pod on each node.
14+
Shuffle-service pods and executors pods that land on the same node share disk using hostpath
15+
volumes. Spark requires that each executor must know the IP address of the shuffle-service pod that
16+
shares disk with it.
17+
18+
The user specifies the shuffle service pods they want executors of a particular SparkJob to use
19+
through two new properties:
20+
21+
* spark.kubernetes.shuffle.service.labels
22+
* spark.kubernetes.shuffle.namespace
23+
24+
KubernetesClusterSchedulerBackend is aware of shuffle service pods and the node corresponding to
25+
them in a particular namespace. It uses this data to configure the executor pods to connect with the
26+
shuffle services that are co-located with them on the same node.
27+
28+
There is additional logic in the `KubernetesExternalShuffleService` to watch the Kubernetes API,
29+
detect failures, and proactively cleanup files in those error cases.

0 commit comments

Comments
 (0)