This repository was archived by the owner on Jan 9, 2020. It is now read-only.
resource-managers/kubernetes/architecture-docs: 1 file changed, +23 -0
@@ -3,4 +3,27 @@ layout: global
title: Kubernetes Implementation of the External Shuffle Service
---

+ # External Shuffle Service

+ The `KubernetesExternalShuffleService` was added to allow Spark to use dynamic allocation when
+ running on Kubernetes. The shuffle service persists shuffle files beyond the lifetime of the
+ executors, so the number of executors can scale up and down without losing computation.
+
+ The chosen implementation is a DaemonSet that runs a shuffle-service pod on each node.
+ Shuffle-service pods and executor pods that land on the same node share disk using hostPath
+ volumes. Spark requires each executor to know the IP address of the shuffle-service pod that
+ shares disk with it.
+
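As a rough illustration of the layout described above, the following Python sketch builds a DaemonSet manifest whose pods mount a hostPath volume. The name, image, labels, and directory are hypothetical placeholders, and the `apiVersion` depends on the cluster version; this is not the manifest shipped with the project.

```python
import json


def shuffle_daemonset(name, image, labels, host_dir):
    """Build a DaemonSet manifest as a plain dict (illustrative only)."""
    # hostPath exposes a directory on the node's own filesystem, so an
    # executor pod on the same node can mount the same directory.
    volume = {"name": "shuffle-dir", "hostPath": {"path": host_dir}}
    container = {
        "name": "shuffle",
        "image": image,
        "volumeMounts": [{"name": "shuffle-dir", "mountPath": host_dir}],
    }
    return {
        "apiVersion": "apps/v1",  # assumption: varies with cluster version
        "kind": "DaemonSet",      # one pod is scheduled per node
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container], "volumes": [volume]},
            },
        },
    }


# All argument values below are hypothetical placeholders.
manifest = shuffle_daemonset(
    name="spark-shuffle-service",
    image="example.invalid/spark-shuffle:latest",
    labels={"app": "spark-shuffle-service"},
    host_dir="/tmp/spark-local",
)
print(json.dumps(manifest, indent=2))
```

An executor pod would need to mount the same host directory for the two pods to see the same files.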
+ Users specify which shuffle-service pods the executors of a particular Spark job should use
+ through two new properties:
+
+ * `spark.kubernetes.shuffle.service.labels`
+ * `spark.kubernetes.shuffle.namespace`
+
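A minimal sketch of how these two properties might be set alongside the standard Spark settings `spark.dynamicAllocation.enabled` and `spark.shuffle.service.enabled`. The label value, the namespace, and the comma-separated `key=value` label format are assumptions made for illustration; the helper functions are hypothetical and not part of Spark.

```python
# Hypothetical values: the label selector and namespace are placeholders.
conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    "spark.kubernetes.shuffle.service.labels": "app=spark-shuffle-service",
    "spark.kubernetes.shuffle.namespace": "default",
}


def parse_labels(raw):
    """Parse 'k1=v1,k2=v2' into a dict (assumed format, not confirmed)."""
    labels = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed label: {pair!r}")
        labels[key.strip()] = value.strip()
    return labels


def to_submit_args(settings):
    """Render settings as a list of spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


selector = parse_labels(conf["spark.kubernetes.shuffle.service.labels"])
submit_args = to_submit_args(conf)
```

The parsed `selector` is the kind of value a scheduler backend could match against pod labels when looking up shuffle-service pods.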
+ `KubernetesClusterSchedulerBackend` tracks the shuffle-service pods in the given namespace and
+ the node each one runs on. It uses this mapping to configure each executor pod to connect to the
+ shuffle service that is co-located with it on the same node.
+
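The node-to-pod lookup described above can be sketched in Python as follows. The `Pod` type and both functions are hypothetical stand-ins for illustration; they do not mirror the actual `KubernetesClusterSchedulerBackend` code.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical, simplified view of a pod as returned by the API."""
    name: str
    namespace: str
    node: str
    ip: str
    labels: dict = field(default_factory=dict)


def index_shuffle_pods(pods, namespace, selector):
    """Map node name -> IP of the shuffle-service pod running on that node."""
    by_node = {}
    for pod in pods:
        if pod.namespace != namespace:
            continue
        # A pod matches when it carries every label in the selector.
        if all(pod.labels.get(k) == v for k, v in selector.items()):
            by_node[pod.node] = pod.ip
    return by_node


def shuffle_host_for(executor_node, by_node):
    """Return the shuffle-service IP an executor on this node should use."""
    try:
        return by_node[executor_node]
    except KeyError:
        raise LookupError(
            f"no shuffle-service pod on node {executor_node!r}"
        ) from None


pods = [
    Pod("shuffle-a", "default", "node-1", "10.0.0.5", {"app": "shuffle"}),
    Pod("shuffle-b", "default", "node-2", "10.0.0.6", {"app": "shuffle"}),
    Pod("other", "default", "node-1", "10.0.0.9", {"app": "web"}),
]
by_node = index_shuffle_pods(pods, "default", {"app": "shuffle"})
```

An executor scheduled on `node-2` would then be configured with the address returned by `shuffle_host_for("node-2", by_node)`.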
+ The `KubernetesExternalShuffleService` also contains logic to watch the Kubernetes API,
+ detect failures, and proactively clean up files in those error cases.
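One way to picture the watch-and-clean-up behaviour is the Python sketch below. It assumes, purely for illustration, that watch events arrive as `(event_type, app_id)` pairs and that each application's files live in a directory named after its ID; neither detail is confirmed by this document. The event type names follow the Kubernetes watch API.

```python
import shutil
from pathlib import Path

# Kubernetes watch event types treated as failures in this sketch.
FAILURE_EVENTS = {"DELETED", "ERROR"}


def cleanup_on_failure(events, shuffle_root):
    """Remove per-application directories for failed applications.

    `events` is an iterable of (event_type, app_id) pairs; this shape and
    the one-directory-per-app layout are assumptions, not the real design.
    Returns the list of app IDs whose directories were removed.
    """
    removed = []
    root = Path(shuffle_root)
    for event_type, app_id in events:
        if event_type not in FAILURE_EVENTS:
            continue
        app_dir = root / app_id
        if app_dir.is_dir():
            shutil.rmtree(app_dir)
            removed.append(app_id)
    return removed
```

A real implementation would also need to validate `app_id` before building a path from it and to handle reconnecting a dropped watch.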