---
layout: global
title: Kubernetes Implementation of the Spark Scheduler Backend
---

# Scheduler Backend

The general idea is to run Spark drivers and executors inside Kubernetes [Pods](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
Pods are a co-located and co-scheduled group of one or more containers that run in a shared context. The main component is `KubernetesClusterSchedulerBackend`,
an implementation of `CoarseGrainedSchedulerBackend`, which manages allocating and destroying executors via the Kubernetes API.
There are auxiliary and optional components: the `ResourceStagingServer` and the `KubernetesExternalShuffleService`, which serve specific purposes described further below.

The scheduler backend is invoked in the driver associated with a particular job. The driver may run outside the cluster (client mode) or within it (cluster mode).
The scheduler backend manages [pods](http://kubernetes.io/docs/user-guide/pods/) for each executor.
The executor code runs within a Kubernetes pod, but remains unmodified and unaware of the orchestration layer.

When a job is running, the scheduler backend configures and creates executor pods with the following properties:

- The pod's container runs a pre-built Docker image containing a Spark distribution (with Kubernetes integration) and
  invokes the Java runtime with the `CoarseGrainedExecutorBackend` main class.
- The scheduler backend specifies environment variables on the executor pod to configure its runtime,
  particularly its JVM options, number of cores, heap size, and the driver's hostname.
- The executor container has [resource limits and requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
  that are set in accordance with the resource limits specified in the Spark configuration (`executor.cores` and `executor.memory` in the application's SparkConf).
- The executor pods may also be launched into a particular [Kubernetes namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/),
  or target a particular subset of nodes in the Kubernetes cluster, based on the Spark configuration supplied.
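To make these properties concrete, the following Python sketch builds a pod manifest with the shape described above. The image name, environment variable names, and port are hypothetical placeholders, not the exact values used by the real (Scala) implementation:

```python
# Hypothetical sketch of an executor pod manifest; image name, env var
# names, and port are illustrative only, not the implementation's values.
def build_executor_pod(executor_id, driver_hostname, cores, memory_mb,
                       namespace="default"):
    """Return a Kubernetes pod manifest (as a dict) for one Spark executor."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"spark-executor-{executor_id}",
            # Namespace is configurable via the Spark configuration.
            "namespace": namespace,
        },
        "spec": {
            "containers": [{
                "name": "spark-executor",
                # Pre-built image containing a Spark distribution with
                # Kubernetes integration.
                "image": "spark-executor:latest",
                # Environment variables configure the executor's runtime
                # and point it back at the driver.
                "env": [
                    {"name": "SPARK_DRIVER_HOST", "value": driver_hostname},
                    {"name": "SPARK_EXECUTOR_CORES", "value": str(cores)},
                    {"name": "SPARK_EXECUTOR_MEMORY", "value": f"{memory_mb}m"},
                ],
                # Resource requests/limits derived from executor.cores
                # and executor.memory in the SparkConf.
                "resources": {
                    "requests": {"cpu": str(cores), "memory": f"{memory_mb}Mi"},
                    "limits": {"cpu": str(cores), "memory": f"{memory_mb}Mi"},
                },
            }],
        },
    }
```

Submitting such a manifest to the API server (e.g. via a Kubernetes client library) is what turns a requested executor into a running pod.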

## Requesting Executors

Spark requests new executors through the `doRequestTotalExecutors(numExecutors: Int)` method.
The scheduler backend keeps track of the total number of executors requested by Spark core.
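A minimal sketch of this bookkeeping, assuming hypothetical names (the real implementation is Scala inside `KubernetesClusterSchedulerBackend`): the method only records the desired total and returns, leaving actual pod creation to a separate thread.

```python
import threading

# Hypothetical sketch: the backend records the requested executor total;
# a separate allocator thread reads it later. Names are illustrative.
class ExecutorTarget:
    def __init__(self):
        self._lock = threading.Lock()
        self._total = 0

    def do_request_total_executors(self, num_executors: int) -> bool:
        # Spark core calls this with the desired total; we only record it
        # here and return immediately, without touching the API server.
        with self._lock:
            self._total = num_executors
        return True

    def desired_total(self) -> int:
        with self._lock:
            return self._total
```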

A separate kubernetes-pod-allocator thread handles the creation of new executor pods with appropriate throttling and monitoring.
This indirection is required because the Kubernetes API server accepts requests for new executor pods optimistically, with the
anticipation of being able to eventually run them. However, it is undesirable to have a very large number of pods that cannot be
scheduled and that stay pending within the cluster. Hence, the kubernetes-pod-allocator uses the Kubernetes API to decide whether to
submit new requests for executors based on whether previous pod creation requests have completed. This gives control over how
fast a job scales up (which can be configured) and helps prevent Spark jobs from DoS-ing the Kubernetes API server with pod creation requests.
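The throttling decision can be sketched as follows. This is an illustrative Python model under assumed names, not the implementation's actual logic: new pods are requested only once previously requested pods are no longer pending, and the batch size caps how fast a job scales up.

```python
# Hypothetical sketch of the kubernetes-pod-allocator's throttling decision.
# `batch_size` models the configurable scale-up rate.
def pods_to_request(desired_total, running, pending, batch_size):
    """Return how many new executor pods to ask the API server for."""
    if pending > 0:
        # Previous pod creation requests have not completed yet; wait
        # rather than flood the API server with more requests.
        return 0
    missing = desired_total - running
    return max(0, min(missing, batch_size))
```

An allocator thread would call this periodically, after querying the API server for the current pod states, and submit exactly that many new pod manifests.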

## Destroying Executors

Spark requests the deletion of executors through the `doKillExecutors(executorIds: List[String])` method.

The inverse behavior is required in the implementation of `doKillExecutors()`. When the executor
allocation manager decides to remove executors from the application, the scheduler backend finds the
pods that are running the corresponding executors and tells the API server to stop those pods.
It is worth noting that this code does not have to decide which executors should be
removed: when `doKillExecutors()` is called, the executors to be removed have already been
selected by the `CoarseGrainedSchedulerBackend` and the `ExecutorAllocationManager`.
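The behavior described above can be sketched as follows. Here `api` stands in for a Kubernetes client, and `delete_pod` is an assumed method name for illustration, not a real client API; the executor IDs arrive already chosen by the upstream components:

```python
# Hypothetical sketch of doKillExecutors(): map each already-selected
# executor ID to its pod and ask the API server to delete that pod.
def do_kill_executors(executor_ids, executor_id_to_pod, api):
    """Delete the pods backing the given executor IDs; return pod names deleted."""
    deleted = []
    for executor_id in executor_ids:
        # Unknown IDs (e.g. executors that already exited) are skipped.
        pod_name = executor_id_to_pod.pop(executor_id, None)
        if pod_name is not None:
            api.delete_pod(pod_name)  # tell the API server to stop this pod
            deleted.append(pod_name)
    return deleted
```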