This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Commit 437eb89
foxisherikerlandson authored and committed
Updated with documentation (#430)
Direct copy of revised design doc
1 parent 982760c

File tree: 1 file changed (+44, -0 lines)


resource-managers/kubernetes/architecture-docs/scheduler-backend.md

---
layout: global
title: Kubernetes Implementation of the Spark Scheduler Backend
---

# Scheduler Backend

The general idea is to run Spark drivers and executors inside Kubernetes [Pods](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
Pods are a co-located and co-scheduled group of one or more containers run in a shared context. The main component is `KubernetesClusterSchedulerBackend`,
an implementation of `CoarseGrainedSchedulerBackend`, which manages allocating and destroying executors via the Kubernetes API.
There are auxiliary and optional components, `ResourceStagingServer` and `KubernetesExternalShuffleService`, which serve specific purposes described further below.

The scheduler backend is invoked in the driver associated with a particular job. The driver may run outside the cluster (client mode) or within it (cluster mode).
The scheduler backend manages [pods](http://kubernetes.io/docs/user-guide/pods/) for each executor.
The executor code runs within a Kubernetes pod but remains unmodified and unaware of the orchestration layer.
When a job is running, the scheduler backend configures and creates executor pods with the following properties:

- The pod's container runs a pre-built Docker image containing a Spark distribution (with Kubernetes integration) and
  invokes the Java runtime with the `CoarseGrainedExecutorBackend` main class.
- The scheduler backend specifies environment variables on the executor pod to configure its runtime,
  particularly its JVM options, number of cores, heap size, and the driver's hostname.
- The executor container has [resource limits and requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
  that are set in accordance with the resource limits specified in the Spark configuration (`executor.cores` and `executor.memory` in the application's SparkConf).
- The executor pods may also be launched into a particular [Kubernetes namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/),
  or target a particular subset of nodes in the Kubernetes cluster, based on the Spark configuration supplied.
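The pod properties above can be sketched as a manifest-building helper. This is a hypothetical model for illustration only: the function name, image name, environment variable names, port, and default values are assumptions, not Spark's actual configuration; the real backend constructs the pod through a Kubernetes client library rather than a raw dict.

```python
# Hypothetical sketch: a minimal model of the executor pod the scheduler
# backend constructs. All names and defaults here are illustrative.

def build_executor_pod(app_id, executor_id, driver_hostname,
                       cores=1, memory_mib=1408, namespace="default",
                       image="spark-executor:latest"):
    """Return a dict shaped like a Kubernetes Pod manifest for one executor."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"{app_id}-exec-{executor_id}",
            "namespace": namespace,  # configurable via the Spark conf
        },
        "spec": {
            "containers": [{
                "name": "executor",
                # Pre-built image containing a Spark distribution; its
                # entrypoint launches the JVM with the
                # CoarseGrainedExecutorBackend main class.
                "image": image,
                # Environment variables configuring the executor's runtime
                # (names here are assumptions for illustration).
                "env": [
                    {"name": "SPARK_DRIVER_HOST", "value": driver_hostname},
                    {"name": "SPARK_EXECUTOR_CORES", "value": str(cores)},
                    {"name": "SPARK_EXECUTOR_MEMORY", "value": f"{memory_mib}m"},
                ],
                # Requests/limits derived from the executor core and memory
                # settings in the application's SparkConf.
                "resources": {
                    "requests": {"cpu": str(cores), "memory": f"{memory_mib}Mi"},
                    "limits": {"cpu": str(cores), "memory": f"{memory_mib}Mi"},
                },
            }],
        },
    }
```

The namespace and node-targeting constraints described above would be added to `metadata` and `spec` (e.g. a node selector) in the same way, driven by the supplied Spark configuration.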
## Requesting Executors

Spark requests new executors through the `doRequestTotalExecutors(numExecutors: Int)` method.
The scheduler backend keeps track of the total number of executors requested by Spark core.
A separate kubernetes-pod-allocator thread handles the creation of new executor pods with appropriate throttling and monitoring.
This indirection is required because the Kubernetes API server accepts requests for new executor pods optimistically, with the
anticipation of being able to eventually run them. However, it is undesirable to have a very large number of pods that cannot be
scheduled and stay pending within the cluster. Hence, the kubernetes-pod-allocator uses the Kubernetes API to decide whether to
submit new requests for executors based on whether previous pod creation requests have completed. This gives us control over how
fast a job scales up (which can be configured), and helps prevent Spark jobs from DoS-ing the Kubernetes API server with pod creation requests.
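A minimal sketch of this throttling loop, under some stated assumptions: pods are created in fixed-size batches, and a new batch is submitted only when no earlier creation request is still pending. `FakeApi`, the batch size, and all method names are illustrative stand-ins, not the real allocator's API.

```python
# Hypothetical sketch of the kubernetes-pod-allocator's throttling logic.

class FakeApi:
    """Stand-in for the Kubernetes API server, for demonstration only."""
    def __init__(self):
        self.phases = {}

    def create_pod(self, name):
        self.phases[name] = "Pending"  # accepted optimistically

    def phase(self, name):
        return self.phases[name]

    def schedule_all(self):
        for name in self.phases:
            self.phases[name] = "Running"


class PodAllocator:
    def __init__(self, api, batch_size=5):
        self.api = api
        self.batch_size = batch_size   # bounds how fast a job scales up
        self.requested_total = 0
        self.created = []              # pods we have asked the API server for

    def set_total_expected(self, n):
        # Mirrors doRequestTotalExecutors: record Spark core's target count.
        self.requested_total = n

    def allocate_round(self):
        """One tick of the allocator thread: create more executor pods only
        if all earlier creation requests have completed (left Pending)."""
        if any(self.api.phase(p) == "Pending" for p in self.created):
            return 0  # throttle: wait for outstanding requests to complete
        deficit = self.requested_total - len(self.created)
        to_create = min(deficit, self.batch_size)
        for _ in range(to_create):
            name = f"exec-{len(self.created) + 1}"
            self.api.create_pod(name)
            self.created.append(name)
        return to_create
```

Each round either waits (if earlier pods are still pending) or submits at most one batch, which is what keeps a large executor request from flooding the API server all at once.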
## Destroying Executors

Spark requests deletion of executors through the `doKillExecutors(executorIds: List[String])` method.
The inverse behavior is required in the implementation of `doKillExecutors()`. When the executor
allocation manager desires to remove executors from the application, the scheduler should find the
pods that are running the appropriate executors and tell the API server to stop these pods.
It is worth noting that this code does not have to decide which executors should be
removed. When `doKillExecutors()` is called, the executors to be removed have already been
selected by the `CoarseGrainedSchedulerBackend` and `ExecutorAllocationManager`.
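The deletion path can be sketched as follows. The executor-id-to-pod mapping, `DeletionRecorder`, and the `delete_pod` call are hypothetical stand-ins for the backend's bookkeeping and its Kubernetes client call.

```python
# Hypothetical sketch of the doKillExecutors() deletion path.

class DeletionRecorder:
    """Stand-in for the Kubernetes client, recording delete calls."""
    def __init__(self):
        self.deleted = []

    def delete_pod(self, name):
        self.deleted.append(name)


def kill_executors(executor_ids, executor_id_to_pod, api):
    """Delete the pods backing the given executor IDs. The IDs to remove were
    already chosen upstream by CoarseGrainedSchedulerBackend and
    ExecutorAllocationManager; this code only carries out the deletion."""
    deleted = []
    for exec_id in executor_ids:
        pod = executor_id_to_pod.pop(exec_id, None)
        if pod is not None:  # ignore IDs we no longer track
            api.delete_pod(pod)
            deleted.append(pod)
    return deleted
```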
