@@ -15,7 +15,7 @@ Kubernetes scheduler that has been added to Spark.
[kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes cluster,
you may set up a test cluster on your local machine using
[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
- * We recommend using the latest releases of minikube be updated to the most recent version with the DNS addon enabled.
+ * We recommend using the latest release of minikube with the DNS addon enabled.
* You must have appropriate permissions to list, create, edit and delete
[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources
by running `kubectl auth can-i <list|create|edit|delete> pods`.
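For example, you can spot-check a couple of these permissions explicitly before submitting (the `default` namespace
below is only an illustration):

```bash
# Ask the API server whether the current user may create and list pods.
kubectl auth can-i create pods --namespace=default
kubectl auth can-i list pods --namespace=default
```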
@@ -28,12 +28,13 @@ by running `kubectl auth can-i <list|create|edit|delete> pods`.
<img src="img/k8s-cluster-mode.png" title="Spark cluster components" alt="Spark cluster components" />
</p>

- spark-submit can be directly used to submit a Spark application to a Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+ <code>spark-submit</code> can be directly used to submit a Spark application to a Kubernetes cluster.
+ The submission mechanism works as follows (a sketch of an example submission follows this list):

- * Spark creates a spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+ * Spark creates a Spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
* The driver creates executors, which also run within Kubernetes pods, connects to them, and executes application code.
* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists
- logs and remains in "completed" state in the Kubernetes API till it's eventually garbage collected or manually cleaned up.
+ logs and remains in "completed" state in the Kubernetes API until it's eventually garbage collected or manually cleaned up.

Note that in the completed state, the driver pod does *not* use any computational or memory resources.
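As a rough sketch, a cluster-mode submission might look like the following. The API server address, application jar,
main class, and executor count are placeholders rather than values from this page, and the required driver and
executor image properties from the configuration table below must also be set:

```bash
# Illustrative only: substitute your own API server address, image configuration,
# application jar and main class before running.
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  local:///path/to/examples.jar
```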
@@ -54,7 +55,7 @@ and built for your usage.
You may build these Docker images from source.
There is a script, `sbin/build-push-docker-images.sh`, that you can use to build and push
- customized spark distribution images consisting of all the above components.
+ customized Spark distribution images consisting of all the above components.

Example usage is:
@@ -95,14 +96,14 @@ kubectl cluster-info
Kubernetes master is running at http://127.0.0.1:6443
```

- In the above example, the specific Kubernetes cluster can be used with spark submit by specifying
+ In the above example, the specific Kubernetes cluster can be used with <code>spark-submit</code> by specifying
`--master k8s://http://127.0.0.1:6443` as an argument to spark-submit. It is also possible to use the
authenticating proxy, `kubectl proxy`, to communicate with the Kubernetes API.
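For the direct (non-proxy) case, the address reported by `kubectl cluster-info` above goes straight into the master URL,
along the lines of (remaining arguments as in the earlier sketch):

```bash
# Illustrative: only the master URL is shown; the other submission options are unchanged.
bin/spark-submit --master k8s://http://127.0.0.1:6443 --deploy-mode cluster <other options>
```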
The local proxy can be started by:

```bash
- kubectl proxy
+ kubectl proxy
```

If the local proxy is running at localhost:8001, `--master k8s://http://127.0.0.1:8001` can be used as the argument to
@@ -123,15 +124,15 @@ take actions.
### Accessing Logs

- Logs can be accessed using the kubernetes API and the `kubectl` CLI. When a Spark application is running, it's possible
+ Logs can be accessed using the Kubernetes API and the `kubectl` CLI. When a Spark application is running, it's possible
to stream logs from the application using:

```bash
kubectl -n=<namespace> logs -f <driver-pod-name>
```

The same logs can also be accessed through the
- [kubernetes dashboard](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/) if installed on
+ [Kubernetes dashboard](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/) if installed on
the cluster.

### Accessing Driver UI
@@ -143,13 +144,13 @@ The UI associated with any application can be accessed locally using
kubectl port-forward <driver-pod-name> 4040:4040
```

- Then, the spark driver UI can be accessed on `http://localhost:4040`.
+ Then, the Spark driver UI can be accessed on `http://localhost:4040`.

### Debugging
There may be several kinds of failures. If the Kubernetes API server rejects the request made from spark-submit, or the
connection is refused for a different reason, the submission logic should indicate the error encountered. However, if there
- are errors during the running of the application, often, the best way to investigate may be through the kubernetes CLI.
+ are errors while the application is running, the best way to investigate is often through the Kubernetes CLI.

To get some basic information about the scheduling decisions made around the driver pod, you can run:
@@ -165,15 +166,15 @@ kubectl logs <spark-driver-pod>
Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire Spark
application, including all executors, the associated service, etc. The driver pod can be thought of as the Kubernetes representation of
- the spark application.
+ the Spark application.
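For example, once you are done with an application, deleting the driver pod (names here are illustrative) removes
everything that remains of it:

```bash
# Deleting the driver pod also cleans up the executors and the associated service.
kubectl -n=<namespace> delete pod <driver-pod-name>
```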
## Kubernetes Features

### Namespaces

Kubernetes has the concept of [namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
Namespaces are ways to divide cluster resources between multiple users (via resource quota). Spark on Kubernetes can
- use namespaces to launch spark applications. This is through the `--conf spark.kubernetes.namespace` argument to spark-submit.
+ use namespaces to launch Spark applications. This is controlled by the `spark.kubernetes.namespace` configuration.
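For instance, to launch an application into a namespace named `spark-jobs` (an illustrative, pre-existing namespace),
pass the configuration at submission time:

```bash
# Illustrative: submit into the spark-jobs namespace instead of the default one.
bin/spark-submit \
  --conf spark.kubernetes.namespace=spark-jobs \
  <other submission options>
```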

Kubernetes allows using [ResourceQuota](https://kubernetes.io/docs/concepts/policy/resource-quotas/) to set limits on
resources, number of objects, etc. on individual namespaces. Namespaces and ResourceQuota can be used in combination by
@@ -198,7 +199,7 @@ that allows driver pods to create pods and services under the default Kubernetes
service account that has the right role granted. Spark on Kubernetes supports specifying a custom service account to
be used by the driver pod through the configuration property
`spark.kubernetes.authenticate.driver.serviceAccountName=<service account name>`. For example, to make the driver pod
- to use the `spark` service account, a user simply adds the following option to the `spark-submit` command:
+ use the `spark` service account, a user simply adds the following option to the `spark-submit` command:

```
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
@@ -272,6 +273,7 @@ specific to Spark on Kubernetes.
<td>
Docker image to use for the driver. Specify this using the standard
<a href="https://docs.docker.com/engine/reference/commandline/tag/">Docker tag</a> format.
+ This configuration is required and must be provided by the user.
</td>
</tr>
<tr>
@@ -280,6 +282,7 @@ specific to Spark on Kubernetes.
<td>
Docker image to use for the executors. Specify this using the standard
<a href="https://docs.docker.com/engine/reference/commandline/tag/">Docker tag</a> format.
+ This configuration is required and must be provided by the user.
</td>
</tr>
<tr>
@@ -365,7 +368,7 @@ specific to Spark on Kubernetes.
<td><code>spark.kubernetes.authenticate.driver.oauthToken</code></td>
<td>(none)</td>
<td>
- OAuth token to use when authenticating against the against the Kubernetes API server from the driver pod when
+ OAuth token to use when authenticating against the Kubernetes API server from the driver pod when
requesting executors. Note that unlike the other authentication options, this must be the exact string value of
the token to use for the authentication. This token value is uploaded to the driver pod. If this is specified, it is
highly recommended to set up TLS for the driver submission server, as this value is sensitive information that would
@@ -483,15 +486,17 @@ specific to Spark on Kubernetes.
<td><code>spark.kubernetes.driver.secrets.[SecretName]</code></td>
<td>(none)</td>
<td>
- Mounts the Kubernetes secret named <code>SecretName</code> onto the path specified by the value
+ Mounts the <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes secret</a>
+ named <code>SecretName</code> onto the path specified by the value
in the driver pod. The user can specify multiple instances of this for multiple secrets.
</td>
</tr>
<tr>
<td><code>spark.kubernetes.executor.secrets.[SecretName]</code></td>
<td>(none)</td>
<td>
- Mounts the Kubernetes secret named <code>SecretName</code> onto the path specified by the value
+ Mounts the <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes secret</a>
+ named <code>SecretName</code> onto the path specified by the value
in the executor pods. The user can specify multiple instances of this for multiple secrets.
</td>
</tr>