
Commit b18d1ba
Addressed comments (round 2)
1 parent 14bee00

3 files changed, 24 insertions(+), 19 deletions(-)

docs/index.md (1 addition, 1 deletion)

@@ -113,7 +113,7 @@ options for deployment:
 * [Mesos](running-on-mesos.html): deploy a private cluster using
 [Apache Mesos](http://mesos.apache.org)
 * [YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN)
-* [Kubernetes (experimental)](running-on-kubernetes.html): deploy Spark on top of Kubernetes
+* [Kubernetes](running-on-kubernetes.html): deploy Spark on top of Kubernetes

 **Other Documents:**

docs/running-on-kubernetes.md (22 additions, 17 deletions)

@@ -15,7 +15,7 @@ Kubernetes scheduler that has been added to Spark.
 [kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes cluster,
 you may setup a test cluster on your local machine using
 [minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
-* We recommend using the latest releases of minikube be updated to the most recent version with the DNS addon enabled.
+* We recommend using the latest release of minikube with the DNS addon enabled.
 * You must have appropriate permissions to list, create, edit and delete
 [pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources
 by running `kubectl auth can-i <list|create|edit|delete> pods`.
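
As a quick illustration of the prerequisites in the hunk above, the permission check can be run verb by verb. This is a sketch, not text from the docs, and assumes a local minikube cluster:

```bash
# Start a local test cluster; recent minikube releases enable the DNS addon by default.
minikube start

# Verify the permissions the docs call out. Each command prints "yes" or "no".
kubectl auth can-i list pods
kubectl auth can-i create pods
kubectl auth can-i edit pods
kubectl auth can-i delete pods
```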
@@ -28,12 +28,13 @@ by running `kubectl auth can-i <list|create|edit|delete> pods`.
 <img src="img/k8s-cluster-mode.png" title="Spark cluster components" alt="Spark cluster components" />
 </p>

-spark-submit can be directly used to submit a Spark application to a Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+<code>spark-submit</code> can be directly used to submit a Spark application to a Kubernetes cluster.
+The submission mechanism works as follows:

-* Spark creates a spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* Spark creates a Spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
 * The driver creates executors which are also running within Kubernetes pods and connects to them, and executes application code.
 * When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists
-logs and remains in "completed" state in the Kubernetes API till it's eventually garbage collected or manually cleaned up.
+logs and remains in "completed" state in the Kubernetes API until it's eventually garbage collected or manually cleaned up.

 Note that in the completed state, the driver pod does *not* use any computational or memory resources.

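To make the submission flow described in the hunk above concrete, a cluster-mode submission might look like the sketch below. The master URL, application class, and jar path are placeholders, not values from this commit:

```bash
# Submit SparkPi in cluster mode; the driver itself runs in a Kubernetes pod.
bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --conf spark.app.name=spark-pi \
  local:///opt/spark/examples/jars/spark-examples.jar
```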
@@ -54,7 +55,7 @@ and built for your usage.

 You may build these docker images from sources.
 There is a script, `sbin/build-push-docker-images.sh` that you can use to build and push
-customized spark distribution images consisting of all the above components.
+customized Spark distribution images consisting of all the above components.

 Example usage is:

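The usage example itself falls outside this hunk. Assuming the script takes a repository via -r, a tag via -t, and a build/push subcommand (the convention this script follows upstream, an assumption here), invocation would look roughly like:

```bash
# Build the customized Spark images locally, then push them to a registry.
# The repository and tag values here are illustrative.
./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.2.0 build
./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.2.0 push
```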
@@ -95,14 +96,14 @@ kubectl cluster-info
 Kubernetes master is running at http://127.0.0.1:6443
 ```

-In the above example, the specific Kubernetes cluster can be used with spark submit by specifying
+In the above example, the specific Kubernetes cluster can be used with <code>spark-submit</code> by specifying
 `--master k8s://http://127.0.0.1:6443` as an argument to spark-submit. Additionally, it is also possible to use the
 authenticating proxy, `kubectl proxy` to communicate to the Kubernetes API.

 The local proxy can be started by:

 ```bash
-  kubectl proxy
+kubectl proxy
 ```

 If the local proxy is running at localhost:8001, `--master k8s://http://127.0.0.1:8001` can be used as the argument to
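
Tying the two snippets in this hunk together, submission through the authenticating proxy would look roughly like this sketch (8001 is kubectl proxy's default port; the remaining arguments are elided):

```bash
# Terminal 1: start the authenticating proxy on 127.0.0.1:8001.
kubectl proxy

# Terminal 2: point spark-submit at the proxy rather than the API server.
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://http://127.0.0.1:8001 \
  ...
```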
@@ -123,15 +124,15 @@ take actions.

 ### Accessing Logs

-Logs can be accessed using the kubernetes API and the `kubectl` CLI. When a Spark application is running, it's possible
+Logs can be accessed using the Kubernetes API and the `kubectl` CLI. When a Spark application is running, it's possible
 to stream logs from the application using:

 ```bash
 kubectl -n=<namespace> logs -f <driver-pod-name>
 ```

 The same logs can also be accessed through the
-[kubernetes dashboard](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/) if installed on
+[Kubernetes dashboard](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/) if installed on
 the cluster.

 ### Accessing Driver UI
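
One note on the log command in the hunk above: because the driver pod persists after completion, the same command without -f retrieves the full log of a finished application. A sketch:

```bash
# While running: stream logs.
kubectl -n=<namespace> logs -f <driver-pod-name>

# After completion: the driver pod remains, so its logs are still retrievable.
kubectl -n=<namespace> logs <driver-pod-name>
```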
@@ -143,13 +144,13 @@ The UI associated with any application can be accessed locally using
 kubectl port-forward <driver-pod-name> 4040:4040
 ```

-Then, the spark driver UI can be accessed on `http://localhost:4040`.
+Then, the Spark driver UI can be accessed on `http://localhost:4040`.

 ### Debugging

 There may be several kinds of failures. If the Kubernetes API server rejects the request made from spark-submit, or the
 connection is refused for a different reason, the submission logic should indicate the error encountered. However, if there
-are errors during the running of the application, often, the best way to investigate may be through the kubernetes CLI.
+are errors during the running of the application, often, the best way to investigate may be through the Kubernetes CLI.

 To get some basic information about the scheduling decisions made around the driver pod, you can run:

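The command block that follows "you can run:" sits outside this hunk. The standard kubectl way to inspect scheduling decisions for a pod is describe, shown here as a hedged sketch rather than a quote from the docs:

```bash
# Prints pod status plus events: scheduling, image pulls, restarts, failures.
kubectl describe pod <spark-driver-pod>
```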
@@ -165,15 +166,15 @@ kubectl logs <spark-driver-pod>

 Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
 application, includling all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of
-the spark application.
+the Spark application.

 ## Kubernetes Features

 ### Namespaces

 Kubernetes has the concept of [namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
 Namespaces are ways to divide cluster resources between multiple users (via resource quota). Spark on Kubernetes can
-use namespaces to launch spark applications. This is through the `--conf spark.kubernetes.namespace` argument to spark-submit.
+use namespaces to launch Spark applications. This can be made use of through the `spark.kubernetes.namespace` configuration.

 Kubernetes allows using [ResourceQuota](https://kubernetes.io/docs/concepts/policy/resource-quotas/) to set limits on
 resources, number of objects, etc on individual namespaces. Namespaces and ResourceQuota can be used in combination by
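
To illustrate the namespace configuration this hunk rewires, a sketch (the namespace name below is arbitrary):

```bash
# Create a dedicated namespace; a ResourceQuota can then be applied to it.
kubectl create namespace spark-jobs

# Launch applications into it via the configuration property from the docs.
bin/spark-submit \
  --conf spark.kubernetes.namespace=spark-jobs \
  ...
```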
@@ -198,7 +199,7 @@ that allows driver pods to create pods and services under the default Kubernetes
 service account that has the right role granted. Spark on Kubernetes supports specifying a custom service account to
 be used by the driver pod through the configuration property
 `spark.kubernetes.authenticate.driver.serviceAccountName=<service account name>`. For example to make the driver pod
-to use the `spark` service account, a user simply adds the following option to the `spark-submit` command:
+use the `spark` service account, a user simply adds the following option to the `spark-submit` command:

 ```
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
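
For completeness, a service account like the `spark` one referenced above can be created and granted a role with stock kubectl. Binding the built-in edit ClusterRole, as below, is one reasonable choice, not the docs' prescription:

```bash
# Create the service account named in the configuration above.
kubectl create serviceaccount spark

# Allow it to create and manage pods and services in the default namespace.
kubectl create clusterrolebinding spark-role \
  --clusterrole=edit \
  --serviceaccount=default:spark \
  --namespace=default
```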
@@ -272,6 +273,7 @@ specific to Spark on Kubernetes.
 <td>
 Docker image to use for the driver. Specify this using the standard
 <a href="https://docs.docker.com/engine/reference/commandline/tag/">Docker tag</a> format.
+This configuration is required and must be provided by the user.
 </td>
 </tr>
 <tr>
@@ -280,6 +282,7 @@ specific to Spark on Kubernetes.
 <td>
 Docker image to use for the executors. Specify this using the standard
 <a href="https://docs.docker.com/engine/reference/commandline/tag/">Docker tag</a> format.
+This configuration is required and must be provided by the user.
 </td>
 </tr>
 <tr>
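
Since both hunks above now mark the images as required, a submission must set them explicitly. The property names live in table rows elided from this diff; the keys below are an assumption based on this fork's naming, not something visible in the hunks:

```bash
# Hypothetical property keys; the authoritative names are in the elided rows.
bin/spark-submit \
  --conf spark.kubernetes.driver.docker.image=myrepo/spark-driver:v2.2.0 \
  --conf spark.kubernetes.executor.docker.image=myrepo/spark-executor:v2.2.0 \
  ...
```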
@@ -365,7 +368,7 @@ specific to Spark on Kubernetes.
 <td><code>spark.kubernetes.authenticate.driver.oauthToken</code></td>
 <td>(none)</td>
 <td>
-OAuth token to use when authenticating against the against the Kubernetes API server from the driver pod when
+OAuth token to use when authenticating against the Kubernetes API server from the driver pod when
 requesting executors. Note that unlike the other authentication options, this must be the exact string value of
 the token to use for the authentication. This token value is uploaded to the driver pod. If this is specified, it is
 highly recommended to set up TLS for the driver submission server, as this value is sensitive information that would
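
Because this property takes the exact token string rather than a file path, it is passed literally, as in this sketch (the value is a placeholder):

```bash
bin/spark-submit \
  --conf spark.kubernetes.authenticate.driver.oauthToken=<token-string> \
  ...
```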
@@ -483,15 +486,17 @@ specific to Spark on Kubernetes.
 <td><code>spark.kubernetes.driver.secrets.[SecretName]</code></td>
 <td>(none)</td>
 <td>
-Mounts the Kubernetes secret named <code>SecretName</code> onto the path specified by the value
+Mounts the [Kubernetes secret](https://kubernetes.io/docs/concepts/configuration/secret/)
+named <code>SecretName</code> onto the path specified by the value
 in the driver Pod. The user can specify multiple instances of this for multiple secrets.
 </td>
 </tr>
 <tr>
 <td><code>spark.kubernetes.executor.secrets.[SecretName]</code></td>
 <td>(none)</td>
 <td>
-Mounts the Kubernetes secret named <code>SecretName</code> onto the path specified by the value
+Mounts the [Kubernetes secret](https://kubernetes.io/docs/concepts/configuration/secret/)
+named <code>SecretName</code> onto the path specified by the value
 in the executor Pods. The user can specify multiple instances of this for multiple secrets.
 </td>
 </tr>
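
Reading the pattern from this hunk, the [SecretName] segment names the secret and the property value gives the mount path, so usage would look like this sketch (secret name and path are arbitrary):

```bash
# Create a secret from a local file.
kubectl create secret generic spark-secret --from-file=./credentials.conf

# Mount it into both driver and executor pods at /etc/secrets.
bin/spark-submit \
  --conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets \
  --conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets \
  ...
```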

docs/submitting-applications.md (1 addition, 1 deletion)

@@ -165,7 +165,7 @@ The master URL passed to Spark can be in one of the following formats:
 <code>client</code> or <code>cluster</code> mode depending on the value of <code>--deploy-mode</code>.
 The cluster location will be found based on the <code>HADOOP_CONF_DIR</code> or <code>YARN_CONF_DIR</code> variable.
 </td></tr>
-<tr><td> <code>k8s://HOST:PORT</code> </td><td> Connect to a <a href="running-on-kubernetes.html"> Kubernetes </a> cluster in
+<tr><td> <code>k8s://HOST:PORT</code> </td><td> Connect to a <a href="running-on-kubernetes.html">Kubernetes</a> cluster in
 <code>cluster</code> mode. Client mode is currently unsupported and will be supported in future releases.
 The <code>HOST</code> and <code>PORT</code> refer to the [Kubernetes API Server](https://kubernetes.io/docs/reference/generated/kube-apiserver/).
 It connects using TLS by default. In order to force it to use an unsecured connection, you can use
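
The final sentence above is truncated by the hunk boundary. Given that k8s:// URLs use TLS by default, the two master URL forms would look like this sketch:

```bash
# Default: TLS. The https scheme can be spelled out after k8s://.
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>

# To force an unsecured connection, spell out http explicitly.
--master k8s://http://<k8s-apiserver-host>:<k8s-apiserver-port>
```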
