Commit 73478bf

Author: Amogh Shetkar (committed)

* Changes from PR apache-spark-on-k8s#456
* Author: @sahilprasad
* Enables Spark applications to be submitted in 'in-cluster client' mode.

1 parent 7b8c9f5 · commit 73478bf

File tree: 2 files changed (+54, -14 lines)

core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala (14 additions, 6 deletions)

@@ -325,7 +325,7 @@ object SparkSubmit extends CommandLineUtils {
     // Require all python files to be local, so we can add them to the PYTHONPATH
     // In YARN cluster mode, python files are distributed as regular files, which can be non-local.
     // In Mesos cluster mode, non-local python files are automatically downloaded by Mesos.
-    if (args.isPython && !isYarnCluster && !isMesosCluster) {
+    if (args.isPython && !isYarnCluster && !isMesosCluster && !isKubernetesCluster) {
       if (Utils.nonLocalPaths(args.primaryResource).nonEmpty) {
         printErrorAndExit(s"Only local python files are supported: ${args.primaryResource}")
       }
@@ -336,16 +336,16 @@ object SparkSubmit extends CommandLineUtils {
     }

     // Require all R files to be local
-    if (args.isR && !isYarnCluster && !isMesosCluster) {
+    if (args.isR && !isYarnCluster && !isMesosCluster && !isKubernetesCluster) {
       if (Utils.nonLocalPaths(args.primaryResource).nonEmpty) {
         printErrorAndExit(s"Only local R files are supported: ${args.primaryResource}")
       }
     }

     // The following modes are not supported or applicable
     (clusterManager, deployMode) match {
-      case (KUBERNETES, CLIENT) =>
-        printErrorAndExit("Client mode is currently not supported for Kubernetes.")
+      case (KUBERNETES, CLIENT) if !inK8sCluster() =>
+        printErrorAndExit("Kubernetes currently only supports in-cluster client mode.")
       case (STANDALONE, CLUSTER) if args.isPython =>
         printErrorAndExit("Cluster deploy mode is currently not supported for python " +
           "applications on standalone clusters.")
@@ -682,10 +682,10 @@ object SparkSubmit extends CommandLineUtils {
     // explicitly sets `spark.submit.pyFiles` in his/her default properties file.
     sysProps.get("spark.submit.pyFiles").foreach { pyFiles =>
       val resolvedPyFiles = Utils.resolveURIs(pyFiles)
-      val formattedPyFiles = if (!isYarnCluster && !isMesosCluster) {
+      val formattedPyFiles = if (!isYarnCluster && !isMesosCluster && !isKubernetesCluster) {
         PythonRunner.formatPaths(resolvedPyFiles).mkString(",")
       } else {
-        // Ignoring formatting python path in yarn and mesos cluster mode, these two modes
+        // Ignoring formatting python path in yarn, mesos and kubernetes cluster mode, these modes
         // support dealing with remote python files, they could distribute and add python files
         // locally.
         resolvedPyFiles
@@ -857,6 +857,14 @@ object SparkSubmit extends CommandLineUtils {
     res == SparkLauncher.NO_RESOURCE
   }

+  /**
+   * Return whether the submission environment is within a Kubernetes cluster
+   */
+  private[deploy] def inK8sCluster(): Boolean = {
+    !sys.env.get("KUBERNETES_SERVICE_HOST").isEmpty &&
+      !sys.env.get("KUBERNETES_SERVICE_PORT").isEmpty
+  }
+
   /**
    * Merge a sequence of comma-separated file lists, some of which may be null to indicate
    * no files, into a single comma-separated string.
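As context for the new guard: Kubernetes injects `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` into every pod it schedules, so their joint presence is what marks a submission as in-cluster. A minimal standalone sketch of the same check, written with `sys.env.contains` (equivalent to the `Option.isEmpty` negation in the patch):

```scala
object InClusterCheck {
  // Both variables are set by Kubernetes inside every pod, so their
  // joint presence indicates the submission environment is in-cluster.
  def inK8sCluster(): Boolean =
    sys.env.contains("KUBERNETES_SERVICE_HOST") &&
      sys.env.contains("KUBERNETES_SERVICE_PORT")
}
```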

docs/running-on-kubernetes.md (40 additions, 8 deletions)

@@ -69,7 +69,7 @@ For example, if the registry host is `registry-host` and the registry is listening (whitespace-only change)
     docker push registry-host:5000/spark-driver:latest
     docker push registry-host:5000/spark-executor:latest
     docker push registry-host:5000/spark-init:latest
-
+
 Note that `spark-base` is the base image for the other images. It must be built first before the other images, and then afterwards the other images can be built in any order.

 ## Submitting Applications to Kubernetes
@@ -198,10 +198,10 @@ is currently supported. (whitespace-only changes: trailing spaces removed)

 ### Running PySpark

-Running PySpark on Kubernetes leverages the same spark-submit logic when launching on Yarn and Mesos.
-Python files can be distributed by including, in the conf, `--py-files`
+Running PySpark on Kubernetes leverages the same spark-submit logic when launching on Yarn and Mesos.
+Python files can be distributed by including, in the conf, `--py-files`

-Below is an example submission:
+Below is an example submission:


 ```
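Related to the `formattedPyFiles` change above: `--py-files` is surfaced to Spark as the `spark.submit.pyFiles` property, which Kubernetes cluster mode now leaves unformatted because the backend can distribute remote files itself. A hedged sketch of setting the property directly; the paths are placeholders:

```scala
import org.apache.spark.SparkConf

// Equivalent to passing --py-files on the spark-submit command line.
// Placeholder paths; in Kubernetes cluster mode they may be non-local.
val conf = new SparkConf()
  .set("spark.submit.pyFiles", "local:///opt/app/deps.zip,hdfs:///apps/extra.zip")
```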
@@ -265,6 +265,37 @@ other cluster managers.

 ## Advanced

+### Running in-cluster client mode applications
+
+While Spark on Kubernetes does not support client mode applications, such as the PySpark shell, when launched from outside Kubernetes, it does support client mode applications launched from within the cluster. This _in-cluster_ client mode bypasses some of the networking and dependency issues inherent in running a client from outside a cluster, while allowing much of the same interactive functionality, such as the PySpark shell and Jupyter notebooks.
+
+To run in client mode, use `kubectl attach` to attach to an existing driver pod on the cluster, or start a new driver with:
+
+    kubectl run <pod name> -it --image=<driver image> --restart=Never -- /bin/bash
+
+This opens a shell into the specified driver pod, from which you can run client mode applications. To configure these in-cluster applications appropriately, set the following configuration value for every application, as in the `spark-submit` example below. It tells the cluster manager to refer back to the current driver pod as the driver for any applications you submit:
+
+    spark.kubernetes.driver.pod.name=$HOSTNAME
+
+With that set, you should be able to run the following example from within the pod:
+
+    bin/spark-submit \
+      --deploy-mode client \
+      --class org.apache.spark.examples.SparkPi \
+      --master k8s://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT \
+      --kubernetes-namespace default \
+      --conf spark.app.name=spark-pi \
+      --conf spark.kubernetes.driver.pod.name=$HOSTNAME \
+      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:latest \
+      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:latest \
+      --conf spark.dynamicAllocation.enabled=true \
+      --conf spark.shuffle.service.enabled=true \
+      --conf spark.kubernetes.shuffle.namespace=default \
+      --conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.1.0" \
+      local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar 10
+
 ### Securing the Resource Staging Server with TLS

 The default configuration of the resource staging server is not secured with TLS. It is highly recommended to configure
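The same in-cluster client configuration can also be set programmatically, for example from a notebook running inside the driver pod. A minimal sketch, assuming the Kubernetes-injected `KUBERNETES_SERVICE_HOST`, `KUBERNETES_SERVICE_PORT`, and `HOSTNAME` variables are present, and reusing the image names above as placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: builds a client-mode session from inside a driver pod.
// All three environment variables are injected by Kubernetes into pods.
val spark = SparkSession.builder()
  .appName("spark-pi")
  .master(s"k8s://${sys.env("KUBERNETES_SERVICE_HOST")}:${sys.env("KUBERNETES_SERVICE_PORT")}")
  // Refer back to this pod as the driver, as described above.
  .config("spark.kubernetes.driver.pod.name", sys.env("HOSTNAME"))
  // Placeholder images mirroring the spark-submit example.
  .config("spark.kubernetes.driver.docker.image", "kubespark/spark-driver:latest")
  .config("spark.kubernetes.executor.docker.image", "kubespark/spark-executor:latest")
  .getOrCreate()
```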
@@ -742,12 +773,12 @@ from the other deployment modes. See the [configuration page](configuration.html) (whitespace-only changes)
   </td>
 </tr>
 <tr>
-  <td><code>spark.kubernetes.node.selector.[labelKey]</code></td>
+  <td><code>spark.kubernetes.node.selector.[labelKey]</code></td>
   <td>(none)</td>
   <td>
-    Adds to the node selector of the driver pod and executor pods, with key <code>labelKey</code> and the value as the
+    Adds to the node selector of the driver pod and executor pods, with key <code>labelKey</code> and the value as the
     configuration's value. For example, setting <code>spark.kubernetes.node.selector.identifier</code> to <code>myIdentifier</code>
-    will result in the driver pod and executors having a node selector with key <code>identifier</code> and value
+    will result in the driver pod and executors having a node selector with key <code>identifier</code> and value
     <code>myIdentifier</code>. Multiple node selector keys can be added by setting multiple configurations with this prefix.
   </td>
 </tr>
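To illustrate the prefix convention in that row, a small hedged example; the first pair is the doc's own example, the second is a made-up placeholder:

```scala
import org.apache.spark.SparkConf

// Each suffix after the prefix becomes a node-selector key on the
// driver and executor pods; the values come from the configuration.
val conf = new SparkConf()
  .set("spark.kubernetes.node.selector.identifier", "myIdentifier") // from the doc's example
  .set("spark.kubernetes.node.selector.disktype", "ssd")            // hypothetical extra key
```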
@@ -808,14 +839,15 @@ from the other deployment modes. See the [configuration page](configuration.html)
     We have a default value of <code>spark.kubernetes.kerberos.tokensecret.itemkey</code> should you not include it. But
     you should always include this if you are proposing a pre-existing secret containing the delegation token data.
     <td><code>spark.executorEnv.[EnvironmentVariableName]</code></td>
+    <td><code>spark.executorEnv.[EnvironmentVariableName]</code></td>
   <td>(none)</td>
   <td>
     Add the environment variable specified by <code>EnvironmentVariableName</code> to
     the Executor process. The user can specify multiple of these to set multiple environment variables.
   </td>
 </tr>
 <tr>
-  <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
+  <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
   <td>(none)</td>
   <td>
     Add the environment variable specified by <code>EnvironmentVariableName</code> to
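The two prefixes in that table pair naturally when the same variable is needed on both sides. A hedged example; the variable name and value are placeholders, not Spark-defined names:

```scala
import org.apache.spark.SparkConf

// Placeholders: DATA_DIR is not a Spark-defined variable name.
val conf = new SparkConf()
  .set("spark.executorEnv.DATA_DIR", "/tmp/data")          // exported into each executor
  .set("spark.kubernetes.driverEnv.DATA_DIR", "/tmp/data") // exported into the driver pod
```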
