Commit 0f31d8a

Merge pull request apache-spark-on-k8s#252 from palantir/resync-kube
[NOSQUASH] Resync kube
2 parents 0d4903d + 2b82d5a commit 0f31d8a

File tree: 9 files changed, +70 −43 lines


conf/kubernetes-resource-staging-server.yaml

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ spec:
           name: spark-resource-staging-server-config
       containers:
       - name: spark-resource-staging-server
-        image: kubespark/spark-resource-staging-server:v2.1.0-kubernetes-0.2.0
+        image: kubespark/spark-resource-staging-server:v2.2.0-kubernetes-0.3.0
         resources:
           requests:
             cpu: 100m

conf/kubernetes-shuffle-service.yaml

Lines changed: 4 additions & 4 deletions
@@ -20,14 +20,14 @@ kind: DaemonSet
 metadata:
   labels:
     app: spark-shuffle-service
-    spark-version: 2.1.0
+    spark-version: 2.2.0
   name: shuffle
 spec:
   template:
     metadata:
       labels:
         app: spark-shuffle-service
-        spark-version: 2.1.0
+        spark-version: 2.2.0
     spec:
       volumes:
         - name: temp-volume
@@ -38,7 +38,7 @@ spec:
         # This is an official image that is built
         # from the dockerfiles/shuffle directory
         # in the spark distribution.
-        image: kubespark/spark-shuffle:v2.1.0-kubernetes-0.2.0
+        image: kubespark/spark-shuffle:v2.2.0-kubernetes-0.3.0
         imagePullPolicy: IfNotPresent
         volumeMounts:
           - mountPath: '/tmp'
@@ -51,4 +51,4 @@ spec:
           requests:
             cpu: "1"
           limits:
-            cpu: "1"
+            cpu: "1"

docs/running-on-kubernetes.md

Lines changed: 32 additions & 18 deletions
@@ -17,8 +17,10 @@ cluster, you may setup a test cluster on your local machine using
 * You must have appropriate permissions to create and list [pods](https://kubernetes.io/docs/user-guide/pods/),
   [ConfigMaps](https://kubernetes.io/docs/tasks/configure-pod-container/configmap/) and
   [secrets](https://kubernetes.io/docs/concepts/configuration/secret/) in your cluster. You can verify that
-  you can list these resources by running `kubectl get pods` `kubectl get configmap`, and `kubectl get secrets` which
+  you can list these resources by running `kubectl get pods`, `kubectl get configmap`, and `kubectl get secrets` which
   should give you a list of pods and configmaps (if any) respectively.
+* The service account or credentials used by the driver pods must have appropriate permissions
+  as well for editing pod spec.
 * You must have a spark distribution with Kubernetes support. This may be obtained from the
   [release tarball](https://github.com/apache-spark-on-k8s/spark/releases) or by
   [building Spark with Kubernetes support](../resource-managers/kubernetes/README.md#building-spark-with-kubernetes-support).
@@ -36,15 +38,15 @@ If you wish to use pre-built docker images, you may use the images published in
 <tr><th>Component</th><th>Image</th></tr>
 <tr>
   <td>Spark Driver Image</td>
-  <td><code>kubespark/spark-driver:v2.1.0-kubernetes-0.2.0</code></td>
+  <td><code>kubespark/spark-driver:v2.2.0-kubernetes-0.3.0</code></td>
 </tr>
 <tr>
   <td>Spark Executor Image</td>
-  <td><code>kubespark/spark-executor:v2.1.0-kubernetes-0.2.0</code></td>
+  <td><code>kubespark/spark-executor:v2.2.0-kubernetes-0.3.0</code></td>
 </tr>
 <tr>
   <td>Spark Initialization Image</td>
-  <td><code>kubespark/spark-init:v2.1.0-kubernetes-0.2.0</code></td>
+  <td><code>kubespark/spark-init:v2.2.0-kubernetes-0.3.0</code></td>
 </tr>
 </table>

@@ -80,9 +82,9 @@ are set up as described above:
   --kubernetes-namespace default \
   --conf spark.executor.instances=5 \
   --conf spark.app.name=spark-pi \
-  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
-  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
-  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
+  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.3.0 \
+  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.3.0 \
+  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.3.0 \
   local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar

 The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
@@ -107,6 +109,18 @@ Finally, notice that in the above example we specify a jar with a specific URI w
 the location of the example jar that is already in the Docker image. Using dependencies that are on your machine's local
 disk is discussed below.

+When Kubernetes [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/) is enabled,
+the `default` service account used by the driver may not have appropriate pod `edit` permissions
+for launching executor pods. We recommend to add another service account, say `spark`, with
+the necessary privilege. For example:
+
+    kubectl create serviceaccount spark
+    kubectl create clusterrolebinding spark-edit --clusterrole edit \
+        --serviceaccount default:spark --namespace default
+
+With this, one can add `--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark` to
+the spark-submit command line above to specify the service account to use.
+
 ## Dependency Management

 Application dependencies that are being submitted from your machine need to be sent to a **resource staging server**
@@ -129,9 +143,9 @@ and then you can compute the value of Pi as follows:
   --kubernetes-namespace default \
   --conf spark.executor.instances=5 \
   --conf spark.app.name=spark-pi \
-  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
-  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
-  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
+  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.3.0 \
+  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.3.0 \
+  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.3.0 \
   --conf spark.kubernetes.resourceStagingServer.uri=http://<address-of-any-cluster-node>:31000 \
   examples/jars/spark_examples_2.11-2.2.0.jar

@@ -172,9 +186,9 @@ If our local proxy were listening on port 8001, we would have our submission loo
   --kubernetes-namespace default \
   --conf spark.executor.instances=5 \
   --conf spark.app.name=spark-pi \
-  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
-  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
-  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
+  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.3.0 \
+  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.3.0 \
+  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.3.0 \
   local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar

 Communication between Spark and Kubernetes clusters is performed using the fabric8 kubernetes-client library.
@@ -222,7 +236,7 @@ service because there may be multiple shuffle service instances running in a clu
 a way to target a particular shuffle service.

 For example, if the shuffle service we want to use is in the default namespace, and
-has pods with labels `app=spark-shuffle-service` and `spark-version=2.1.0`, we can
+has pods with labels `app=spark-shuffle-service` and `spark-version=2.2.0`, we can
 use those tags to target that particular shuffle service at job launch time. In order to run a job with dynamic allocation enabled,
 the command may then look like the following:

@@ -237,7 +251,7 @@ the command may then look like the following:
   --conf spark.dynamicAllocation.enabled=true \
   --conf spark.shuffle.service.enabled=true \
   --conf spark.kubernetes.shuffle.namespace=default \
-  --conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.1.0" \
+  --conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.2.0" \
   local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar 10 400000 2

 ## Advanced
@@ -314,9 +328,9 @@ communicate with the resource staging server over TLS. The trustStore can be set
   --kubernetes-namespace default \
   --conf spark.executor.instances=5 \
   --conf spark.app.name=spark-pi \
-  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
-  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
-  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
+  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.3.0 \
+  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.3.0 \
+  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.3.0 \
   --conf spark.kubernetes.resourceStagingServer.uri=https://<address-of-any-cluster-node>:31000 \
   --conf spark.ssl.kubernetes.resourceStagingServer.enabled=true \
   --conf spark.ssl.kubernetes.resourceStagingServer.clientCertPem=/home/myuser/cert.pem \
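The documented flags map one-to-one to Spark conf keys that the submission steps below read from a SparkConf (`submissionSparkConf` in the Scala diffs). A minimal sketch collecting the same settings on a SparkConf; in practice they are passed with `--conf` at spark-submit time as shown above, and Spark must be on the classpath for this snippet:

import org.apache.spark.SparkConf

// Sketch: the documented options as SparkConf entries (normally supplied via --conf flags).
val conf = new SparkConf()
  .set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
  .set("spark.kubernetes.driver.docker.image", "kubespark/spark-driver:v2.2.0-kubernetes-0.3.0")
  .set("spark.kubernetes.executor.docker.image", "kubespark/spark-executor:v2.2.0-kubernetes-0.3.0")
  .set("spark.kubernetes.initcontainer.docker.image", "kubespark/spark-init:v2.2.0-kubernetes-0.3.0")
  .set("spark.kubernetes.shuffle.labels", "app=spark-shuffle-service,spark-version=2.2.0")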

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/kubernetes/constants.scala

Lines changed: 1 addition & 1 deletion
@@ -101,5 +101,5 @@ package object constants {
   private[spark] val DRIVER_CONTAINER_NAME = "spark-kubernetes-driver"
   private[spark] val KUBERNETES_MASTER_INTERNAL_URL = "https://kubernetes.default.svc"
   private[spark] val MEMORY_OVERHEAD_FACTOR = 0.10
-  private[spark] val MEMORY_OVERHEAD_MIN = 384L
+  private[spark] val MEMORY_OVERHEAD_MIN_MIB = 384L
 }
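The rename only clarifies the unit; the sizing rule is unchanged: total container memory is the requested amount plus max(MEMORY_OVERHEAD_FACTOR × memory, MEMORY_OVERHEAD_MIN_MIB). A standalone sketch of that arithmetic, with illustrative helper names that are not project code:

object MemoryOverheadSketch {
  val MEMORY_OVERHEAD_FACTOR = 0.10
  val MEMORY_OVERHEAD_MIN_MIB = 384L

  // Total container memory in MiB for a given heap size, when no explicit overhead is configured.
  def containerMemoryMiB(heapMiB: Long): Long =
    heapMiB + math.max((MEMORY_OVERHEAD_FACTOR * heapMiB).toLong, MEMORY_OVERHEAD_MIN_MIB)

  def main(args: Array[String]): Unit = {
    println(containerMemoryMiB(1024L)) // 1408: the 384 MiB floor applies
    println(containerMemoryMiB(8192L)) // 9011: 10% (819 MiB) exceeds the floor
  }
}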

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/kubernetes/submit/submitsteps/BaseDriverConfigurationStep.scala

Lines changed: 8 additions & 8 deletions
@@ -46,13 +46,13 @@ private[spark] class BaseDriverConfigurationStep(
   private val driverLimitCores = submissionSparkConf.get(KUBERNETES_DRIVER_LIMIT_CORES)

   // Memory settings
-  private val driverMemoryMb = submissionSparkConf.get(
+  private val driverMemoryMiB = submissionSparkConf.get(
     org.apache.spark.internal.config.DRIVER_MEMORY)
-  private val memoryOverheadMb = submissionSparkConf
+  private val memoryOverheadMiB = submissionSparkConf
     .get(KUBERNETES_DRIVER_MEMORY_OVERHEAD)
-    .getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * driverMemoryMb).toInt,
-      MEMORY_OVERHEAD_MIN))
-  private val driverContainerMemoryWithOverhead = driverMemoryMb + memoryOverheadMb
+    .getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * driverMemoryMiB).toInt,
+      MEMORY_OVERHEAD_MIN_MIB))
+  private val driverContainerMemoryWithOverheadMiB = driverMemoryMiB + memoryOverheadMiB
   private val driverDockerImage = submissionSparkConf.get(DRIVER_DOCKER_IMAGE)

   override def configureDriver(
@@ -86,10 +86,10 @@ private[spark] class BaseDriverConfigurationStep(
       .withAmount(driverCpuCores)
       .build()
     val driverMemoryQuantity = new QuantityBuilder(false)
-      .withAmount(s"${driverMemoryMb}M")
+      .withAmount(s"${driverMemoryMiB}Mi")
       .build()
     val driverMemoryLimitQuantity = new QuantityBuilder(false)
-      .withAmount(s"${driverContainerMemoryWithOverhead}M")
+      .withAmount(s"${driverContainerMemoryWithOverheadMiB}Mi")
       .build()
     val maybeCpuLimitQuantity = driverLimitCores.map { limitCores =>
       ("cpu", new QuantityBuilder(false).withAmount(limitCores).build())
@@ -102,7 +102,7 @@ private[spark] class BaseDriverConfigurationStep(
       .addToEnv(driverExtraClasspathEnv.toSeq: _*)
       .addNewEnv()
         .withName(ENV_DRIVER_MEMORY)
-        .withValue(driverContainerMemoryWithOverhead + "m")
+        .withValue(driverContainerMemoryWithOverheadMiB + "M") // JVM treats the "M" unit as "Mi"
       .endEnv()
       .addNewEnv()
         .withName(ENV_DRIVER_MAIN_CLASS)
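The suffix switch matters because Kubernetes parses "M" as decimal megabytes (10^6 bytes) while the JVM's `-Xmx...M` means binary mebibytes (2^20 bytes), so the old request was slightly smaller than the heap it had to contain; the env value keeps the JVM's "M" spelling for the same reason, as the added comment notes. A small self-contained illustration in plain Scala, with no project code assumed:

object MemoryUnitMismatch {
  def main(args: Array[String]): Unit = {
    val jvmXmx456M = 456L * 1024 * 1024 // JVM: the "M" suffix means mebibytes (2^20 bytes)
    val kube456M   = 456L * 1000 * 1000 // Kubernetes: "M" means megabytes (10^6 bytes)
    val kube456Mi  = 456L * 1024 * 1024 // Kubernetes: "Mi" means mebibytes, matching the JVM
    println(s"JVM  -Xmx456M = $jvmXmx456M bytes")
    println(s"Kube 456M     = $kube456M bytes (short by ${jvmXmx456M - kube456M} bytes)")
    println(s"Kube 456Mi    = $kube456Mi bytes")
  }
}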

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/kubernetes/submit/submitsteps/DriverKubernetesCredentialsStep.scala

Lines changed: 11 additions & 1 deletion
@@ -44,6 +44,7 @@ private[spark] class DriverKubernetesCredentialsStep(
     s"$APISERVER_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$CLIENT_CERT_FILE_CONF_SUFFIX")
   private val maybeMountedCaCertFile = submissionSparkConf.getOption(
     s"$APISERVER_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$CA_CERT_FILE_CONF_SUFFIX")
+  private val driverServiceAccount = submissionSparkConf.get(KUBERNETES_SERVICE_ACCOUNT_NAME)

   override def configureDriver(driverSpec: KubernetesDriverSpec): KubernetesDriverSpec = {
     val driverSparkConf = driverSpec.driverSparkConf.clone()
@@ -81,7 +82,16 @@ private[spark] class DriverKubernetesCredentialsStep(
           .endVolume()
         .endSpec()
       .build()
-    }.getOrElse(driverSpec.driverPod)
+    }.getOrElse(
+      driverServiceAccount.map { account =>
+        new PodBuilder(driverSpec.driverPod)
+          .editOrNewSpec()
+            .withServiceAccount(account)
+            .withServiceAccountName(account)
+            .endSpec()
+          .build()
+      }.getOrElse(driverSpec.driverPod)
+    )
     val driverContainerWithMountedSecretVolume = kubernetesCredentialsSecret.map { secret =>
       new ContainerBuilder(driverSpec.driverContainer)
         .addNewVolumeMount()
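The added branch is a nested Option fallback: prefer the pod that mounts explicit credentials, otherwise stamp the configured service account onto the driver pod, otherwise leave the pod unchanged. A minimal sketch of that decision using stand-in types rather than the project's fabric8 builders:

object ServiceAccountFallbackSketch {
  // Stand-in type; the real code edits an io.fabric8 Pod via PodBuilder.
  case class Pod(serviceAccountName: Option[String] = None)

  def resolveDriverPod(
      basePod: Pod,
      podWithMountedCredentials: Option[Pod],
      configuredServiceAccount: Option[String]): Pod =
    podWithMountedCredentials.getOrElse(
      configuredServiceAccount
        .map(account => basePod.copy(serviceAccountName = Some(account)))
        .getOrElse(basePod))

  def main(args: Array[String]): Unit = {
    // No mounted credentials but a service account configured: the account is applied.
    println(resolveDriverPod(Pod(), None, Some("spark"))) // Pod(Some(spark))
    // Neither configured: the base pod is returned unchanged.
    println(resolveDriverPod(Pod(), None, None))          // Pod(None)
  }
}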

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/kubernetes/KubernetesClusterSchedulerBackend.scala

Lines changed: 7 additions & 7 deletions
@@ -114,16 +114,16 @@ private[spark] class KubernetesClusterSchedulerBackend(
     throw new SparkException("Must specify the driver pod name"))
   private val executorPodNamePrefix = conf.get(KUBERNETES_EXECUTOR_POD_NAME_PREFIX)

-  private val executorMemoryMb = conf.get(org.apache.spark.internal.config.EXECUTOR_MEMORY)
+  private val executorMemoryMiB = conf.get(org.apache.spark.internal.config.EXECUTOR_MEMORY)
   private val executorMemoryString = conf.get(
     org.apache.spark.internal.config.EXECUTOR_MEMORY.key,
     org.apache.spark.internal.config.EXECUTOR_MEMORY.defaultValueString)

-  private val memoryOverheadMb = conf
+  private val memoryOverheadMiB = conf
     .get(KUBERNETES_EXECUTOR_MEMORY_OVERHEAD)
-    .getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMb).toInt,
-      MEMORY_OVERHEAD_MIN))
-  private val executorMemoryWithOverhead = executorMemoryMb + memoryOverheadMb
+    .getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMiB).toInt,
+      MEMORY_OVERHEAD_MIN_MIB))
+  private val executorMemoryWithOverheadMiB = executorMemoryMiB + memoryOverheadMiB

   private val executorCores = conf.getDouble("spark.executor.cores", 1d)
   private val executorLimitCores = conf.getOption(KUBERNETES_EXECUTOR_LIMIT_CORES.key)
@@ -443,10 +443,10 @@ private[spark] class KubernetesClusterSchedulerBackend(
       SPARK_ROLE_LABEL -> SPARK_POD_EXECUTOR_ROLE) ++
       executorLabels
     val executorMemoryQuantity = new QuantityBuilder(false)
-      .withAmount(s"${executorMemoryMb}M")
+      .withAmount(s"${executorMemoryMiB}Mi")
       .build()
     val executorMemoryLimitQuantity = new QuantityBuilder(false)
-      .withAmount(s"${executorMemoryWithOverhead}M")
+      .withAmount(s"${executorMemoryWithOverheadMiB}Mi")
       .build()
     val executorCpuQuantity = new QuantityBuilder(false)
       .withAmount(executorCores.toString)

resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/kubernetes/submit/submitsteps/BaseDriverConfigurationStepSuite.scala

Lines changed: 3 additions & 3 deletions
@@ -81,17 +81,17 @@ private[spark] class BaseDriverConfigurationStepSuite extends SparkFunSuite {
       .toMap
     assert(envs.size === 6)
     assert(envs(ENV_SUBMIT_EXTRA_CLASSPATH) === "/opt/spark/spark-exmaples.jar")
-    assert(envs(ENV_DRIVER_MEMORY) === "456m")
+    assert(envs(ENV_DRIVER_MEMORY) === "456M")
     assert(envs(ENV_DRIVER_MAIN_CLASS) === MAIN_CLASS)
     assert(envs(ENV_DRIVER_ARGS) === "arg1 arg2")
     assert(envs(DRIVER_CUSTOM_ENV_KEY1) === "customDriverEnv1")
     assert(envs(DRIVER_CUSTOM_ENV_KEY2) === "customDriverEnv2")
     val resourceRequirements = preparedDriverSpec.driverContainer.getResources
     val requests = resourceRequirements.getRequests.asScala
     assert(requests("cpu").getAmount === "2")
-    assert(requests("memory").getAmount === "256M")
+    assert(requests("memory").getAmount === "256Mi")
     val limits = resourceRequirements.getLimits.asScala
-    assert(limits("memory").getAmount === "456M")
+    assert(limits("memory").getAmount === "456Mi")
     assert(limits("cpu").getAmount === "4")
     val driverPodMetadata = preparedDriverSpec.driverPod.getMetadata
     assert(driverPodMetadata.getName === "spark-driver-pod")

resource-managers/kubernetes/docker-minimal-bundle/src/main/docker/spark-base/entrypoint.sh

Lines changed: 3 additions & 0 deletions
@@ -16,6 +16,9 @@
 # limitations under the License.
 #

+# echo commands to the terminal output
+set -x
+
 # Check whether there is a passwd entry for the container UID
 myuid=$(id -u)
 mygid=$(id -g)
