
Commit fc82222

rvesse authored and mccheah committed
[SPARK-25809][K8S][TEST] New K8S integration testing backends
## What changes were proposed in this pull request?

Currently the K8S integration tests are hardcoded to use a `minikube` based backend. `minikube` is VM based, so it can be resource hungry, and it also doesn't cope well with certain networking setups (for example, when using the Cisco AnyConnect software VPN, `minikube` is unusable as it detects its own IP incorrectly).

This PR adds a new K8S integration testing backend that allows for using the Kubernetes support in [Docker for Desktop](https://blog.docker.com/2018/07/kubernetes-is-now-available-in-docker-desktop-stable-channel/). It also generalises the framework to be able to run the integration tests against an arbitrary Kubernetes cluster.

To Do:

- [x] General Kubernetes cluster backend
- [x] Documentation on Kubernetes integration testing
- [x] Testing of general K8S backend
- [x] Check whether change from timestamps being `Time` to `String` in Fabric 8 upgrade needs additional fix up

## How was this patch tested?

Ran integration tests with Docker for Desktop and all passed:

![screen shot 2018-10-23 at 14 19 56](https://user-images.githubusercontent.com/2104864/47363460-c5816a00-d6ce-11e8-9c15-56b34698e797.png)

Suggested reviewers: ifilonenko, srowen

Author: Rob Vesse <[email protected]>

Closes apache#22805 from rvesse/SPARK-25809.
1 parent cd92f25 commit fc82222
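
To give a concrete sense of the new backend selection, here is a minimal sketch (not one of the files touched by this commit) that runs the tests against the Docker for Desktop cluster using the `--deploy-mode` flag documented in the README changes below; the tarball name is a placeholder:

    # Run from resource-managers/kubernetes/integration-tests.
    # The tarball is a placeholder; build one with dev/make-distribution.sh --tgz -Pkubernetes.
    dev/dev-run-integration-tests.sh \
      --deploy-mode docker-for-desktop \
      --spark-tgz spark-3.0.0-SNAPSHOT-bin-example.tgz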


14 files changed, +356 −48 lines changed


resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala

Lines changed: 5 additions & 0 deletions
@@ -42,6 +42,9 @@ private[spark] object SparkKubernetesClientFactory {
      sparkConf: SparkConf,
      defaultServiceAccountToken: Option[File],
      defaultServiceAccountCaCert: Option[File]): KubernetesClient = {
+
+    // TODO [SPARK-25887] Support configurable context
+
    val oauthTokenFileConf = s"$kubernetesAuthConfPrefix.$OAUTH_TOKEN_FILE_CONF_SUFFIX"
    val oauthTokenConf = s"$kubernetesAuthConfPrefix.$OAUTH_TOKEN_CONF_SUFFIX"
    val oauthTokenFile = sparkConf.getOption(oauthTokenFileConf)
@@ -63,6 +66,8 @@ private[spark] object SparkKubernetesClientFactory {
      .getOption(s"$kubernetesAuthConfPrefix.$CLIENT_CERT_FILE_CONF_SUFFIX")
    val dispatcher = new Dispatcher(
      ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
+
+    // TODO [SPARK-25887] Create builder in a way that respects configurable context
    val config = new ConfigBuilder()
      .withApiVersion("v1")
      .withMasterUrl(master)
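
The TODO comments above reference SPARK-25887, which tracks letting this factory respect a specific kubeconfig context rather than only the active one. As a hedged aside (standard kubectl commands, not part of this diff), the context that would otherwise be picked up can be inspected and switched before running the tests:

    # Show the currently active kubeconfig context
    kubectl config current-context
    # Switch to the context created by Docker for Desktop, if that is the intended backend
    kubectl config use-context docker-for-desktop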

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala

Lines changed: 0 additions & 3 deletions
@@ -18,13 +18,10 @@ package org.apache.spark.deploy.k8s.submit

import java.util.concurrent.{CountDownLatch, TimeUnit}

-import scala.collection.JavaConverters._
-
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{KubernetesClientException, Watcher}
import io.fabric8.kubernetes.client.Watcher.Action

-import org.apache.spark.SparkException
import org.apache.spark.deploy.k8s.KubernetesUtils._
import org.apache.spark.internal.Logging
import org.apache.spark.util.ThreadUtils

resource-managers/kubernetes/integration-tests/README.md

Lines changed: 170 additions & 13 deletions
@@ -8,26 +8,59 @@ title: Spark on Kubernetes Integration Tests
Note that the integration test framework is currently being heavily revised and
is subject to change. Note that currently the integration tests only run with Java 8.

-The simplest way to run the integration tests is to install and run Minikube, then run the following:
+The simplest way to run the integration tests is to install and run Minikube, then run the following from this
+directory:

    dev/dev-run-integration-tests.sh

The minimum tested version of Minikube is 0.23.0. The kube-dns addon must be enabled. Minikube should
-run with a minimum of 3 CPUs and 4G of memory:
+run with a minimum of 4 CPUs and 6G of memory:

-    minikube start --cpus 3 --memory 4096
+    minikube start --cpus 4 --memory 6144

You can download Minikube [here](https://github.com/kubernetes/minikube/releases).

# Integration test customization

-Configuration of the integration test runtime is done through passing different arguments to the test script. The main useful options are outlined below.
+Configuration of the integration test runtime is done through passing different arguments to the test script.
+The main useful options are outlined below.
+
+## Using a different backend
+
+The integration test backend, i.e. the K8S cluster used for testing, is controlled by the `--deploy-mode` option. By
+default this is set to `minikube`; the available backends and their prerequisites are as follows.
+
+### `minikube`
+
+Uses the local `minikube` cluster; this requires that `minikube` 0.23.0 or greater be installed and that it be
+allocated at least 4 CPUs and 6GB memory (some users have reported success with as few as 3 CPUs and 4GB memory). The
+tests will check if `minikube` is started and abort early if it isn't currently running.
+
+### `docker-for-desktop`
+
+Since July 2018 Docker for Desktop provides an optional Kubernetes cluster that can be enabled as described in this
+[blog post](https://blog.docker.com/2018/07/kubernetes-is-now-available-in-docker-desktop-stable-channel/). Assuming
+this is enabled, this backend will auto-configure itself from the `docker-for-desktop` context that Docker creates
+in your `~/.kube/config` file. If your config file is in a different location you should set the `KUBECONFIG`
+environment variable appropriately.
+
+### `cloud`
+
+The `cloud` backend configures the tests to use an arbitrary Kubernetes cluster running in the cloud or elsewhere.
+
+The `cloud` backend auto-configures the cluster to use from your K8S config file; this is assumed to be
+`~/.kube/config` unless the `KUBECONFIG` environment variable is set to override this location. By default this will
+use whatever your current context is in the config file; to use an alternative context from your config file you can
+specify the `--context <context>` flag with the desired context.
+
+You can optionally use a different K8S master URL than the one your K8S config file specifies; this should be supplied
+via the `--spark-master <master-url>` flag.

## Re-using Docker Images

By default, the test framework will build new Docker images on every test execution. A unique image tag is generated,
-and it is written to file at `target/imageTag.txt`. To reuse the images built in a previous run, or to use a Docker image tag
-that you have built by other means already, pass the tag to the test script:
+and it is written to file at `target/imageTag.txt`. To reuse the images built in a previous run, or to use a Docker
+image tag that you have built by other means already, pass the tag to the test script:

    dev/dev-run-integration-tests.sh --image-tag <tag>

@@ -37,16 +70,140 @@ where if you still want to use images that were built before by the test framework

## Spark Distribution Under Test

-The Spark code to test is handed to the integration test system via a tarball. Here is the option that is used to specify the tarball:
+The Spark code to test is handed to the integration test system via a tarball. Here is the option that is used to
+specify the tarball:

* `--spark-tgz <path-to-tgz>` - set `<path-to-tgz>` to point to a tarball containing the Spark distribution to test.

-TODO: Don't require the packaging of the built Spark artifacts into this tarball, just read them out of the current tree.
+This tarball should be created by first running `dev/make-distribution.sh`, passing the `--tgz` flag and `-Pkubernetes`
+as one of the options, to ensure that Kubernetes support is included in the distribution. For more details on building
+a runnable distribution please see the
+[Building Spark](https://spark.apache.org/docs/latest/building-spark.html#building-a-runnable-distribution)
+documentation.
+
+**TODO:** Don't require the packaging of the built Spark artifacts into this tarball, just read them out of the
+current tree.

## Customizing the Namespace and Service Account

-* `--namespace <namespace>` - set `<namespace>` to the namespace in which the tests should be run.
-* `--service-account <service account name>` - set `<service account name>` to the name of the Kubernetes service account to
-use in the namespace specified by the `--namespace`. The service account is expected to have permissions to get, list, watch,
-and create pods. For clusters with RBAC turned on, it's important that the right permissions are granted to the service account
-in the namespace through an appropriate role and role binding. A reference RBAC configuration is provided in `dev/spark-rbac.yaml`.
+If no namespace is specified then a temporary namespace will be created and deleted during the test run. Similarly, if
+no service account is specified then the `default` service account for the namespace will be used.
+
+Using the `--namespace <namespace>` flag sets `<namespace>` to the namespace in which the tests should be run. If this
+is supplied then the tests assume this namespace exists in the K8S cluster and will not attempt to create it.
+Additionally this namespace must have an appropriately authorized service account which can be customised via the
+`--service-account` flag.
+
+The `--service-account <service account name>` flag sets `<service account name>` to the name of the Kubernetes
+service account to use in the namespace specified by the `--namespace` flag. The service account is expected to have
+permissions to get, list, watch, and create pods. For clusters with RBAC turned on, it's important that the right
+permissions are granted to the service account in the namespace through an appropriate role and role binding. A
+reference RBAC configuration is provided in `dev/spark-rbac.yaml`.
+
+# Running the Tests Directly
+
+If you prefer to run just the integration tests directly, then you can customise the behaviour by passing system
+properties to Maven. For example:
+
+    mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.11 \
+        -Pkubernetes -Pkubernetes-integration-tests \
+        -Phadoop-2.7 -Dhadoop.version=2.7.3 \
+        -Dspark.kubernetes.test.sparkTgz=spark-3.0.0-SNAPSHOT-bin-example.tgz \
+        -Dspark.kubernetes.test.imageTag=sometag \
+        -Dspark.kubernetes.test.imageRepo=docker.io/somerepo \
+        -Dspark.kubernetes.test.namespace=spark-int-tests \
+        -Dspark.kubernetes.test.deployMode=docker-for-desktop \
+        -Dtest.include.tags=k8s
+
+## Available Maven Properties
+
+The following are the available Maven properties that can be passed. For the most part these correspond to flags
+passed to the wrapper scripts, and using the wrapper scripts will simply set these appropriately behind the scenes.
+
+<table>
+  <tr>
+    <th>Property</th>
+    <th>Description</th>
+    <th>Default</th>
+  </tr>
+  <tr>
+    <td><code>spark.kubernetes.test.sparkTgz</code></td>
+    <td>
+      A runnable Spark distribution to test.
+    </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td><code>spark.kubernetes.test.unpackSparkDir</code></td>
+    <td>
+      The directory where the runnable Spark distribution will be unpacked.
+    </td>
+    <td><code>${project.build.directory}/spark-dist-unpacked</code></td>
+  </tr>
+  <tr>
+    <td><code>spark.kubernetes.test.deployMode</code></td>
+    <td>
+      The integration test backend to use. Acceptable values are <code>minikube</code>,
+      <code>docker-for-desktop</code> and <code>cloud</code>.
+    </td>
+    <td><code>minikube</code></td>
+  </tr>
+  <tr>
+    <td><code>spark.kubernetes.test.kubeConfigContext</code></td>
+    <td>
+      When using the <code>cloud</code> backend, specifies the context from the user's K8S config file that should be
+      used as the target cluster for integration testing. If not set and using the <code>cloud</code> backend then
+      your current context will be used.
+    </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td><code>spark.kubernetes.test.master</code></td>
+    <td>
+      When using the <code>cloud</code> backend, specifies the K8S master URL to communicate with.
+    </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td><code>spark.kubernetes.test.imageTag</code></td>
+    <td>
+      A specific image tag to use; when set, this assumes images with those tags are already built and available in
+      the specified image repository. When set to <code>N/A</code> (the default) fresh images will be built.
+    </td>
+    <td><code>N/A</code></td>
+  </tr>
+  <tr>
+    <td><code>spark.kubernetes.test.imageTagFile</code></td>
+    <td>
+      A file containing the image tag to use. If no specific image tag is set then fresh images will be built with a
+      generated tag and that tag written to this file.
+    </td>
+    <td><code>${project.build.directory}/imageTag.txt</code></td>
+  </tr>
+  <tr>
+    <td><code>spark.kubernetes.test.imageRepo</code></td>
+    <td>
+      The Docker image repository that contains the images to be used if a specific image tag is set, or to which the
+      images will be pushed if fresh images are being built.
+    </td>
+    <td><code>docker.io/kubespark</code></td>
+  </tr>
+  <tr>
+    <td><code>spark.kubernetes.test.namespace</code></td>
+    <td>
+      A specific Kubernetes namespace to run the tests in. If specified then the tests assume that this namespace
+      already exists. When not specified a temporary namespace for the tests will be created and deleted as part of
+      the test run.
+    </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td><code>spark.kubernetes.test.serviceAccountName</code></td>
+    <td>
+      A specific Kubernetes service account to use for running the tests. If not specified then the namespace's
+      default service account will be used and that must have sufficient permissions or the tests will fail.
+    </td>
+    <td></td>
+  </tr>
+</table>
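
Putting the documented flags together, a hypothetical wrapper-script invocation against an arbitrary cluster might look like the following; the context, namespace, service account and tarball values are all placeholders:

    dev/dev-run-integration-tests.sh \
      --deploy-mode cloud \
      --context my-cluster-context \
      --namespace spark-int-tests \
      --service-account spark-test-sa \
      --spark-tgz spark-3.0.0-SNAPSHOT-bin-example.tgz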

resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh

Lines changed: 10 additions & 0 deletions
@@ -26,6 +26,7 @@ IMAGE_TAG="N/A"
SPARK_MASTER=
NAMESPACE=
SERVICE_ACCOUNT=
+CONTEXT=
INCLUDE_TAGS="k8s"
EXCLUDE_TAGS=
SCALA_VERSION="$($TEST_ROOT_DIR/build/mvn org.apache.maven.plugins:maven-help-plugin:2.1.1:evaluate -Dexpression=scala.binary.version | grep -v '\[' )"
@@ -61,6 +62,10 @@ while (( "$#" )); do
      SERVICE_ACCOUNT="$2"
      shift
      ;;
+    --context)
+      CONTEXT="$2"
+      shift
+      ;;
    --include-tags)
      INCLUDE_TAGS="k8s,$2"
      shift
@@ -94,6 +99,11 @@ then
  properties=( ${properties[@]} -Dspark.kubernetes.test.serviceAccountName=$SERVICE_ACCOUNT )
fi

+if [ -n $CONTEXT ];
+then
+  properties=( ${properties[@]} -Dspark.kubernetes.test.kubeConfigContext=$CONTEXT )
+fi
+
if [ -n $SPARK_MASTER ];
then
  properties=( ${properties[@]} -Dspark.kubernetes.test.master=$SPARK_MASTER )
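
For reference, the new `--context` flag simply forwards to the `spark.kubernetes.test.kubeConfigContext` system property, so the two invocations below should be roughly equivalent (the context name is a placeholder and the direct Maven form still needs the other properties described in the README):

    # Via the wrapper script, using the --context flag added above
    dev/dev-run-integration-tests.sh --deploy-mode cloud --context my-cluster-context

    # Roughly equivalent direct Maven invocation
    mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.11 \
      -Pkubernetes -Pkubernetes-integration-tests \
      -Dspark.kubernetes.test.deployMode=cloud \
      -Dspark.kubernetes.test.kubeConfigContext=my-cluster-context \
      -Dtest.include.tags=k8s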

resource-managers/kubernetes/integration-tests/pom.xml

Lines changed: 10 additions & 0 deletions
@@ -33,11 +33,20 @@
    <scala-maven-plugin.version>3.2.2</scala-maven-plugin.version>
    <scalatest-maven-plugin.version>1.0</scalatest-maven-plugin.version>
    <sbt.project.name>kubernetes-integration-tests</sbt.project.name>
+
+    <!-- Integration Test Configuration Properties -->
+    <!-- Please see README.md in this directory for explanation of these -->
+    <spark.kubernetes.test.sparkTgz></spark.kubernetes.test.sparkTgz>
    <spark.kubernetes.test.unpackSparkDir>${project.build.directory}/spark-dist-unpacked</spark.kubernetes.test.unpackSparkDir>
    <spark.kubernetes.test.imageTag>N/A</spark.kubernetes.test.imageTag>
    <spark.kubernetes.test.imageTagFile>${project.build.directory}/imageTag.txt</spark.kubernetes.test.imageTagFile>
    <spark.kubernetes.test.deployMode>minikube</spark.kubernetes.test.deployMode>
    <spark.kubernetes.test.imageRepo>docker.io/kubespark</spark.kubernetes.test.imageRepo>
+    <spark.kubernetes.test.kubeConfigContext></spark.kubernetes.test.kubeConfigContext>
+    <spark.kubernetes.test.master></spark.kubernetes.test.master>
+    <spark.kubernetes.test.namespace></spark.kubernetes.test.namespace>
+    <spark.kubernetes.test.serviceAccountName></spark.kubernetes.test.serviceAccountName>
+
    <test.exclude.tags></test.exclude.tags>
    <test.include.tags></test.include.tags>
  </properties>
@@ -135,6 +144,7 @@
    <spark.kubernetes.test.unpackSparkDir>${spark.kubernetes.test.unpackSparkDir}</spark.kubernetes.test.unpackSparkDir>
    <spark.kubernetes.test.imageRepo>${spark.kubernetes.test.imageRepo}</spark.kubernetes.test.imageRepo>
    <spark.kubernetes.test.deployMode>${spark.kubernetes.test.deployMode}</spark.kubernetes.test.deployMode>
+    <spark.kubernetes.test.kubeConfigContext>${spark.kubernetes.test.kubeConfigContext}</spark.kubernetes.test.kubeConfigContext>
    <spark.kubernetes.test.master>${spark.kubernetes.test.master}</spark.kubernetes.test.master>
    <spark.kubernetes.test.namespace>${spark.kubernetes.test.namespace}</spark.kubernetes.test.namespace>
    <spark.kubernetes.test.serviceAccountName>${spark.kubernetes.test.serviceAccountName}</spark.kubernetes.test.serviceAccountName>
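
These new properties default to empty, which the test harness treats as "not specified" (for example, a temporary namespace and the `default` service account, as described in the README). As a hedged illustration, two of them overridden directly on the command line with placeholder values:

    mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.11 \
      -Pkubernetes -Pkubernetes-integration-tests \
      -Dspark.kubernetes.test.namespace=spark-int-tests \
      -Dspark.kubernetes.test.serviceAccountName=spark-test-sa \
      -Dtest.include.tags=k8s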

resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh

Lines changed: 30 additions & 13 deletions
@@ -71,19 +71,36 @@ if [[ $IMAGE_TAG == "N/A" ]];
then
  IMAGE_TAG=$(uuidgen);
  cd $UNPACKED_SPARK_TGZ
-  if [[ $DEPLOY_MODE == cloud ]] ;
-  then
-    $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG build
-    if [[ $IMAGE_REPO == gcr.io* ]] ;
-    then
-      gcloud docker -- push $IMAGE_REPO/spark:$IMAGE_TAG
-    else
-      $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG push
-    fi
-  else
-    # -m option for minikube.
-    $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -m -r $IMAGE_REPO -t $IMAGE_TAG build
-  fi
+
+  case $DEPLOY_MODE in
+    cloud)
+      # Build images
+      $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG build
+
+      # Push images appropriately
+      if [[ $IMAGE_REPO == gcr.io* ]] ;
+      then
+        gcloud docker -- push $IMAGE_REPO/spark:$IMAGE_TAG
+      else
+        $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG push
+      fi
+      ;;
+
+    docker-for-desktop)
+      # Only need to build as this will place it in our local Docker repo which is all
+      # we need for Docker for Desktop to work so no need to also push
+      $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG build
+      ;;
+
+    minikube)
+      # Only need to build and if we do this with the -m option for minikube we will
+      # build the images directly using the minikube Docker daemon so no need to push
+      $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -m -r $IMAGE_REPO -t $IMAGE_TAG build
+      ;;
+    *)
+      echo "Unrecognized deploy mode $DEPLOY_MODE" && exit 1
+      ;;
+  esac
  cd -
fi
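
The same build-and-push flow can be reproduced by hand with the distribution's `docker-image-tool.sh`, which is what the script invokes; the repository and tag below are placeholders, and the commands are run from the unpacked Spark distribution directory:

    # cloud backend: build the images, then push them to the target repository
    ./bin/docker-image-tool.sh -r docker.io/somerepo -t sometag build
    ./bin/docker-image-tool.sh -r docker.io/somerepo -t sometag push

    # minikube backend: -m builds against minikube's Docker daemon, so no push is needed
    ./bin/docker-image-tool.sh -m -r docker.io/somerepo -t sometag build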

resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala

Lines changed: 2 additions & 1 deletion
@@ -33,6 +33,7 @@ import scala.collection.JavaConverters._

import org.apache.spark.SparkFunSuite
import org.apache.spark.deploy.k8s.integrationtest.TestConfig._
+import org.apache.spark.deploy.k8s.integrationtest.TestConstants._
import org.apache.spark.deploy.k8s.integrationtest.backend.{IntegrationTestBackend, IntegrationTestBackendFactory}
import org.apache.spark.internal.Logging

@@ -77,7 +78,7 @@ private[spark] class KubernetesSuite extends SparkFunSuite
      System.clearProperty(key)
    }

-    val sparkDirProp = System.getProperty("spark.kubernetes.test.unpackSparkDir")
+    val sparkDirProp = System.getProperty(CONFIG_KEY_UNPACK_DIR)
    require(sparkDirProp != null, "Spark home directory must be provided in system properties.")
    sparkHomeDir = Paths.get(sparkDirProp)
    require(sparkHomeDir.toFile.isDirectory,

resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala

Lines changed: 3 additions & 2 deletions
@@ -25,15 +25,16 @@ import scala.collection.mutable
import io.fabric8.kubernetes.client.DefaultKubernetesClient
import org.scalatest.concurrent.Eventually

+import org.apache.spark.deploy.k8s.integrationtest.TestConstants._
import org.apache.spark.internal.Logging

private[spark] class KubernetesTestComponents(defaultClient: DefaultKubernetesClient) {

-  val namespaceOption = Option(System.getProperty("spark.kubernetes.test.namespace"))
+  val namespaceOption = Option(System.getProperty(CONFIG_KEY_KUBE_NAMESPACE))
  val hasUserSpecifiedNamespace = namespaceOption.isDefined
  val namespace = namespaceOption.getOrElse(UUID.randomUUID().toString.replaceAll("-", ""))
  val serviceAccountName =
-    Option(System.getProperty("spark.kubernetes.test.serviceAccountName"))
+    Option(System.getProperty(CONFIG_KEY_KUBE_SVC_ACCOUNT))
      .getOrElse("default")
  val kubernetesClient = defaultClient.inNamespace(namespace)
  val clientConfig = kubernetesClient.getConfiguration
