@@ -8,98 +8,71 @@ title: Spark on Kubernetes Integration Tests
Note that the integration test framework is currently being heavily revised and
is subject to change. Note that currently the integration tests only run with Java 8.
- As shorthand to run the tests against any given cluster, you can use the `e2e/runner.sh` script.
- The script assumes that you have a functioning Kubernetes cluster (1.6+) with kubectl
- configured to access it. The master URL of the currently configured cluster on your
- machine can be discovered as follows:
-
- ```
- $ kubectl cluster-info
-
- Kubernetes master is running at https://xyz
- ```
-
- If you want to use a local [minikube](https://github.com/kubernetes/minikube) cluster,
- the minimum tested version is 0.23.0, with the kube-dns addon enabled,
- and the recommended configuration is 3 CPUs and 4G of memory. There is also a wrapper
- script for running on minikube, `e2e/e2e-minikube.sh`, for testing the master branch
- of the apache/spark repository in particular.
-
- ```
- $ minikube start --memory 4000 --cpus 3
- ```
-
- If you're using a non-local cluster, you must provide an image repository
- which you have write access to, using the `-i` option, in order to store docker images
- generated during the test.
-
- Example usages of the script:
-
- ```
- $ ./e2e/runner.sh -m https://xyz -i docker.io/foxish -d cloud
- $ ./e2e/runner.sh -m https://xyz -i test -d minikube
- $ ./e2e/runner.sh -m https://xyz -i test -r https://github.com/my-spark/spark -d minikube
- $ ./e2e/runner.sh -m https://xyz -i test -r https://github.com/my-spark/spark -b my-branch -d minikube
- ```
-
- # Detailed Documentation
-
- ## Running the tests using Maven
-
- Integration tests first require installing [Minikube](https://kubernetes.io/docs/getting-started-guides/minikube/) on
- your machine, and for the `Minikube` binary to be on your `PATH`. Refer to the Minikube documentation for instructions
- on how to install it. It is recommended to allocate at least 8 CPUs and 8GB of memory to the Minikube cluster.
-
- Running the integration tests requires a Spark distribution package tarball that
- contains Spark jars, submission clients, etc. You can download a tarball from
- http://spark.apache.org/downloads.html. Or, you can create a distribution from
- source code using `make-distribution.sh`. For example:
-
- ```
- $ git clone [email protected]:apache/spark.git
- $ cd spark
- $ ./dev/make-distribution.sh --tgz \
-   -Phadoop-2.7 -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver
- ```
-
- The above command will create a tarball like `spark-2.3.0-SNAPSHOT-bin.tgz` in the
- top-level dir. For more details, see the related section in
- [building-spark.md](https://github.com/apache/spark/blob/master/docs/building-spark.md#building-a-runnable-distribution)
-
-
- Once you prepare the tarball, the integration tests can be executed with Maven or
- your IDE. Note that when running tests from an IDE, the `pre-integration-test`
- phase must be run every time the Spark main code changes. When running tests
- from the command line, the `pre-integration-test` phase should automatically be
- invoked if the `integration-test` phase is run.
-
- With Maven, the integration tests can be run using the following command:
-
- ```
- $ mvn clean integration-test \
-   -Dspark-distro-tgz=spark/spark-2.3.0-SNAPSHOT-bin.tgz
- ```
-
-
- ## Running against an arbitrary cluster
-
- In order to run against any cluster, use the following:
- ```sh
- $ mvn clean integration-test \
-   -Dspark-distro-tgz=spark/spark-2.3.0-SNAPSHOT-bin.tgz \
-   -DextraScalaTestArgs="-Dspark.kubernetes.test.master=k8s://https://<master>"
- ```
-
- ## Reuse the previous Docker images
-
- The integration tests build a number of Docker images, which takes some time.
- By default, the images are built every time the tests run. You may want to skip
- re-building those images during development, if the distribution package did not
- change since the last run. You can pass the property
- `spark.kubernetes.test.imageDockerTag` to the test process and specify the Docker
- image tag that is appropriate.
- Here is an example:
-
- ```
- $ mvn clean integration-test \
-   -Dspark-distro-tgz=spark/spark-2.3.0-SNAPSHOT-bin.tgz \
-   -Dspark.kubernetes.test.imageDockerTag=latest
- ```
+ The simplest way to run the integration tests is to install and run Minikube, then run the following:
+
+     build/mvn integration-test
+
+ The minimum tested version of Minikube is 0.23.0. The kube-dns addon must be enabled. Minikube should
+ run with a minimum of 3 CPUs and 4G of memory:
+
+     minikube start --cpus 3 --memory 4G
+
+ You can download Minikube [here](https://github.com/kubernetes/minikube/releases).
+
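+ Before running the tests, you may want to confirm that the Minikube cluster is up and that `kubectl` is pointed at it.
+ A minimal sanity check, assuming `kubectl` is installed alongside Minikube, looks like this:
+
+     # Check that the Minikube VM and cluster components are running
+     minikube status
+     # Confirm the kube-dns addon appears in the list of enabled addons
+     minikube addons list
+     # Verify kubectl can reach the Minikube API server
+     kubectl cluster-info
+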
+ # Integration test customization
+
+ Configuration of the integration test runtime is done by passing Java system properties to the Maven
+ command. The most useful options are outlined below.
+
+ ## Use a non-local cluster
+
+ To use your own cluster running in the cloud, set the following:
+
+ * `spark.kubernetes.test.deployMode` to `cloud` to indicate that Minikube will not be used.
+ * `spark.kubernetes.test.master` to your cluster's externally accessible URL (see below for one way to discover it).
+ * `spark.kubernetes.test.imageRepo` to a write-accessible Docker image repository that provides the images for your
+ cluster. The framework assumes your local Docker client can push to this repository.
+
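+ If you are unsure of the externally accessible URL, `kubectl cluster-info` reports the master URL of whichever
+ cluster `kubectl` is currently configured against, for example:
+
+     # Prints something like: Kubernetes master is running at https://example.com:8443
+     kubectl cluster-info
+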
+ Therefore the command looks like this:
+
+     build/mvn integration-test \
+       -Dspark.kubernetes.test.deployMode=cloud \
+       -Dspark.kubernetes.test.master=https://example.com:8443/apiserver \
+       -Dspark.kubernetes.test.imageRepo=docker.example.com/spark-images
+
+ ## Re-using Docker Images
+
+ By default, the test framework will build new Docker images on every test execution. A unique image tag is generated,
+ and it is written to the file `target/imageTag.txt`. To reuse the images built in a previous run, set:
+
+ * `spark.kubernetes.test.imageTag` to the tag specified in `target/imageTag.txt`
+ * `spark.kubernetes.test.skipBuildingImages` to `true`
+
+ Therefore the command looks like this:
+
+     build/mvn integration-test \
+       -Dspark.kubernetes.test.imageTag=$(cat target/imageTag.txt) \
+       -Dspark.kubernetes.test.skipBuildingImages=true
+
+ ## Customizing the Spark Source Code to Test
+
+ By default, the test framework will test the master branch of Spark from [here](https://github.com/apache/spark). You
+ can specify the following options to test against different source versions of Spark:
+
+ * `spark.kubernetes.test.sparkRepo` to the git or http URI of the Spark git repository to clone
+ * `spark.kubernetes.test.sparkBranch` to the branch of the repository to build.
+
+ An example:
+
+     build/mvn integration-test \
+       -Dspark.kubernetes.test.sparkRepo=https://github.com/apache-spark-on-k8s/spark \
+       -Dspark.kubernetes.test.sparkBranch=new-feature
+
+ Additionally, you can use a pre-built Spark distribution. In this case, the repository is not cloned at all, and no
+ source code has to be compiled.
+
+ * `spark.kubernetes.test.sparkTgz` can be set to a tarball containing the Spark distribution to test.
+
+ When the tests are cloning a repository and building it, the Spark distribution is placed in
+ `target/spark/spark-<VERSION>.tgz`. Reuse this tarball to save a significant amount of time if you are iterating on
+ the development of these integration tests.
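+
+ For example, to rerun the tests against a distribution produced by a previous run (the exact file name under
+ `target/spark/` will vary with the Spark version that was built), a command along these lines should work:
+
+     build/mvn integration-test \
+       -Dspark.kubernetes.test.sparkTgz=target/spark/spark-2.3.0-SNAPSHOT-bin.tgz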