Commit 7ab165b

foxish authored and rxin committed
[SPARK-22648][K8S] Spark on Kubernetes - Documentation
What changes were proposed in this pull request?

This PR contains documentation on the usage of the Kubernetes scheduler in Spark 2.3, and a shell script to make it easier to build the docker images required to use the integration. The changes detailed here are covered by #19717 and #19468, which have already merged.

How was this patch tested?

The script has been in use for releases on our fork. The rest is documentation.

cc rxin mateiz (shepherd)

k8s-big-data SIG members & contributors: foxish ash211 mccheah liyinan926 erikerlandson ssuchter varunkatta kimoonkim tnachen ifilonenko

reviewers: vanzin felixcheung jiangxb1987 mridulm

TODO:

- [x] Add dockerfiles directory to built distribution. (#20007)
- [x] Change references to docker to instead say "container". (#19995)
- [x] Update configuration table.
- [x] Modify spark.kubernetes.allocation.batch.delay to take time instead of int. (#20032)

Author: foxish <[email protected]>

Closes #19946 from foxish/update-k8s-docs.
1 parent 7beb375 · commit 7ab165b

10 files changed: +677 additions, −8 deletions

docs/_layouts/global.html

Lines changed: 1 addition & 0 deletions
@@ -99,6 +99,7 @@
 <li><a href="spark-standalone.html">Spark Standalone</a></li>
 <li><a href="running-on-mesos.html">Mesos</a></li>
 <li><a href="running-on-yarn.html">YARN</a></li>
+<li><a href="running-on-kubernetes.html">Kubernetes</a></li>
 </ul>
 </li>

docs/building-spark.md

Lines changed: 5 additions & 1 deletion
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by the
 to be runnable, use `./dev/make-distribution.sh` in the project root directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:
 
-    ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn
+    ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
 
 This will build Spark distribution along with Python pip and R packages. For more information on usage, run `./dev/make-distribution.sh --help`

@@ -90,6 +90,10 @@ like ZooKeeper and Hadoop itself.
 ## Building with Mesos support
 
     ./build/mvn -Pmesos -DskipTests clean package
+
+## Building with Kubernetes support
+
+    ./build/mvn -Pkubernetes -DskipTests clean package
 
 ## Building with Kafka 0.8 support

docs/cluster-overview.md

Lines changed: 2 additions & 5 deletions
@@ -52,11 +52,8 @@ The system currently supports three cluster managers:
 * [Apache Mesos](running-on-mesos.html) -- a general cluster manager that can also run Hadoop MapReduce
   and service applications.
 * [Hadoop YARN](running-on-yarn.html) -- the resource manager in Hadoop 2.
-* [Kubernetes (experimental)](https://github.com/apache-spark-on-k8s/spark) -- In addition to the above,
-  there is experimental support for Kubernetes. Kubernetes is an open-source platform
-  for providing container-centric infrastructure. Kubernetes support is being actively
-  developed in an [apache-spark-on-k8s](https://github.com/apache-spark-on-k8s/) Github organization.
-  For documentation, refer to that project's README.
+* [Kubernetes](running-on-kubernetes.html) -- [Kubernetes](https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/)
+  is an open-source platform that provides container-centric infrastructure.
 
 A third-party project (not supported by the Spark project) exists to add support for
 [Nomad](https://github.com/hashicorp/nomad-spark) as a cluster manager.

docs/configuration.md

Lines changed: 2 additions & 0 deletions
@@ -2376,6 +2376,8 @@ can be found on the pages for each mode:
 
 #### [Mesos](running-on-mesos.html#configuration)
 
+#### [Kubernetes](running-on-kubernetes.html#configuration)
+
 #### [Standalone Mode](spark-standalone.html#cluster-launch-scripts)
 
 # Environment Variables

docs/img/k8s-cluster-mode.png

54.2 KB

docs/index.md

Lines changed: 2 additions & 1 deletion
@@ -81,6 +81,7 @@ options for deployment:
 * [Standalone Deploy Mode](spark-standalone.html): simplest way to deploy Spark on a private cluster
 * [Apache Mesos](running-on-mesos.html)
 * [Hadoop YARN](running-on-yarn.html)
+* [Kubernetes](running-on-kubernetes.html)
 
 # Where to Go from Here

@@ -112,7 +113,7 @@ options for deployment:
 * [Mesos](running-on-mesos.html): deploy a private cluster using
   [Apache Mesos](http://mesos.apache.org)
 * [YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN)
-* [Kubernetes (experimental)](https://github.com/apache-spark-on-k8s/spark): deploy Spark on top of Kubernetes
+* [Kubernetes](running-on-kubernetes.html): deploy Spark on top of Kubernetes
 
 **Other Documents:**

docs/running-on-kubernetes.md

Lines changed: 578 additions & 0 deletions
Large diffs are not rendered by default.

docs/running-on-yarn.md

Lines changed: 3 additions & 1 deletion
@@ -18,7 +18,9 @@ Spark application's configuration (driver, executors, and the AM when running in
 
 There are two deploy modes that can be used to launch Spark applications on YARN. In `cluster` mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In `client` mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
 
-Unlike [Spark standalone](spark-standalone.html) and [Mesos](running-on-mesos.html) modes, in which the master's address is specified in the `--master` parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the `--master` parameter is `yarn`.
+Unlike other cluster managers supported by Spark in which the master's address is specified in the `--master`
+parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration.
+Thus, the `--master` parameter is `yarn`.
 
 To launch a Spark application in `cluster` mode:
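The launch command that this context line introduces is unchanged by the diff; for reference it follows the standard `spark-submit` pattern, sketched here with placeholder class and jar names:

    $ ./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]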

docs/submitting-applications.md

Lines changed: 16 additions & 0 deletions
@@ -127,6 +127,16 @@ export HADOOP_CONF_DIR=XXX
   http://path/to/examples.jar \
   1000
 
+# Run on a Kubernetes cluster in cluster deploy mode
+./bin/spark-submit \
+  --class org.apache.spark.examples.SparkPi \
+  --master k8s://xx.yy.zz.ww:443 \
+  --deploy-mode cluster \
+  --executor-memory 20G \
+  --num-executors 50 \
+  http://path/to/examples.jar \
+  1000
+
 {% endhighlight %}
 
 # Master URLs

@@ -155,6 +165,12 @@ The master URL passed to Spark can be in one of the following formats:
 <code>client</code> or <code>cluster</code> mode depending on the value of <code>--deploy-mode</code>.
 The cluster location will be found based on the <code>HADOOP_CONF_DIR</code> or <code>YARN_CONF_DIR</code> variable.
 </td></tr>
+<tr><td> <code>k8s://HOST:PORT</code> </td><td> Connect to a <a href="running-on-kubernetes.html">Kubernetes</a> cluster in
+<code>cluster</code> mode. Client mode is currently unsupported and will be supported in future releases.
+The <code>HOST</code> and <code>PORT</code> refer to the <a href="https://kubernetes.io/docs/reference/generated/kube-apiserver/">Kubernetes API Server</a>.
+It connects using TLS by default. In order to force it to use an unsecured connection, you can use
+<code>k8s://http://HOST:PORT</code>.
+</td></tr>
 </table>
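To find the HOST and PORT for a <code>k8s://</code> URL, the API server address can be read from `kubectl cluster-info` (a sketch; exact output varies by Kubernetes version, and the address below is a placeholder matching the example above):

    $ kubectl cluster-info
    Kubernetes master is running at https://xx.yy.zz.ww:443

    # Plug the host and port into the master URL:
    $ ./bin/spark-submit --master k8s://xx.yy.zz.ww:443 --deploy-mode cluster ...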

sbin/build-push-docker-images.sh

Lines changed: 68 additions & 0 deletions
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This script builds and pushes docker images when run from a release of Spark
# with Kubernetes support.

# Map each image name to the Dockerfile it is built from.
declare -A path=( [spark-driver]=kubernetes/dockerfiles/driver/Dockerfile \
                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile )

function build {
  # The driver and executor images are layered on the shared spark-base image.
  docker build -t spark-base -f kubernetes/dockerfiles/spark-base/Dockerfile .
  for image in "${!path[@]}"; do
    docker build -t ${REPO}/$image:${TAG} -f ${path[$image]} .
  done
}

function push {
  for image in "${!path[@]}"; do
    docker push ${REPO}/$image:${TAG}
  done
}

function usage {
  echo "This script must be run from a runnable distribution of Apache Spark."
  echo "Usage: ./sbin/build-push-docker-images.sh -r <repo> -t <tag> build"
  echo "       ./sbin/build-push-docker-images.sh -r <repo> -t <tag> push"
  echo "for example: ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 push"
}

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
  usage
  exit 0
fi

# -r sets the image repository, -t sets the image tag; both are required.
while getopts r:t: option; do
  case "${option}" in
    r) REPO=${OPTARG};;
    t) TAG=${OPTARG};;
  esac
done

# The last argument selects the action: build or push.
if [ -z "$REPO" ] || [ -z "$TAG" ]; then
  usage
else
  case "${@: -1}" in
    build) build;;
    push) push;;
    *) usage;;
  esac
fi
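For example, from the top directory of a Spark distribution built with Kubernetes support, a typical workflow is to build the images and then push them (repository and tag below are placeholders taken from the script's own usage text):

    # Build spark-base, spark-driver, and spark-executor images locally.
    $ ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 build

    # Push the driver and executor images to the docker.io/myrepo registry.
    $ ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 push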
