@@ -5,7 +5,7 @@ title: Running Spark on Kubernetes
* This will become a table of contents (this text will be scraped).
{:toc}
- Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This features makes use of the new experimental native
+ Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of the new experimental native
Kubernetes scheduler that has been added to Spark.
# Prerequisites
@@ -31,7 +31,7 @@ by running `kubectl auth can-i <list|create|edit|delete> pods`.
spark-submit can be directly used to submit a Spark application to a Kubernetes cluster. The submission mechanism works as follows:
* Spark creates a Spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
- * The driver creates executors which are also Kubernetes pods and connects to them, and executes application code.
+ * The driver creates executors which are also running within Kubernetes pods and connects to them, and executes application code.
* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists
logs and remains in "completed" state in the Kubernetes API until it's eventually garbage collected or manually cleaned up.
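
Since the driver runs inside a pod, a submitted application can be inspected with ordinary kubectl commands; a minimal sketch, assuming a hypothetical driver pod named `spark-pi-driver` in the target namespace:

{% highlight bash %}
# List the driver and executor pods created for the application
$ kubectl get pods

# Stream the driver's logs while the application runs
# (the pod name "spark-pi-driver" is hypothetical)
$ kubectl logs -f spark-pi-driver

# Show status and events for the driver pod, e.g. to debug scheduling issues
$ kubectl describe pod spark-pi-driver
{% endhighlight %}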
@@ -68,16 +68,18 @@ building using the supplied script, or manually.
To launch Spark Pi in cluster mode,
-     bin/spark-submit \
-       --deploy-mode cluster \
-       --class org.apache.spark.examples.SparkPi \
-       --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
-       --conf spark.kubernetes.namespace=default \
-       --conf spark.executor.instances=5 \
-       --conf spark.app.name=spark-pi \
-       --conf spark.kubernetes.driver.docker.image=<driver-image> \
-       --conf spark.kubernetes.executor.docker.image=<executor-image> \
-       local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+ {% highlight bash %}
+ $ bin/spark-submit \
+   --deploy-mode cluster \
+   --class org.apache.spark.examples.SparkPi \
+   --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+   --conf spark.kubernetes.namespace=default \
+   --conf spark.executor.instances=5 \
+   --conf spark.app.name=spark-pi \
+   --conf spark.kubernetes.driver.docker.image=<driver-image> \
+   --conf spark.kubernetes.executor.docker.image=<executor-image> \
+   local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+ {% endhighlight %}
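
The API server host and port used in `--master` above can be discovered by querying the cluster with kubectl; the address in the output below is illustrative:

{% highlight bash %}
# Print the address of the Kubernetes API server; its host and port fill in
# the <k8s-apiserver-host>:<k8s-apiserver-port> placeholders above
$ kubectl cluster-info
Kubernetes master is running at https://192.168.99.100:8443
{% endhighlight %}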
The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
`spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`. Prefixing the
@@ -170,7 +172,7 @@ the spark application.
### Namespaces
Kubernetes has the concept of [namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
- Namespaces are a way to divide cluster resources between multiple users (via resource quota). Spark on Kubernetes can
+ Namespaces are ways to divide cluster resources between multiple users (via resource quota). Spark on Kubernetes can
use namespaces to launch Spark applications. This is done through the `--conf spark.kubernetes.namespace` argument to spark-submit.
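
For example, a dedicated namespace can be created up front and then targeted at submission time (the name `spark-jobs` is just an illustration):

{% highlight bash %}
# Create a namespace to isolate Spark workloads (the name is an example)
$ kubectl create namespace spark-jobs

# Target it at submission time; all other arguments stay as in the
# Spark Pi example above
$ bin/spark-submit \
    --conf spark.kubernetes.namespace=spark-jobs \
    ...
{% endhighlight %}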
Kubernetes allows using [ResourceQuota](https://kubernetes.io/docs/concepts/policy/resource-quotas/) to set limits on
@@ -250,7 +252,7 @@ and provide feedback to the development team.
# Configuration
- See the [configuration page](configuration.html) for information on Spark configurations. The following configuration is
+ See the [configuration page](configuration.html) for information on Spark configurations. The following configurations are
specific to Spark on Kubernetes.
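
These properties are set like any other Spark property, either with `--conf` flags as in the examples above or once in `conf/spark-defaults.conf`; a minimal sketch, reusing the placeholder image names from this page:

{% highlight bash %}
# conf/spark-defaults.conf -- values here apply to every spark-submit
# invocation unless overridden on the command line
spark.kubernetes.namespace              default
spark.kubernetes.driver.docker.image    <driver-image>
spark.kubernetes.executor.docker.image  <executor-image>
{% endhighlight %}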
#### Spark Properties