This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Commit a51dcc8

mccheah authored and ash211 committed
Allow customizing external URI provision + External URI can be set via annotations (#147)
* Listen for annotations that provide external URIs.
* Fix scalastyle
* Address comments
* Fix doc style
* Docs updates
* Clearly explain path rewrites
1 parent 7132f5d commit a51dcc8

11 files changed: +598 -92 lines changed


docs/running-on-kubernetes.md

Lines changed: 46 additions & 1 deletion
@@ -106,6 +106,36 @@ The above mechanism using `kubectl proxy` can be used when we have authenticatio
 kubernetes-client library does not support. Authentication using X509 Client Certs and oauth tokens
 is currently supported.

+### Determining the Driver Base URI
+
+Kubernetes pods run with their own IP address space. If Spark is run in cluster mode, the driver pod may not be
+accessible to the submitter. However, the submitter needs to send local dependencies from its local disk to the driver
+pod.
+
+By default, Spark will place a [Service](https://kubernetes.io/docs/user-guide/services/#type-nodeport) with a NodePort
+that is opened on every node. The submission client will then contact the driver at one of the nodes'
+addresses with the appropriate service port.
+
+There may be cases where the nodes cannot be reached by the submission client. For example, the cluster may
+only be reachable through an external load balancer. The user may provide their own external URI for Spark driver
+services. To use your own external URI instead of a node's IP and node port, first set
+`spark.kubernetes.driver.serviceManagerType` to `ExternalAnnotation`. A service will be created with the annotation
+`spark-job.alpha.apache.org/provideExternalUri`, and this service routes to the driver pod. You will need to run a
+separate process that watches the API server for services that are created with this annotation in the application's
+namespace (set by `spark.kubernetes.namespace`). The process should determine a URI that routes to this service
+(potentially configuring infrastructure to handle the URI behind the scenes), and patch the service to include an
+annotation `spark-job.alpha.apache.org/resolvedExternalUri`, which has its value as the external URI that your process
+has provided (e.g. `https://example.com:8080/my-job`).
+
+Note that the URI provided in the annotation needs to route traffic to the appropriate destination on the pod, which has
+an empty path portion of the URI. This means the external URI provider will likely need to rewrite the path from the
+external URI to the destination on the pod, e.g. https://example.com:8080/spark-app-1/submit will need to route traffic
+to https://<pod_ip>:<service_port>/. Note that the paths of these two URLs are different.
+
+If the above is confusing, keep in mind that this functionality is only necessary if the submitter cannot reach any of
+the nodes at the driver's node port. It is recommended to use the default configuration with the node port service
+whenever possible.
+
 ### Spark Properties

 Below are some other common properties that are specific to Kubernetes. Most of the other configurations are the same
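The external URI provider described in the "Determining the Driver Base URI" section added by this hunk is left to the user to implement. As a rough illustration only, the Python sketch below shows the three decisions such a process might make: whether a service still needs a URI, what patch body would add the resolved annotation, and how a request path might be rewritten before it reaches the pod. The annotation names come from the documentation; the function names, the dict-shaped service object, and the choice of a JSON merge patch are assumptions made for this sketch, not part of the commit.

```python
import json
from urllib.parse import urlsplit

# Annotation names as given in the documentation changed by this commit.
PROVIDE_ANNOTATION = "spark-job.alpha.apache.org/provideExternalUri"
RESOLVED_ANNOTATION = "spark-job.alpha.apache.org/resolvedExternalUri"


def needs_external_uri(service: dict) -> bool:
    """Return True if the service requests an external URI and has none yet.

    `service` is assumed to be a Kubernetes Service object parsed from JSON.
    """
    annotations = service.get("metadata", {}).get("annotations") or {}
    return PROVIDE_ANNOTATION in annotations and RESOLVED_ANNOTATION not in annotations


def build_resolved_uri_patch(external_uri: str) -> str:
    """Return a JSON merge patch body that adds the resolved URI annotation."""
    patch = {"metadata": {"annotations": {RESOLVED_ANNOTATION: external_uri}}}
    return json.dumps(patch)


def rewrite_path(request_path: str, external_uri: str) -> str:
    """Strip the external URI's path prefix so the pod receives a root-relative path.

    Hypothetical helper: a real proxy or ingress would do this in its own config.
    """
    prefix = urlsplit(external_uri).path.rstrip("/")
    if prefix and (request_path == prefix or request_path.startswith(prefix + "/")):
        request_path = request_path[len(prefix):]
    return request_path or "/"
```

A body produced by `build_resolved_uri_patch` could, for instance, be applied with `kubectl patch service <name> --type merge -p '<body>'`. How the watch on the API server is implemented, and how the external infrastructure is configured, is outside the scope of this sketch.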
@@ -207,7 +237,7 @@ from the other deployment modes. See the [configuration page](configuration.html
 <td><code>false</code></td>
 <td>
 Whether to expose the driver Web UI port as a service NodePort. Turned off by default because NodePort is a limited
-resource. Use alternatives such as Ingress if possible.
+resource.
 </td>
 </tr>
 <tr>
@@ -225,6 +255,21 @@ from the other deployment modes. See the [configuration page](configuration.html
 Interval between reports of the current Spark job status in cluster mode.
 </td>
 </tr>
+<tr>
+<td><code>spark.kubernetes.driver.serviceManagerType</code></td>
+<td><code>NodePort</code></td>
+<td>
+A tag indicating which class to use for creating the Kubernetes service and determining its URI for the submission
+client. Valid values are currently <code>NodePort</code> and <code>ExternalAnnotation</code>. By default, a service
+is created with the <code>NodePort</code> type, and the driver will be contacted at one of the nodes at the port
+that the nodes expose for the service. If the nodes cannot be contacted from the submitter's machine, consider
+setting this to <code>ExternalAnnotation</code> as described in "Determining the Driver Base URI" above. One may
+also include a custom implementation of <code>org.apache.spark.deploy.rest.kubernetes.DriverServiceManager</code> on
+the submitter's classpath - spark-submit service-loads an instance of that class. To use the custom
+implementation, set this value to the custom implementation's return value of
+<code>DriverServiceManager#getServiceManagerType()</code>. This approach should only be used as a last resort.
+</td>
+</tr>
 </table>

 ## Current Limitations
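The `spark.kubernetes.driver.serviceManagerType` property added in the hunk above can be passed to spark-submit like any other Spark property, using `--conf key=value`. The small Python sketch below assembles those arguments. The helper name and the `default` namespace are assumptions for illustration; both property names are taken from the documentation in this commit.

```python
def service_manager_conf(manager_type: str = "ExternalAnnotation",
                         namespace: str = "default") -> list:
    """Build spark-submit --conf arguments that select a driver service manager.

    `manager_type` is expected to be NodePort, ExternalAnnotation, or the value
    returned by a custom implementation's getServiceManagerType().
    """
    return [
        "--conf", f"spark.kubernetes.driver.serviceManagerType={manager_type}",
        "--conf", f"spark.kubernetes.namespace={namespace}",
    ]
```

The returned list would be spliced into the rest of a spark-submit command line, which is omitted here.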
@@ -0,0 +1,2 @@
+org.apache.spark.deploy.rest.kubernetes.ExternalSuppliedUrisDriverServiceManager
+org.apache.spark.deploy.rest.kubernetes.NodePortUrisDriverServiceManager
