Add documentation for Zeppelin with Spark on Kubernetes #21
---
layout: global
displayTitle: Apache Zeppelin running with Spark on Kubernetes
title: Apache Zeppelin running with Spark on Kubernetes
description: User Documentation for Apache Zeppelin running with Spark on Kubernetes
---
This page is an ongoing effort to describe how to run Apache Zeppelin with Spark on Kubernetes.

> For the time being, the needed code is not integrated into the `master` branches of the `apache-zeppelin` or `apache-spark-on-k8s/spark` repositories.
> You are welcome to try it out already and to send any feedback or questions.

First things first, you have to choose the modes in which you will run Zeppelin with Spark on Kubernetes:
+ The `Kubernetes mode`: can be `in-cluster` (within a Pod) or `out-cluster` (from outside the Kubernetes cluster).
+ The `Spark deployment mode`: can be `client` or `cluster`.

Only three combinations of these options are supported:
1. `in-cluster` with `spark-client` mode.
2. `in-cluster` with `spark-cluster` mode.
3. `out-cluster` with `spark-cluster` mode.
For now, to be able to test these combinations, you need to build specific branches (see hereafter) or to use third-party Helm charts or Docker images. The needed branches and related pull requests are listed below, followed by a sketch of how to build them:
1. Spark-k8s driven branch: In-cluster client mode ([see pull request #456](https://github.com/apache-spark-on-k8s/spark/pull/456))
2. Apache Zeppelin driven branch: Add support to run Spark interpreter on a Kubernetes cluster ([see pull request #2637](https://github.com/apache/zeppelin/pull/2637))
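The sketch below shows one way to fetch and build these branches locally. It assumes Git, Maven, and a suitable JDK are available; the local branch names are arbitrary labels, and the build flags (in particular the `-Pkubernetes` profile) should be checked against the build instructions of each repository:

```sh
# Sketch: fetch and build the Zeppelin branch from apache/zeppelin PR #2637
git clone https://github.com/apache/zeppelin.git
cd zeppelin
git fetch origin pull/2637/head:zeppelin-spark-k8s   # local branch name is arbitrary
git checkout zeppelin-spark-k8s
mvn clean package -DskipTests
cd ..

# Sketch: fetch and build the Spark branch from apache-spark-on-k8s/spark PR #456
git clone https://github.com/apache-spark-on-k8s/spark.git
cd spark
git fetch origin pull/456/head:in-cluster-client-mode
git checkout in-cluster-client-mode
./build/mvn -Pkubernetes -DskipTests clean package   # -Pkubernetes profile assumed; adjust to the fork's build docs
```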
## In-Cluster with Spark-Client


Build a new Zeppelin based on [#456 In-cluster client mode](https://github.com/apache-spark-on-k8s/spark/pull/456).
Once done, deploy that new build in a Kubernetes Pod with the following interpreter settings (a sample configuration is sketched after the list):
+ `spark.app.name`: Any name you like.
+ `spark.master`: `k8s://https://kubernetes:443`
+ `spark.submit.deployMode`: `client`
+ `spark.kubernetes.driver.pod.name`: The name of the pod where your Zeppelin instance is running.
+ Other `spark.kubernetes.*` properties you need to make your Spark deployment work (see [Running Spark on Kubernetes](./running-on-kubernetes.html)), such as `spark.kubernetes.initcontainer.docker.image`, `spark.kubernetes.driver.docker.image`, `spark.kubernetes.executor.docker.image`...
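Written out in `spark-defaults.conf` style, the settings above could look like the sketch below. The application name, pod name, and image names are placeholders, not values prescribed by this page:

```
spark.app.name                    zeppelin-k8s-client
spark.master                      k8s://https://kubernetes:443
spark.submit.deployMode           client
# Name of the pod where this Zeppelin instance runs (placeholder value)
spark.kubernetes.driver.pod.name  zeppelin-server-0
# Placeholder image names; point these at the images you built
spark.kubernetes.initcontainer.docker.image  your-registry/spark-init:latest
spark.kubernetes.driver.docker.image         your-registry/spark-driver:latest
spark.kubernetes.executor.docker.image       your-registry/spark-executor:latest
```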
## In-Cluster with Spark-Cluster


Build a new Zeppelin based on [#2637 Spark interpreter on a Kubernetes cluster](https://github.com/apache/zeppelin/pull/2637).
Once done, deploy that new build in a Kubernetes Pod with the following interpreter settings (see the sketch after the list):
+ `spark.app.name`: The name of the application; it must begin with `zri-`.
+ `spark.master`: `k8s://https://kubernetes:443`
+ `spark.submit.deployMode`: `cluster`
+ Other `spark.kubernetes.*` properties you need to make your Spark deployment work (see [Running Spark on Kubernetes](./running-on-kubernetes.html)), such as `spark.kubernetes.initcontainer.docker.image`, `spark.kubernetes.driver.docker.image`, `spark.kubernetes.executor.docker.image`...
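For example, sketched in the same style as above (placeholder values):

```
# The application name must begin with "zri-"
spark.app.name           zri-notebook-1
spark.master             k8s://https://kubernetes:443
spark.submit.deployMode  cluster
# Plus the spark.kubernetes.*.docker.image properties, as in the previous example
```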
## Out-Cluster with Spark-Cluster


Build a new Spark and its associated Docker images based on [#2637 Spark interpreter on a Kubernetes cluster](https://github.com/apache/zeppelin/pull/2637).
Once done, any vanilla Apache Zeppelin deployed in a Kubernetes Pod (you can use a Helm chart for this) will work out of the box with the following interpreter settings (see the sketch after the list):
+ `spark.app.name`: The name of the application; it must begin with `zri-`.
+ `spark.master`: `k8s://https://ip-address-of-the-kube-api:6443` (the port may depend on your setup)
+ `spark.submit.deployMode`: `cluster`
+ Other `spark.kubernetes.*` properties you need to make your Spark deployment work (see [Running Spark on Kubernetes](./running-on-kubernetes.html)), such as `spark.kubernetes.initcontainer.docker.image`, `spark.kubernetes.driver.docker.image`, `spark.kubernetes.executor.docker.image`...
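For example, with a placeholder API server address (use the address and port of your own Kubernetes API server):

```
# The application name must begin with "zri-"
spark.app.name           zri-notebook-1
spark.master             k8s://https://203.0.113.10:6443
spark.submit.deployMode  cluster
# Plus the spark.kubernetes.*.docker.image properties, as in the first example
```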