Add documentation for Zeppelin with Spark on Kubernetes #21


Open · wants to merge 9 commits into base: master
4 changes: 2 additions & 2 deletions src/jekyll/index.md
Expand Up @@ -24,9 +24,9 @@ This project was put up for voting in [an SPIP](http://apache-spark-developers-l
in August 2017 and passed. It is in the process of being
upstreamed into the apache/spark repository.


### Contents

* [Running Spark on Kubernetes](./running-on-kubernetes.html)
* [Running Spark in Cloud Environments](./running-on-kubernetes-cloud.html)
* [Contribute](./contribute.html)
* [Running Zeppelin with Spark on Kubernetes](./zeppelin.html)
* [Contribute](./contribute.html)
67 changes: 67 additions & 0 deletions src/jekyll/zeppelin.md
@@ -0,0 +1,67 @@
---
layout: global
displayTitle: Apache Zeppelin running with Spark on Kubernetes
title: Apache Zeppelin running with Spark on Kubernetes
description: User Documentation for Apache Zeppelin running with Spark on Kubernetes
---

This page is an ongoing effort to describe how to run Apache Zeppelin with Spark on Kubernetes.

> For the time being, the needed code is not integrated into the `master` branches of the `apache-zeppelin` or `apache-spark-on-k8s/spark` repositories.
> You are welcome to try it out already and send any feedback and questions.

First things first, you have to choose the modes in which you will run Zeppelin with Spark on Kubernetes:

> **Reviewer:** First?
>
> **Author:** fixed

+ The `Kubernetes mode`: can be `in-cluster` (running within a Pod) or `out-cluster` (running from outside the Kubernetes cluster).

> **Reviewer:** What is the proper terminology in the k8s world? Is "out-cluster" the right term?
>
> **Author:** I had the same question, and from the already used/seen `in-cluster` I deduced `out-cluster`. Happy to change to any other, more official terminology.

+ The `Spark deployment mode`: can be `client` or `cluster`.

Only three combinations of these options are supported:

1. `in-cluster` with `spark-client` mode.
2. `in-cluster` with `spark-cluster` mode.
3. `out-cluster` with `spark-cluster` mode.

For now, to test these combinations, you need to build specific branches (see below) or use third-party Helm charts or Docker images. The needed branches and related PRs are listed here:
> **Reviewer:** As discussed in the meeting today, we want to ensure that these branches merge before we can publish documentation.
>
> cc @felixcheung @erikerlandson @liyinan926 @mccheah


1. Spark-on-k8s branch: In-cluster client mode ([see pull request #456](https://github.com/apache-spark-on-k8s/spark/pull/456))
2. Apache Zeppelin branch: Add support to run the Spark interpreter on a Kubernetes cluster ([see pull request #2637](https://github.com/apache/zeppelin/pull/2637))

> **Reviewer:** Zeppelin? What is a "driven branch"?
>
> **Author:** Just wanted to point out where this branch resides... I have removed that to avoid confusion.

## In-Cluster with Spark-Client

![In-Cluster with Spark-Client](/img/zeppelin_in-cluster_spark-client.png "In-Cluster with Spark-Client")

Build a new Zeppelin based on [#456 In-cluster client mode](https://github.com/apache-spark-on-k8s/spark/pull/456).

> **Reviewer:** Zeppelin, extra space
>
> **Author:** fixed

Once done, deploy that new build in a Kubernetes Pod with the following interpreter settings:

+ `spark.app.name`: Any name you like
+ `spark.master`: k8s://https://kubernetes:443
+ `spark.submit.deployMode`: client
+ `spark.kubernetes.driver.pod.name`: The name of the pod where your Zeppelin instance is running.
+ Any other `spark.kubernetes.*` properties you need to make your Spark setup work (see [Running Spark on Kubernetes](./running-on-kubernetes.html)), such as `spark.kubernetes.initcontainer.docker.image`, `spark.kubernetes.driver.docker.image`, `spark.kubernetes.executor.docker.image`...
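Put together, the interpreter settings above might look like the following sketch. The application name, pod name, namespace, and image names are illustrative placeholders, not values prescribed by this page:

```properties
# Hypothetical Zeppelin Spark interpreter settings for in-cluster client mode.
# All concrete values below are placeholders; adapt them to your cluster.
spark.app.name                               zeppelin-notebook
spark.master                                 k8s://https://kubernetes:443
spark.submit.deployMode                      client
spark.kubernetes.driver.pod.name             zeppelin-0
spark.kubernetes.namespace                   default
spark.kubernetes.initcontainer.docker.image  my-registry/spark-init:latest
spark.kubernetes.driver.docker.image         my-registry/spark-driver:latest
spark.kubernetes.executor.docker.image       my-registry/spark-executor:latest
```

Note that in client mode the Zeppelin pod itself acts as the Spark driver, which is why `spark.kubernetes.driver.pod.name` must match the name of the pod Zeppelin runs in.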

## In-Cluster with Spark-Cluster

![In-Cluster with Spark-Cluster](/img/zeppelin_in-cluster_spark-cluster.png "In-Cluster with Spark-Cluster")

Build a new Zeppelin based on [#2637 Spark interpreter on a Kubernetes cluster](https://github.com/apache/zeppelin/pull/2637).

> **Reviewer:** Zeppelin
>
> **Author:** fixed
>
> **Reviewer:** this one doesn't seem to be updated...?
>
> **Author:** done now.


Once done, deploy that new build in a Kubernetes Pod with the following interpreter settings:

+ `spark.app.name`: the name of the application; it must begin with `zri-`
+ `spark.master`: k8s://https://kubernetes:443
+ `spark.submit.deployMode`: cluster
+ Any other `spark.kubernetes.*` properties you need to make your Spark setup work (see [Running Spark on Kubernetes](./running-on-kubernetes.html)), such as `spark.kubernetes.initcontainer.docker.image`, `spark.kubernetes.driver.docker.image`, `spark.kubernetes.executor.docker.image`...
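As a sketch of "deploy that new build in a Kubernetes Pod", a minimal Pod manifest might look like the following. The image name and service account are assumptions, not values from this page; in cluster deploy mode the Zeppelin pod's service account needs RBAC permissions to create the Spark driver and executor pods:

```yaml
# Hypothetical Pod manifest for an in-cluster Zeppelin deployment.
# The image and serviceAccountName are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: zeppelin-0
  labels:
    app: zeppelin
spec:
  serviceAccountName: zeppelin   # must be allowed to create Spark driver/executor pods
  containers:
    - name: zeppelin
      image: my-registry/zeppelin-k8s:latest
      ports:
        - containerPort: 8080    # Zeppelin web UI
```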

## Out-Cluster with Spark-Cluster

![Out-Cluster with Spark-Cluster](/img/zeppelin_out-cluster_spark-cluster.png "Out-Cluster with Spark-Cluster")

Build a new Spark and its associated Docker images based on [#2637 Spark interpreter on a Kubernetes cluster](https://github.com/apache/zeppelin/pull/2637).

Once done, any vanilla Apache Zeppelin deployed in a Kubernetes Pod (you can use a Helm chart for this) will work out of the box with the following interpreter settings:

> **Reviewer:** Does this Helm chart work for this (use a different image for a newer Zeppelin, though)?
> https://github.com/kubernetes/charts/blob/master/stable/spark/templates/spark-zeppelin-deployment.yaml
> Shall we link it?
>
> **Author:** I have added a section at the end, "how to test", and linked to the chart.


+ `spark.app.name`: the name of the application; it must begin with `zri-`
+ `spark.master`: k8s://https://ip-address-of-the-kube-api:6443 (port may depend on your setup)
+ `spark.submit.deployMode`: cluster
+ Any other `spark.kubernetes.*` properties you need to make your Spark setup work (see [Running Spark on Kubernetes](./running-on-kubernetes.html)), such as `spark.kubernetes.initcontainer.docker.image`, `spark.kubernetes.driver.docker.image`, `spark.kubernetes.executor.docker.image`...
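A corresponding settings sketch for this mode follows. The API server address and image names are placeholders; one way to find your API server address is `kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'`:

```properties
# Hypothetical interpreter settings for out-cluster with Spark cluster mode.
# The master address and image names are placeholders; adapt them to your setup.
spark.app.name                           zri-notebook
spark.master                             k8s://https://192.168.99.100:6443
spark.submit.deployMode                  cluster
spark.kubernetes.driver.docker.image     my-registry/spark-driver:latest
spark.kubernetes.executor.docker.image   my-registry/spark-executor:latest
```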