feat: Jupyterhub with keycloak, spark and s3 #155
Merged

Changes from 34 commits (39 commits in total), all by adwk67:

- `8f41ef3` initial keycloak setup
- `eaf555c` wip: jupyterhub + keycloak
- `34fb370` wip
- `54fc0dc` wip: certificates work but callback does not
- `ae34b6a` wip: various tweaks
- `34c138d` added some temp docs
- `05ad6d6` add login info
- `e16da8c` added some readme info
- `0f5dce1` corrected ingress secret, set python cacert explicitly
- `0e3a28c` Merge branch 'main' into feat/keycloak-jupyterhub
- `ca0c492` wip: working version
- `f046dd8` clean-up realm-config
- `e8eb2f9` delegate user check to Keycloak
- `c1274e6` use demo-specific keycloak
- `c41f309` removed unnecessary settings
- `f6d22a9` specify ports
- `697a0a8` add jupyterhub.yaml to stack
- `396705f` wip: working nb/spark combo
- `bc94e33` read/write from s3
- `53132a8` remove driver service resource in favour of the ones produced dynamic…
- `803e520` use secret for minio credentials, add demo entry
- `bcfa3ae` set endpoints via extra config
- `0ff07da` mount notebook
- `9d431b5` user-specific job name
- `79fdb3b` add some notebook comments
- `b021d35` typos and add password to stack
- `d3added` first draft of demo docs
- `9c7298e` typo, fixed title
- `7c497ee` added hdfs write/read steps
- `573f812` updated docs
- `884f0bf` doc cleanup
- `44fad51` Merge branch 'main' into feat/keycloak-jupyterhub
- `3d4484c` Apply suggestions from code review
- `80bd2c6` review suggestions: remove HDFS, improve docs and server options
- `44b0ecf` Update docs/modules/demos/pages/jupyterhub-keycloak.adoc
- `49d47e0` Update docs/modules/demos/pages/jupyterhub-keycloak.adoc
- `5a2c6cf` Update docs/modules/demos/pages/jupyterhub-keycloak.adoc
- `0eb1ac7` Update docs/modules/demos/pages/jupyterhub-keycloak.adoc
- `7be5288` added a note about proxy reachability
`@@ -0,0 +1,21 @@`

```yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: load-gas-data
spec:
  template:
    spec:
      containers:
        - name: load-gas-data
          image: "bitnami/minio:2022-debian-10"
          command: ["bash", "-c", "cd /tmp; curl -O https://repo.stackable.tech/repository/misc/datasets/gas-sensor-data/20160930_203718.csv && mc --insecure alias set minio http://minio:9000/ $(cat /minio-s3-credentials/accessKey) $(cat /minio-s3-credentials/secretKey) && mc cp 20160930_203718.csv minio/demo/gas-sensor/raw/;"]
          volumeMounts:
            - name: minio-s3-credentials
              mountPath: /minio-s3-credentials
      volumes:
        - name: minio-s3-credentials
          secret:
            secretName: minio-s3-credentials
      restartPolicy: OnFailure
  backoffLimit: 50
```
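The Job passes the MinIO credentials to `mc` by reading the files that Kubernetes creates for each key of the mounted `minio-s3-credentials` Secret. If a notebook or script needs the same credentials, it could read them the same way. The following is a minimal Python sketch, assuming the Secret is mounted at `/minio-s3-credentials` with the keys `accessKey` and `secretKey` as in the Job above; the function name `read_minio_credentials` is hypothetical.

```python
from pathlib import Path


def read_minio_credentials(mount_dir: str = "/minio-s3-credentials") -> tuple[str, str]:
    """Hypothetical helper: read access/secret key from a Secret volume mount.

    Kubernetes exposes each key of a mounted Secret as one file, which is
    what the Job's `$(cat .../accessKey)` expansions rely on.
    """
    base = Path(mount_dir)
    # rstrip("\n") mirrors bash command substitution, which drops trailing newlines only
    access_key = (base / "accessKey").read_text().rstrip("\n")
    secret_key = (base / "secretKey").read_text().rstrip("\n")
    if not access_key or not secret_key:
        raise ValueError(f"empty credentials in {mount_dir}")
    return access_key, secret_key
```

Reading the files at call time (rather than baking values into configuration) keeps the credentials in the Secret, which is the same design choice the Job makes.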
      
`@@ -0,0 +1,189 @@`

```asciidoc
= jupyterhub-keycloak

:k8s-cpu: https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu
:spark-pkg: https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html
:pyspark: https://spark.apache.org/docs/latest/api/python/getting_started/index.html
:jupyterhub-k8s: https://github.com/jupyterhub/zero-to-jupyterhub-k8s
:jupyterlab: https://jupyterlab.readthedocs.io/en/stable/
:jupyter: https://jupyter.org
:keycloak: https://www.keycloak.org/
:gas-sensor: https://archive.ics.uci.edu/dataset/487/gas+sensor+array+temperature+modulation

This demo showcases the integration between {jupyter}[JupyterHub] and {keycloak}[Keycloak], deployed with the Stackable Data Platform (SDP) on a Kubernetes cluster.
{jupyterlab}[JupyterLab] is deployed using the {jupyterhub-k8s}[pyspark-notebook stack] provided by the Jupyter community.
A simple notebook is provided that shows how to start a distributed Spark cluster that reads data from and writes data to an S3 instance.

For this demo, a small sample of {gas-sensor}[gas sensor measurements*] is provided.
Install this demo on an existing Kubernetes cluster:

[source,console]
----
$ stackablectl demo install jupyterhub-keycloak
----

WARNING: When running a distributed Spark cluster from within a JupyterHub notebook, the notebook acts as the driver and requests executor Pods from Kubernetes.
These Pods in turn can mount *all* volumes and Secrets in that namespace.
To prevent this from breaking user separation, the plan is to use OPA Gatekeeper to define rules that restrict what the created executor Pods can mount. This is not yet implemented in this demo.

[#system-requirements]
== System requirements

To run this demo, your system needs at least:

* 8 {k8s-cpu}[cpu units] (core/hyperthread)
* 32GiB memory

You may need more resources depending on how many concurrent users are logged in and which notebook profiles they are using.

== Aim / Context

This demo shows how to authenticate JupyterHub users against a Keycloak backend using JupyterHub's OAuthenticator.
The same users as in the xref:end-to-end-security.adoc[End-to-end-security] demo are configured in Keycloak and are used as examples here.
The notebook offers a simple template for using Spark to interact with S3 as a storage backend.

== Overview

This demo will:

* Install the required Stackable Data Platform operators
* Spin up the following data products:
** *JupyterHub*: A multi-user server for Jupyter notebooks
** *Keycloak*: An identity and access management product
** *S3*: A MinIO instance for data storage
* Download a sample of the gas sensor dataset into S3
* Install the Jupyter notebook
* Demonstrate some basic data operations against S3
* Illustrate multi-user usage

== JupyterHub

Have a look at the available Pods before logging in:

[source,console]
----
$ kubectl get pods
NAME                        READY   STATUS      RESTARTS   AGE
hub-84f49ccbd7-29h7j        1/1     Running     0          56m
keycloak-544d757f57-f55kr   2/2     Running     0          57m
load-gas-data-m6z5p         0/1     Completed   0          54m
minio-5486d7584f-x2jn8      1/1     Running     0          57m
proxy-648bf7f45b-62vqg      1/1     Running     0          56m
----

The `proxy` Pod has an associated `proxy-public` Service with a statically defined port (31095), exposed as type NodePort.
To reach the JupyterHub web interface, navigate to this Service.
The node IP can be found in the ConfigMap `keycloak-address` (written by the Keycloak Deployment as it starts up).
In the example below, the JupyterHub address would be 172.19.0.5:31095:

[source,yaml]
----
apiVersion: v1
data:
  keycloakAddress: 172.19.0.5:31093 # Keycloak itself
  keycloakNodeIp: 172.19.0.5 # can be used to access the proxy-public service
kind: ConfigMap
metadata:
  name: keycloak-address
  namespace: default
----

NOTE: The `hub` Pod may show a `CreateContainerConfigError` for a few moments on start-up, as it requires the ConfigMap written by the Keycloak Deployment.

You should see the JupyterHub login page, which indicates a redirect to the OAuth service (Keycloak):

image::jupyterhub-keycloak/oauth-login.png[]

Click on the sign-in button.
You will be redirected to the Keycloak login, where you can enter one of the aforementioned users (e.g. `justin.martin` or `isla.williams`; the password is the same as the username):

image::jupyterhub-keycloak/keycloak-login.png[]

A successful login redirects you back to JupyterHub, where different profiles are listed (the drop-down options are visible when you click on the respective fields):

image::jupyterhub-keycloak/server-options.png[]

Select the options and start the server.
The explorer window on the left then includes a notebook that is already mounted.

Double-click on the file `notebook/process-s3.ipynb`:

image::jupyterhub-keycloak/load-nb.png[]

Run the notebook by selecting "Run All Cells" from the menu:

image::jupyterhub-keycloak/run-nb.png[]

The notebook includes some comments regarding image compatibility.
It uses a custom image, built on the official Spark image, that matches the Spark version used in the notebook.
The Java versions also match exactly.
Python versions need to match at the `major.minor` level, which is why Python 3.11 is used in the custom image.

Once the Spark executor has been started (we have specified `spark.executor.instances` = 1), it runs as an extra Pod.
We have named the Spark job to incorporate the current user (justin-martin).
JupyterHub has started a Pod for the user's notebook instance (`jupyter-justin-martin---bdd3b4a1`), and the Spark driver running in that notebook has requested another one for the Spark executor (`process-s3-jupyter-justin-martin-bdd3b4a1-9e9da995473f481f-exec-1`):

[source,console]
----
$ kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
...
jupyter-justin-martin---bdd3b4a1       1/1     Running   0          17m
process-s3-jupyter-justin-martin-...   1/1     Running   0          2m9s
...
----

Stop the kernel in the notebook (which shuts down the Spark session and thus the executor) and log out as the current user.
Now log in as `daniel.king` and then again as `isla.williams` (you may need to do this in a clean browser session so that existing login cookies are removed).
The user `isla.williams` has been defined as an admin user in the JupyterHub configuration:

[source,yaml]
----
...
hub:
  config:
    Authenticator:
      # don't filter here: delegate to Keycloak
      allow_all: True
      admin_users:
        - isla.williams
...
----

You should now see user-specific Pods for all three users:

[source,console]
----
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
...
jupyter-daniel-king---181a80ce     1/1     Running   0          6m17s
jupyter-isla-williams---14730816   1/1     Running   0          4m50s
jupyter-justin-martin---bdd3b4a1   1/1     Running   0          3h47m
...
----

The admin user (`isla.williams`) also has an extra Admin tab in the JupyterHub console where current users can be managed.
You can find this in the JupyterHub UI at `http://<ip>:31095/hub/admin`, e.g. http://172.19.0.5:31095/hub/admin:

image::jupyterhub-keycloak/admin-tab.png[]

You can inspect the S3 buckets by running `stackablectl stacklet list` to find the MinIO endpoint and logging in there with `admin/adminadmin`:

[source,console]
----
$ stackablectl stacklet list
┌─────────┬───────────────┬───────────┬──────────────────────────────┬────────────┐
│ PRODUCT ┆ NAME          ┆ NAMESPACE ┆ ENDPOINTS                    ┆ CONDITIONS │
╞═════════╪═══════════════╪═══════════╪══════════════════════════════╪════════════╡
│ minio   ┆ minio-console ┆ default   ┆ http http://172.19.0.5:32470 ┆            │
└─────────┴───────────────┴───────────┴──────────────────────────────┴────────────┘
----

image::jupyterhub-keycloak/s3-buckets.png[]

NOTE: If you re-run the notebook, you first need to remove the `_temporary` folders from the S3 buckets.
These are created by Spark jobs and are not removed from the bucket when the job has completed.

*See: Burgués, Javier, Juan Manuel Jiménez-Soto, and Santiago Marco. "Estimation of the limit of detection in semiconductor gas sensors through linearized calibration models." Analytica Chimica Acta 1013 (2018): 13-25.
Burgués, Javier, and Santiago Marco. "Multivariate estimation of the limit of detection by orthogonal partial least squares in temperature-modulated MOX sensors." Analytica Chimica Acta 1019 (2018): 49-64.
```
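The address lookup described in the documentation above (the node IP from the `keycloak-address` ConfigMap combined with the static NodePort 31095 of the `proxy-public` Service) can be sketched in Python. This is a hypothetical helper that assumes you have saved the output of `kubectl get configmap keycloak-address -o json`; it only builds the URLs and does not contact the cluster.

```python
import ipaddress
import json

# NodePort of the proxy-public Service, as stated in the demo documentation.
PROXY_PUBLIC_NODE_PORT = 31095


def load_configmap_data(raw_json: str) -> dict[str, str]:
    """Return the `data` section of a ConfigMap serialized as JSON."""
    return json.loads(raw_json)["data"]


def jupyterhub_urls(configmap_data: dict[str, str],
                    node_port: int = PROXY_PUBLIC_NODE_PORT) -> dict[str, str]:
    """Hypothetical helper: build the hub and admin URLs from the node IP."""
    # keycloakNodeIp is written by the Keycloak Deployment on start-up.
    ip = ipaddress.ip_address(configmap_data["keycloakNodeIp"])  # validates the address
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    base = f"http://{host}:{node_port}"
    return {"hub": f"{base}/", "admin": f"{base}/hub/admin"}
```

With the ConfigMap data shown in the walkthrough, `jupyterhub_urls(...)["admin"]` evaluates to `http://172.19.0.5:31095/hub/admin`.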
  
    
`@@ -0,0 +1,29 @@`

```dockerfile
# docker build -t oci.stackable.tech/sandbox/spark:3.5.2-python311 -f Dockerfile .
# kind load docker-image oci.stackable.tech/sandbox/spark:3.5.2-python311 -n stackable-data-platform
# or:
# docker push oci.stackable.tech/sandbox/spark:3.5.2-python311

FROM spark:3.5.2-scala2.12-java17-ubuntu

USER root

RUN set -ex; \
    apt-get update; \
    # Install dependencies for Python 3.11
    apt-get install -y \
        software-properties-common \
    && apt-get update && apt-get install -y \
        python3.11 \
        python3.11-venv \
        python3.11-dev \
    && rm -rf /var/lib/apt/lists/*; \
    # Install pip manually for Python 3.11
    curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
    python3.11 get-pip.py && \
    rm get-pip.py

# Make Python 3.11 the default Python version
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 \
    && update-alternatives --install /usr/bin/pip pip /usr/local/bin/pip3 1

USER spark
```
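The custom image exists because, as the demo documentation notes, the Python versions of the notebook (the Spark driver) and of the executors must match at the `major.minor` level; PySpark's worker checks this and raises an error on a mismatch. A minimal sketch of that comparison follows. The function names are hypothetical, and the executor's version string is assumed to come from running `python3 --version` in the built image.

```python
import re
import sys


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like '3.11.9' or 'Python 3.11.4'."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"no major.minor version in {version!r}")
    return int(match.group(1)), int(match.group(2))


def python_versions_compatible(driver: str, executor: str) -> bool:
    """Driver and executor must agree on major.minor; the patch level may differ."""
    return major_minor(driver) == major_minor(executor)


def local_python_version() -> str:
    """Version of the interpreter running this code, e.g. inside the notebook."""
    return "{}.{}.{}".format(*sys.version_info[:3])
```

For example, `python_versions_compatible("3.11.9", "Python 3.11.4")` returns `True`, while the same call with a `3.10.12` executor returns `False`.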
  
    
`@@ -0,0 +1,71 @@`

```yaml
---
releaseName: jupyterhub
name: jupyterhub
repo:
  name: jupyterhub
  url: https://jupyterhub.github.io/helm-chart/
version: 4.0.0
options:
  hub:
    config:
      Authenticator:
        allow_all: True
        admin_users:
          - admin
      JupyterHub:
        authenticator_class: nativeauthenticator.NativeAuthenticator
      NativeAuthenticator:
        open_signup: true
  proxy:
    service:
      type: ClusterIP
  rbac:
    create: true
  prePuller:
    hook:
      enabled: false
    continuous:
      enabled: false
  scheduling:
    userScheduler:
      enabled: false
  singleuser:
    cmd: null
    serviceAccountName: hub
    networkPolicy:
      enabled: false
    extraLabels:
      stackable.tech/vendor: Stackable
    profileList:
      - display_name: "Default"
        description: "Default profile"
        default: true
        profile_options:
          cpu:
            display_name: CPU
            choices:
              "2":
                display_name: "2 request, 2 limit"
                kubespawner_override:
                  cpu_guarantee: 2
                  cpu_limit: 2
              "1 request, 16 limit":
                display_name: "1 request, 16 limit"
                kubespawner_override:
                  cpu_guarantee: 1
                  cpu_limit: 16
          memory:
            display_name: Memory
            choices:
              "8 GB":
                display_name: "8 GB"
                kubespawner_override:
                  mem_guarantee: "8G"
                  mem_limit: "8G"
          image:
            display_name: Image
            choices:
              "quay.io/jupyter/pyspark-notebook:python-3.11.9":
                display_name: "quay.io/jupyter/pyspark-notebook:python-3.11.9"
                kubespawner_override:
                  image: "quay.io/jupyter/pyspark-notebook:python-3.11.9"
```
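In the profile above, each entry under `profile_options` contributes the `kubespawner_override` of the value the user picks. The following Python sketch illustrates how one selection per option could combine into a single set of Pod settings. It is a simplified model for illustration, not KubeSpawner's actual implementation; `PROFILE_OPTIONS` is a hand-copied excerpt of the YAML and `merged_overrides` is a hypothetical name.

```python
from typing import Any

# Hand-copied excerpt of the profile_options above (display names omitted).
PROFILE_OPTIONS: dict[str, dict[str, Any]] = {
    "cpu": {"choices": {
        "2": {"kubespawner_override": {"cpu_guarantee": 2, "cpu_limit": 2}},
        "1 request, 16 limit": {"kubespawner_override": {"cpu_guarantee": 1, "cpu_limit": 16}},
    }},
    "memory": {"choices": {
        "8 GB": {"kubespawner_override": {"mem_guarantee": "8G", "mem_limit": "8G"}},
    }},
    "image": {"choices": {
        "quay.io/jupyter/pyspark-notebook:python-3.11.9": {
            "kubespawner_override": {"image": "quay.io/jupyter/pyspark-notebook:python-3.11.9"},
        },
    }},
}


def merged_overrides(options: dict[str, dict[str, Any]],
                     selection: dict[str, str]) -> dict[str, Any]:
    """Combine the overrides of the chosen value for each selected option."""
    merged: dict[str, Any] = {}
    for option_name, choice_key in selection.items():
        choices = options[option_name]["choices"]
        if choice_key not in choices:
            raise KeyError(f"unknown choice {choice_key!r} for option {option_name!r}")
        merged.update(choices[choice_key]["kubespawner_override"])
    return merged
```

Because the options set disjoint keys (CPU, memory, image), the order in which they are merged does not matter here.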
      