This repository provides templates and documentation for deploying Source-to-Image (S2I) enabled Jupyter Notebook images for Python on OpenShift.
The Jupyter project provides Docker-formatted container images via their GitHub project and on Docker Hub.
The images that the Jupyter project provides will not work with the default security profile of OpenShift. Although the Jupyter project images have been set up so they do not run as root, they will not run in a multi-tenant PaaS environment where any container a user runs is forced to run with an assigned UID different from that specified by the image.
The issues preventing the Jupyter project images from running in a default installation of OpenShift have been reported via the GitHub project, but at this point have not been addressed.
Fixed-up versions of the Jupyter project images that can be run on OpenShift can be found in another getwarped project described at:
The images based on the Jupyter project images provide various stacks with a range of different packages pre-installed, including both Python 2 and Python 3 runtimes in some images. Those images do have basic S2I support for Python as well, but the primary intention is to make available an OpenShift-compatible version of the Jupyter project images for use in ad-hoc situations.
In contrast, the images provided here are set up for only a single Python version and provide only a minimal notebook configuration. The intent is that the images here be used as S2I builders to create purpose-built images which include only those packages that you need. This produces a smaller and more reproducible image, as it requires that all the dependencies you need are listed.
The images here also include built-in support for being used as part of a parallel computing cluster using `ipyparallel`. The Jupyter project images cannot be used in the same way and would need additional work to set them up for use with `ipyparallel`.
So if you want an ad-hoc environment to play in, the Jupyter project images may be more suitable, but where you need your environment to be properly specified so you know what is being included, or need to use `ipyparallel`, the images here are the better option.
Two Jupyter notebook base images are currently being made available:
- `getwarped/s2i-notebook-python27` - (GitHub Repository, Docker Hub)
- `getwarped/s2i-notebook-python35` - (GitHub Repository, Docker Hub)
The images include only the basic Python packages required to run a Jupyter notebook server, as well as the `ipyparallel` package for parallel computing.
The images are Source-to-Image (S2I) enabled, and use warpdrive to make it easy to build up custom images including additional Python packages or code, and sample notebook files and data.
The easiest way to deploy the Jupyter notebook images and get working is to import whichever of the above images you wish to use, and then use the OpenShift templates provided with this project to deploy them.
To import all the above images into your project, you can run the `oc import-image` command.

```
oc import-image getwarped/s2i-notebook-python27 --confirm
oc import-image getwarped/s2i-notebook-python35 --confirm
```
This should result in the creation of two image streams within your project:

```
s2i-notebook-python27
s2i-notebook-python35
```
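To verify the import succeeded, you can list the image streams; `is` is the short name `oc` accepts for `imagestream`.

```
# List all image streams in the current project.
oc get is

# Show the tags and image details for one of the imported streams.
oc describe is s2i-notebook-python27
```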
You could deploy the images directly, but the templates provide an easier way of setting a password for your notebooks, and will also ensure that a secure HTTP connection is used when interacting with the notebook interface.
To load the OpenShift templates you can run the `oc create` command.

```
oc create -f https://raw.githubusercontent.com/getwarped/jupyter-notebooks/master/openshift/templates.json
```
This should create three templates within your project. The purpose of each template is as follows:
- `jupyter-notebook` - Deploy a notebook server from an image stream. This can be one of the basic images listed above, or a customised image which has been created using the `jupyter-builder` template. The notebook can optionally be linked to a parallel compute cluster created using `jupyter-cluster`.
- `jupyter-builder` - Create a customised notebook image. This will run the S2I build process, starting with any of the basic images, or even a customised image, to combine additional files into the image. This can be used to incorporate pre-defined notebooks, data files, or install additional Python packages.
- `jupyter-cluster` - Deploy a parallel compute cluster comprising a controller and a single compute engine. The number of compute engines can then be scaled up to as many as necessary.
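The templates can also be instantiated from the command line with `oc new-app`. The sketch below assumes the parameter names `APPLICATION_NAME` and `NOTEBOOK_PASSWORD`, which may differ from the actual template, so list the real parameter names first with `oc process --parameters`; only `NOTEBOOK_IMAGE` is confirmed by the descriptions below.

```
# List the parameters the template accepts.
oc process --parameters jupyter-notebook

# Deploy a notebook server without going through the web console. The
# APPLICATION_NAME and NOTEBOOK_PASSWORD parameter names are assumptions.
oc new-app jupyter-notebook \
    --param APPLICATION_NAME=mynotebook \
    --param NOTEBOOK_IMAGE=s2i-notebook-python27 \
    --param NOTEBOOK_PASSWORD=grumpycat
```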
To deploy a notebook server, select Add to Project from the web console and enter `jupyter` into the search filter. Select `jupyter-notebook`.
On the page for the `jupyter-notebook` template, fill in the name of the application, the name of the image you wish to deploy (defaults to `s2i-notebook-python27`) and a password for the notebook server. Also override the `app` label applied to resources created when deploying the notebook server with a unique name. This will make it easier to delete the notebook server later.
If you do not provide your own password, a random password will be generated. You can find out what the generated password is by going to the Environment tab of the Deployments page for your notebook server in the OpenShift web console.
Once created, the notebook server will be deployed and automatically exposed via a route to the Internet over a secure HTTP connection. You can find the URL at which it is exposed from the Overview page for your project.
When you visit the URL you will be prompted for the password you entered via the template to get access to the notebook server.
If you use either of the `s2i-notebook-python27` or `s2i-notebook-python35` images in the `NOTEBOOK_IMAGE` field of the template, you will be presented with an empty work directory. You can create new notebooks or upload your own as necessary.
Do note that by default any work you do is not saved in a persistent volume. Thus if the notebook server is restarted at any point, whether explicitly by you or by OpenShift, your work will be lost.
To enable you to preserve your work, even if the notebook server is restarted, you should attach a persistent storage volume to the notebook server. This can be done from the Deployments page for your notebook server.
The mount path for the storage should be set to `/opt/app-root/src`.
The storage should be attached before you do anything else. The notebook server will be automatically restarted when you make the storage request and attach the volume.
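If you prefer the command line, `oc set volume` can create a claim and mount it in one step. This is a sketch, assuming the deployment configuration is named `mynotebook`; adjust the name and claim size as needed.

```
# Create a 1Gi persistent volume claim and mount it over the notebook
# work directory. This triggers an automatic redeployment.
oc set volume dc/mynotebook --add \
    --claim-size=1Gi \
    --mount-path=/opt/app-root/src
```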
When done with your work, you can download your files using the Jupyter notebook interface. Alternatively, you can use the `oc rsync` command to copy files from your application back to your local computer.
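As an illustration of using `oc rsync`, the pod name below is a placeholder; look up the actual name with `oc get pods` first.

```
# Find the name of the running notebook pod.
oc get pods

# Copy the notebook work directory back to the local computer, replacing
# mynotebook-1-abcde with the actual pod name.
oc rsync mynotebook-1-abcde:/opt/app-root/src ./notebooks
```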
To delete the Jupyter notebook server when no longer required, you can use the `oc delete all --selector app=<name>` command, where `<name>` is replaced with the value you gave the `app` label in the template page when deploying the Jupyter notebook server.
Note that when the application is deleted, the persistent storage volume will not be deleted. To delete it, determine the name of the persistent volume claim using `oc get pvc` and then delete it using `oc delete pvc/<name>`.
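Putting the cleanup steps together, assuming the `app` label was set to `mynotebook` and that `oc get pvc` reports a claim named `mynotebook-claim` (both names here are placeholders):

```
# Delete everything created by the template for this notebook server.
oc delete all --selector app=mynotebook

# Persistent volume claims survive the above, so list and delete them
# separately.
oc get pvc
oc delete pvc/mynotebook-claim
```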
Only a basic Jupyter notebook image is provided. If your Jupyter notebooks require additional Python packages, you can create a customised image using the `oc new-build` command, or the `jupyter-builder` template.
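If using `oc new-build` directly, the `<builder-image>~<repository>` form runs the S2I build process. A sketch, where the repository URL and image name are placeholders:

```
# Build a custom notebook image from the Python 2.7 base image, pulling
# notebooks, data files and a requirements.txt from a source repository.
oc new-build s2i-notebook-python27~https://github.com/yourname/notebooks \
    --name custom-notebook
```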
From the web console, select `jupyter-builder` from the Add to Project page. You will be asked to enter the name to give the custom notebook image created, the name of the S2I builder image to use as the base, and the details of a source code repository from which files will be pulled to incorporate into the custom image. To have additional Python packages installed into the custom image, you should add a `requirements.txt` file for `pip` into the directory of the source code repository that files are pulled from.
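For example, a `requirements.txt` listing a few scientific packages (the packages shown are purely illustrative) would look like:

```
numpy
scipy
pandas
matplotlib
```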
This template will only create a build and it will not appear on the Overview page. You will be able to find it under the Builds page.
When the build is complete, you can then use the `jupyter-notebook` template to run the image, as described above, by setting `NOTEBOOK_IMAGE` to the name of your custom image.
Support for running a parallel computing cluster using `ipyparallel` has been incorporated into all the S2I enabled images.
To set up your own compute cluster, you can use the `jupyter-cluster` template. Give your cluster a name and specify the notebook image to use when starting the controller and compute engines.
If you need additional Python packages, data files or Python code available in the compute engines, you can build them into a custom image using the `jupyter-builder` template and use that image when starting up the cluster.
Once the cluster is running, you can increase the number of compute engines by increasing the replica count on the `ipengine` component.
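The replica count can be raised with `oc scale`. A sketch, assuming the cluster was named `mycluster` and that the template names the compute engine deployment configuration `mycluster-ipengine` (an assumption; check the actual name with `oc get dc`):

```
# Scale the cluster to three compute engines.
oc scale dc/mycluster-ipengine --replicas=3
```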
To link the cluster with a Jupyter notebook server, specify the name of the cluster when deploying the notebook server using the `jupyter-notebook` template.
The cluster will be associated with the default profile, so the `ipyparallel` client can be used without needing any special arguments when it is initialised.
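From a notebook cell, a minimal check that the engines are reachable looks like the following; this is a standard `ipyparallel` usage sketch rather than anything specific to these images.

```python
import ipyparallel

# With the default profile, no arguments are required to connect.
client = ipyparallel.Client()

# List the ids of the compute engines that have registered.
print(client.ids)

# Run a simple computation across all engines.
view = client[:]
print(view.map_sync(lambda x: x ** 2, range(10)))
```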