
Commit a5cee92

[CTM-142] New terra base image with sudo
1 parent ca0b855 commit a5cee92

34 files changed (+3626, -21 lines)

README.md

Lines changed: 24 additions & 21 deletions
@@ -5,7 +5,7 @@ This repo provides docker images for running jupyter notebook in [Terra](https:/
 Make sure to go through the [contributing guide](https://github.com/DataBiosphere/terra-docker/blob/master/CONTRIBUTING.md#contributing) as you make changes to this repo.
 
 # Terra Base Images
-[terra-jupyter-base](terra-jupyter-base/README.md)
+[terra-base](terra-base/README.md)
 
 [terra-jupyter-python](terra-jupyter-python/README.md)
 
@@ -20,15 +20,18 @@ Make sure to go through the [contributing guide](https://github.com/DataBiospher
 # How to create your own Custom image to use with notebooks on Terra
 Custom docker images need to use a Terra base image (see above) in order to work with the service that runs notebooks on Terra.
 * You can use any of the base images above
+  * `terra-base` is the smallest image, but doesn't include any scientific packages on top of Jupyter and R
 * Here is an example of how to build off of a base image: Add `FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:0.0.1` to your dockerfile (`terra-jupyter-base` is the smallest image you can extend from)
 * Customize your image (see the [terra-jupyter-python](terra-jupyter-python/Dockerfile) dockerfile for an example of how to extend from one of our base images
-* Publish the image to either GCR or Dockerhub; the image must be public to be used
+* Publish the image to either GAR or Dockerhub;
+  * If using Dockerhub, the image **must be public** to be used
 * Use the published container image location when creating notebook runtime
   * Dockerhub image example: [image name]:[tag]
-  * GCR image example: us.gcr.io/repository/[image name]:[tag]
-* Since 6/28/2021, we introduced a few changes that might impact building custom images
-  - Home directory of new images will be `/home/jupyter`. This means if your dockerfile is referencing `/home/jupyter-user` directory, you need to update it to $HOME (recommended) or `/home/jupyter`.
-  - Creating VMs with custom images will take much longer than terra supported images because `docker pull` will take a few min. If the custom image ends up being too large, VM creation may time out. New base images are much larger in size than previous versions.
+  * GAR image example: us.gcr.io/repository/[image name]:[tag]
+* Some things to keep in mind when creating custom images:
+  - The home directory of new images will be `/home/jupyter`. This means if your dockerfile is referencing the `/home/jupyter-user` directory, you need to update it to $HOME (recommended) or `/home/jupyter`.
+  - Creating VMs with custom images may take longer than terra supported images because `docker pull` will take a few min. If the custom image ends up being too large, VM creation may time out.
+-
 
 # Development
 ## Using git secrets
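To make the custom-image steps in the hunk above concrete, here is a minimal sketch of a Dockerfile that extends the new base image. The registry path and tag are assumptions (substitute the published terra-base location and version), and `samtools`/`scikit-learn` are just placeholder packages:

    # Placeholder registry path and tag; use the published terra-base image and version.
    FROM us.gcr.io/broad-dsp-gcr-public/terra-base:<tag>

    # System packages need root; the base image finishes as the unprivileged jupyter user.
    USER root
    RUN apt-get update \
        && apt-get install -yq --no-install-recommends samtools \
        && apt-get clean \
        && rm -rf /var/lib/apt/lists/*

    # Switch back to the jupyter user expected by the Terra runtime.
    USER jupyter

    # Python packages land in the user site because the base image sets PIP_USER=true.
    RUN pip install --no-cache-dir scikit-learn

At runtime the jupyter user can also install system packages interactively with `sudo apt-get install ...`, since this commit grants passwordless sudo in the base image.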
@@ -56,25 +59,25 @@ Once you have the container running, you should be able to access jupyter at htt
 Detailed documentation on how to integrate the terra-docker image with Leonardo can be found [here](https://broadworkbench.atlassian.net/wiki/spaces/IA/pages/2519564289/Integrating+new+Terra+docker+images+with+Leonardo)
 
 ### If you are adding a new image:
-- Create a new directory with the Dockerfile and a CHANGELOG.md.
+- Create a new directory with the Dockerfile and a CHANGELOG.md.
 - Add the directory name (also referred to as the image name) as an entry to the image_data array in the file in config/conf.json. For more info on what is needed for a new image, see the section on the config
 - If you wish the image to be baked into our custom image, which makes the runtime load significantly faster (recommended), make a PR into the leonardo [repo](https://github.com/DataBiosphere/leonardo) doing the following within the `jenkins` folder:
-  - Add the image to the parameter list in the Jenkinsfile
-  - Update the relevant `prepare` script in each subdirectory. Currently there is a prepare script for gce and dataproc.
-  - It is recommended to add a test in the `automation` directory (`automation/src/test/resources/reference.conf`)
-  - Add your image to the `reference.conf` in the automation directory. This will be the only place any future version updates to your image happen. This ensures, along with the test in the previous step, that any changes to the image are tested.
-  - Run the GHA to generate the image, and add it to `reference.conf` in the http directory (`http/src/main/resources/reference.conf`)
+  - Add the image to the parameter list in the Jenkinsfile
+  - Update the relevant `prepare` script in each subdirectory. Currently there is a prepare script for gce and dataproc.
+  - It is recommended to add a test in the `automation` directory (`automation/src/test/resources/reference.conf`)
+  - Add your image to the `reference.conf` in the automation directory. This will be the only place any future version updates to your image happen. This ensures, along with the test in the previous step, that any changes to the image are tested.
+  - Run the GHA to generate the image, and add it to `reference.conf` in the http directory (`http/src/main/resources/reference.conf`)
 
 ### If you are updating an existing image:
 - [Create your terra-docker PR](https://broadworkbench.atlassian.net/wiki/spaces/IA/pages/2519564289/Integrating+new+Terra+docker+images+with+Leonardo#1.-Create-a-terra-docker-PR)
-  - Update the version in config/conf.json
-  - Update CHANGELOG.md and VERSION file
-  - Ensure that no `From` statements need to be updated based on the image you updated (i.e., if you update the base image, you will need to update several other images)
-  - Run updateVersions.sc to bump all images dependent on the base
+  - Update the version in config/conf.json
+  - Update CHANGELOG.md and VERSION file
+  - Ensure that no `From` statements need to be updated based on the image you updated (i.e., if you update the base image, you will need to update several other images)
+  - Run updateVersions.sc to bump all images dependent on the base
 - [Merge your terra-docker PR and check if the image(s) and version json files are created](https://broadworkbench.atlassian.net/wiki/spaces/IA/pages/2519564289/Integrating+new+Terra+docker+images+with+Leonardo#2.-Merge-your-terra-docker-PR-and-check-images-are-created)
 - [Open a PR in leonardo](https://broadworkbench.atlassian.net/wiki/spaces/IA/pages/2519564289/Integrating+new+Terra+docker+images+with+Leonardo#3.-Create-a-new-leo-PR-that-integrates-the-new-images)
-  - Update the relevant `prepare` script within the `jenkins` folder
-  - Update the automation `reference.conf` file
+  - Update the relevant `prepare` script within the `jenkins` folder
+  - Update the automation `reference.conf` file
 - [Run the GHA on your branch to generate the new image](https://broadworkbench.atlassian.net/wiki/spaces/IA/pages/2519564289/Integrating+new+Terra+docker+images+with+Leonardo#4.-Run-the-Github-Action-in-leo-to-generate-a-new-custom-COS-image)
 - [Update the leonardo PR to use the newly generated image](https://broadworkbench.atlassian.net/wiki/spaces/IA/pages/2519564289/Integrating+new+Terra+docker+images+with+Leonardo#5.-Update-the-Leo-PR-to-use-the-generated-OS-images)
 - Ensure that the `terra-docker-versions-candidate.json` file (which is what the UI sources the dropdown from) in the `terra-docker-image-documentation-[env]` bucket correctly references your new docker image
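As a purely hypothetical illustration of the `From` statement check in the update steps above: if the base image is bumped, every dependent image's Dockerfile needs its FROM line moved to the new tag, which is what updateVersions.sc automates. The version numbers below are made up:

    # Before the base image bump (hypothetical version):
    FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:1.0.0
    # After the base image bump (hypothetical version):
    FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:1.0.1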
@@ -120,9 +123,9 @@ To launch an image through Terra, navigate to https://app.terra.bio or your BEE'
 
 ## Config
 
-There is a config file located at `config/conf.json` that contains the configuration used by all automated jobs and build scripts that interface with this repo.
+There is a config file located at `config/conf.json` that contains the configuration used by all automated jobs and build scripts that interface with this repo.
 
-There is a field for "spark_version" top-level which must be updated if we update the debian version used in the custom image.
+There is a field for "spark_version" top-level which must be updated if we update the debian version used in the custom image.
 Currently it assumes 1.4x https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-1.4
 
 There are some constants included, such as the tools supported by this repo. Of particular interest is the image_data array.
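As a rough sketch of the layout described above: only `spark_version` and `image_data` are taken from this README; the keys inside an image_data entry are hypothetical, so check `config/conf.json` itself for the real schema.

    {
      "spark_version": "<dataproc 1.4x line>",
      "image_data": [
        {
          "name": "terra-base",
          "version": "<current version>"
        }
      ]
    }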
@@ -163,7 +166,7 @@ Each time you update or add an image, you will need to update the appropriate en
 
 The scripts folder has scripts used for building.
 - `generate_package_docs.py` This script is run once by build.sh each time an image is built. It is used to generate a .json with the versions for the packages in the image.
-- `generate_version_docs.py` This script is run each time an image is built. It builds a new master version file for the UI to look up the current versions to reference.
+- `generate_version_docs.py` This script is run each time an image is built. It builds a new master version file for the UI to look up the current versions to reference.
 
 ## Image dependencies
 
terra-base/Dockerfile

Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@
# Latest gpu-enabled base image on Ubuntu 22, 132 MB compressed
FROM --platform=linux/amd64 nvidia/cuda:13.0.1-base-ubuntu22.04

LABEL maintainer="DSP Analysis Team <dsp-analysis@broadinstitute.org>"

# TODO:
# ! try moving conda setup before UV setup
# ! specify that UV uses the conda python
# add more comments/clearer naming around the python versions being used
# double check that the correct CUDA/GPU drivers are available, otherwise copy from old base image via a multi-stage build
# look into copying over apt-get packages from the old terra base image in next level of images
# have examples of custom images extending this base image for different use cases (e.g. R, GATK, etc)

# want the command to fail due to an error at any stage in the pipe: https://github.com/hadolint/hadolint/wiki/DL4006
SHELL ["/usr/bin/bash", "-o", "pipefail", "-c"]

#######################
# General Environment Variables
#######################
ENV DEBIAN_FRONTEND=noninteractive
ENV LC_ALL=en_US.UTF-8

# Version of python to be installed and used
ENV PYTHON_VERSION=3.10
# Paired conda installer
ENV CONDA_INSTALLER=https://repo.anaconda.com/miniconda/Miniconda3-py310_25.9.1-1-Linux-x86_64.sh
ENV JUPYTER_VERSION=5.7.2
ENV NODE_MAJOR=20

#################
# Install Prerequisites
#################
RUN apt-get update && apt-get install -yq --no-install-recommends \
    # basic necessities
    sudo \
    ca-certificates \
    curl \
    jq \
    # gnupg requirement
    gnupg \
    dirmngr \
    # useful utilities for debugging within docker itself \
    nano \
    less \
    procps \
    lsb-release \
    # gcc compiler
    build-essential \
    locales \
    # for ssh-agent and ssh-add
    keychain \
    # extras \
    wget \
    bzip2 \
    git \
    # Uncomment en_US.UTF-8 for inclusion in generation
    && sed -i 's/^# *\(en_US.UTF-8\)/\1/' /etc/locale.gen \
    # Generate locale
    && locale-gen \
    # cleanup
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

##############################
# Set up Node for Jupyterlab
##############################
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

# Install Node >18
RUN apt-get update && apt-get install -yq --no-install-recommends
RUN mkdir -p /etc/apt/keyrings
RUN curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg

RUN echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | tee /etc/apt/sources.list.d/nodesource.list
RUN dpkg --remove --force-remove-reinstreq libnode-dev
RUN apt-get update && apt-get install -f -yq nodejs

##########################
# Create the User's Jupyter User
##########################
# Create the jupyter user and give sudo permission
ENV USER=jupyter
# This UID must stay static to be one greater than the welder user (1001)
ENV USER_UID=1002

# Create the user home and add to the users group
ENV USER_HOME=/home/$USER
RUN useradd -m -s /bin/bash -d $USER_HOME -N -u $USER_UID $USER
RUN usermod -g users $USER

# We want to grant the user sudo permissions without password
# so they can install the necessary packages that they want to use on the docker container
RUN echo "$USER ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/$USER \
    && chmod 0440 /etc/sudoers.d/$USER

############
# Install R
############
# add R repo for later R package installation \
RUN wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc \
    && apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
    && apt-get update && apt-get install -yq --no-install-recommends software-properties-common \
    && add-apt-repository --no-update "deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/" \
    # Install R base
    && apt-get update && apt-get install -y r-base \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

############################################
# Install Miniconda and setup base conda environment
############################################
## CONDA should not be used by devs to manage package dependencies,
# but is a widely used tool to manage python environments in a runtime
# and we should provide it to users

# download and install conda
ENV CONDA_HOME=/opt/conda

RUN curl -so $HOME/miniconda.sh $CONDA_INSTALLER \
    && chmod +x $HOME/miniconda.sh \
    && $HOME/miniconda.sh -b -p $CONDA_HOME \
    && rm $HOME/miniconda.sh
# ENV PATH="${PATH}:${CONDA_ENV_HOME}/bin:${CONDA_HOME}/bin"
ENV PATH="${PATH}:${CONDA_HOME}/bin"

# In order to override the default kernel, the conda environment must be named 'python3'
ENV CONDA_ENV_NAME=python3
ENV CONDA_ENV_HOME=$USER_HOME/.envs/$CONDA_ENV_NAME
ENV CONDA_FILES=$CONDA_HOME/custom

# Copy over the conda environment files
COPY conda/ $CONDA_FILES
ENV CONDA_FILES=$CONDA_HOME/custom

# Set up the path to the user python from conda
ENV BASE_PYTHON_PATH=$CONDA_HOME/bin/python$PYTHON_VERSION
# Tell conda to NOT write byte code (aka the .pyc files)
ENV PYTHONDONTWRITEBYTECODE=true

# Have to accept the conda terms of service to be able to install packages
RUN conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main \
    && conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

#RUN conda init bash \
# && . ~/.bashrc \
RUN conda env create --prefix $CONDA_ENV_HOME --file $CONDA_FILES/conda-environment.yaml \
    # Remove packages tarballs and python bytecode files from the image
    && conda clean -afy \
    && rm $CONDA_FILES/conda-environment.yaml \
    # Make sure the USER is the owner of the folder where the base conda is installed
    && chown -R $USER:users $USER_HOME

# Create the base conda environment that will be used by the jupyter user
RUN conda run -p ${CONDA_ENV_HOME} python -m ipykernel install --name=$CONDA_ENV_NAME

##############################
# Setup UV Environment for Jupyter
##############################
# Using UV (Universal Virtualenv) to create a virtual environment
# UV is used in place of poetry for speed and simplicity.

# NOTE: this is for the environment the jupyter server will be running in,
# separate from the jupyter user's conda environment.
COPY uv.lock .
COPY pyproject.toml .

# Setup UV environment variables
# - tells uv to copy the Python files into the container from the cache mount,
# - tell uv to byte-compile packages for faster application startups,
# - don't seed venv, we need to install separately
# - don't cache to keep the image size small
ENV UV_LINK_MODE=copy \
    UV_COMPILE_BYTECODE=1 \
    UV_VENV_SEED=false \
    UV_NO_CACHE=true

# Download the latest installer
ADD https://astral.sh/uv/install.sh /uv-installer.sh
RUN sh /uv-installer.sh && rm /uv-installer.sh

# Add local bin to PATH for uv
ENV PATH="/root/.local/bin/:$PATH"

ENV JUPYTER_HOME=/etc/jupyter

# Create a virtual environment and install jupyter packages
# setuptools and wheel are required for some of the jupyter extensions
RUN uv venv $JUPYTER_HOME --python $BASE_PYTHON_PATH \
    && source $JUPYTER_HOME/bin/activate \
    && uv pip install wheel \
    && uv pip install setuptools \
    && uv pip install -r pyproject.toml --no-cache --no-build-isolation \
    # Cleanup
    && rm uv.lock && rm pyproject.toml

# add jupyter to path
ENV PATH="${PATH}:${JUPYTER_HOME}/bin"

# Remove default jupyter kernel (to force use of the conda python kernel)
RUN $JUPYTER_HOME/bin/jupyter kernelspec remove python3 -y

# #######################
# # Terra-specific Utilities
# #######################
# copy over jupyter extensions and kernel config files
COPY jupyter/ $JUPYTER_HOME

# give user ownership of jupyter home and conda env home
# and make extension files executable
RUN chown -R $USER:users $JUPYTER_HOME $CONDA_HOME \
    && find $JUPYTER_HOME/extensions/scripts -name '*.sh' -type f | xargs chmod +x

# make the run-jupyter script executable
RUN chmod +x -R $JUPYTER_HOME/run-jupyter.sh

# Setup jupyter and r kernels
ENV JUPYTER_KERNELSPEC_DIR=/usr/local/share/jupyter/
RUN chown -R $USER:users $JUPYTER_KERNELSPEC_DIR \
    && find $JUPYTER_HOME/kernel -name '*.sh' -type f | xargs chmod +x \
    # You can get kernel directory by running `jupyter kernelspec list`
    && $JUPYTER_HOME/kernel/kernelspec.sh $JUPYTER_HOME/kernel $JUPYTER_KERNELSPEC_DIR/kernels

# Create the welder user
# The welder uid is consistent with the Welder docker definition here:
# https://github.com/DataBiosphere/welder/blob/master/project/Settings.scala
# Adding welder-user to the Jupyter container isn't strictly required, but it makes welder-added
# files display nicer when viewed in a terminal.
ENV WELDER_USER welder-user
# This UID must stay consistent with the UID defined in Welder
ENV WELDER_UID 1001
RUN useradd -m -s /bin/bash -N -u $WELDER_UID $WELDER_USER

# Make sure that the jupyter user will have access to the jupyter path in the working directory
EXPOSE $JUPYTER_PORT
WORKDIR $USER_HOME

# make pip install to a user directory, instead of a system directory which requires root.
# this is useful so `pip install` commands can be run in the context of a notebook.
ENV PIP_USER=true
USER $USER

# Note: this entrypoint is provided for running Jupyter independently of Leonardo.
# When Leonardo deploys this image onto a cluster, the entrypoint is overwritten to enable
# additional setup inside the container before execution. Jupyter execution occurs when the
# init-actions.sh script uses 'docker exec' to call run-jupyter.sh.
ENTRYPOINT ["/etc/jupyter/bin/jupyter", "notebook"]
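For local testing outside of Leonardo, a minimal sketch of building and running this image might look like the following. The local tag, the build context (assumed to be the terra-base/ directory containing conda/, jupyter/, uv.lock, and pyproject.toml), and the port are assumptions: JUPYTER_PORT is not defined anywhere in this Dockerfile, so 8888 (Jupyter's default) is used here, and the effective port and bind address ultimately depend on the Jupyter config copied into /etc/jupyter.

    # Build from the terra-base directory so the COPY inputs are in the build context; the tag is local-only.
    docker build -t terra-base:local terra-base/

    # Run the image with its own entrypoint (Leonardo normally overrides this).
    # 8888 is an assumption, since JUPYTER_PORT is never set in the Dockerfile.
    docker run --rm -p 8888:8888 terra-base:local

    # Inside the container, the jupyter user has passwordless sudo for system packages, e.g.
    #   sudo apt-get update && sudo apt-get install -y <some-package>
    # and `jupyter kernelspec list` should show the conda-backed 'python3' kernel rather than the removed default.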
