Commit 545e4be

chore(cuda): Upgrade all cuda-related IDE workbenches to CUDA 12.6 (#949)
* chore(cuda): Upgrade all cuda-related IDE workbenches to CUDA 12.6

  As part of the `2025a` release, we want to bump CUDA to the (latest) `12.6.3` version. This work builds upon the excellent investigation of @daniellutz to bring the upgrade to fruition.

  A couple of important points to note:
  - Previously RStudio and Jupyter were on different versions of CUDA (`12.1` and `12.4` respectively). This change standardizes the version to `12.6` across the IDEs.
  - The `.repo` and license files previously stored in the `cuda` subfolders for `ubi9` and `c9s` are identical. To avoid unnecessary duplication, the files are now at the root of the `cuda/` directory and the sub-folders have been removed.
  - There is some uncertainty about whether `ENV XLA_FLAGS` needs to be defined. For now, for consistency, it is always placed as the last instruction in the `cuda-base` Docker stage, before `USER` and `WORKDIR` are restored to their desired values.

  Related-to: https://issues.redhat.com/browse/RHOAIENG-19480

* chore(deps): upgrade Pipfile dependencies for cuda-related images
* fix(test): working on verifying CUDA behavior from upgrade

  This commit SHOULD NOT (necessarily) be committed; it will need rework to make sure the changes don't disrupt the normal development workflow.

* feat(kustomize): provide kustomize configs for cuda manifests
* chore(manifest): update imagestreams for CUDA 2025a

  Note that the SHA reference is presently invalid (copied over from 2024.2).

* fix(tests): get tests running clean
* fix(pr): address PR comments
* fix(rebase): tidy up work after picking up changes from main
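
The commit message states that the `.repo` and license files kept in the `ubi9` and `c9s` sub-folders were identical before they were consolidated. A minimal sketch of how such a claim could be checked with `cmp -s` follows. The helper name `files_identical` and the throwaway files are hypothetical stand-ins, not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) only when both files are byte-identical.
files_identical() {
    cmp -s "$1" "$2"
}

# Throwaway stand-ins for the two per-distribution copies of a file.
tmpdir=$(mktemp -d)
printf '[cuda]\nname=cuda\n' > "${tmpdir}/ubi9.repo"
printf '[cuda]\nname=cuda\n' > "${tmpdir}/c9s.repo"

if files_identical "${tmpdir}/ubi9.repo" "${tmpdir}/c9s.repo"; then
    echo "identical: a single shared copy is enough"
else
    echo "files differ: keep both copies"
fi

rm -rf "${tmpdir}"
```

In a real checkout the two arguments would be the paths of the duplicated files; `cmp -s` stays silent and reports the result only through its exit status, which makes it convenient inside a CI check.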
1 parent 6f701fe commit 545e4be

45 files changed: 1311 additions & 1409 deletions

ci/check-params-env.sh

Lines changed: 20 additions & 20 deletions
@@ -113,27 +113,27 @@ function check_image_variable_matches_name_and_commitref_and_size() {
         ;;
     odh-minimal-gpu-notebook-image-n)
         expected_name="odh-notebook-jupyter-minimal-ubi9-python-3.11"
-        expected_commitref="2024b"
+        expected_commitref="main"
         expected_build_name="cuda-jupyter-minimal-ubi9-python-3.11-amd64"
         expected_img_size=5157
         ;;
     odh-minimal-gpu-notebook-image-n-1)
-        expected_name="odh-notebook-jupyter-minimal-ubi9-python-3.9"
-        expected_commitref="2024a"
-        expected_build_name="cuda-jupyter-minimal-ubi9-python-3.9-amd64"
-        expected_img_size=5718
+        expected_name="odh-notebook-jupyter-minimal-ubi9-python-3.11"
+        expected_commitref="2024b"
+        expected_build_name="cuda-jupyter-minimal-ubi9-python-3.11-amd64"
+        expected_img_size=5157
         ;;
     odh-pytorch-gpu-notebook-image-n)
         expected_name="odh-notebook-jupyter-pytorch-ubi9-python-3.11"
-        expected_commitref="2024b"
+        expected_commitref="main"
         expected_build_name="jupyter-pytorch-ubi9-python-3.11-amd64"
         expected_img_size=8571
         ;;
     odh-pytorch-gpu-notebook-image-n-1)
-        expected_name="odh-notebook-jupyter-pytorch-ubi9-python-3.9"
-        expected_commitref="2024a"
-        expected_build_name="jupyter-pytorch-ubi9-python-3.9-amd64"
-        expected_img_size=9037
+        expected_name="odh-notebook-jupyter-pytorch-ubi9-python-3.11"
+        expected_commitref="2024b"
+        expected_build_name="jupyter-pytorch-ubi9-python-3.11-amd64"
+        expected_img_size=8571
         ;;
     odh-generic-data-science-notebook-image-n)
         expected_name="odh-notebook-jupyter-datascience-ubi9-python-3.11"
@@ -149,15 +149,15 @@ function check_image_variable_matches_name_and_commitref_and_size() {
         ;;
     odh-tensorflow-gpu-notebook-image-n)
         expected_name="odh-notebook-cuda-jupyter-tensorflow-ubi9-python-3.11"
-        expected_commitref="2024b"
+        expected_commitref="main"
         expected_build_name="cuda-jupyter-tensorflow-ubi9-python-3.11-amd64"
         expected_img_size=8211
         ;;
     odh-tensorflow-gpu-notebook-image-n-1)
-        expected_name="odh-notebook-cuda-jupyter-tensorflow-ubi9-python-3.9"
-        expected_commitref="2024a"
-        expected_build_name="cuda-jupyter-tensorflow-ubi9-python-3.9-amd64"
-        expected_img_size=6667
+        expected_name="odh-notebook-cuda-jupyter-tensorflow-ubi9-python-3.11"
+        expected_commitref="2024b"
+        expected_build_name="cuda-jupyter-tensorflow-ubi9-python-3.11-amd64"
+        expected_img_size=8211
         ;;
     odh-trustyai-notebook-image-n)
         expected_name="odh-notebook-jupyter-trustyai-ubi9-python-3.11"
@@ -200,15 +200,15 @@ function check_image_variable_matches_name_and_commitref_and_size() {
     # We should consider what to do with this - in ideal case, we should have different labels for these cases.
     odh-rstudio-gpu-notebook-image-n)
         expected_name="odh-notebook-rstudio-server-c9s-python-3.11"
-        expected_commitref="2024b"
+        expected_commitref="main"
         expected_build_name="cuda-rstudio-c9s-python-3.11-amd64"
         expected_img_size=7184
         ;;
     odh-rstudio-gpu-notebook-image-n-1)
-        expected_name="odh-notebook-rstudio-server-c9s-python-3.9"
-        expected_commitref="2024a"
-        expected_build_name="cuda-rstudio-c9s-python-3.9-amd64"
-        expected_img_size=7129
+        expected_name="odh-notebook-rstudio-server-c9s-python-3.11"
+        expected_commitref="2024b"
+        expected_build_name="cuda-rstudio-c9s-python-3.11-amd64"
+        expected_img_size=7184
         ;;
     odh-rocm-minimal-notebook-image-n)
         expected_name="odh-notebook-jupyter-minimal-ubi9-python-3.11"
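
For the CUDA entries touched in the hunks above, the change follows one pattern: each `-n` variable now expects the `main` commit ref, and each `-n-1` variable takes over the `2024b` values that `-n` held before. The sketch below restates that mapping on its own. The function name `expected_commitref_for` is hypothetical and is not part of `ci/check-params-env.sh`, which sets these values per image inside its `case` statement.

```shell
#!/bin/sh
# Hypothetical helper, not part of ci/check-params-env.sh: maps an image
# variable name to the commit ref expected for the CUDA entries in this diff.
expected_commitref_for() {
    case "$1" in
        *-n-1) echo "2024b" ;;   # previous slot: inherits the former -n values
        *-n)   echo "main" ;;    # current slot: tracks main in this diff
        *)     echo "unknown"; return 1 ;;
    esac
}

expected_commitref_for "odh-minimal-gpu-notebook-image-n"     # prints: main
expected_commitref_for "odh-minimal-gpu-notebook-image-n-1"   # prints: 2024b
```

The mapping is only known to hold for the entries shown in this diff; unchanged entries are not visible here and may use other refs.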

cuda/c9s-python-3.11/cuda.repo-arm64

Lines changed: 0 additions & 6 deletions
This file was deleted.
File renamed without changes.

cuda/ubi9-python-3.11/NGC-DL-CONTAINER-LICENSE

Lines changed: 0 additions & 230 deletions
This file was deleted.

cuda/ubi9-python-3.11/cuda.repo-x86_64

Lines changed: 0 additions & 6 deletions
This file was deleted.

jupyter/minimal/ubi9-python-3.11/Dockerfile.cuda

Lines changed: 45 additions & 44 deletions
@@ -28,16 +28,16 @@ RUN curl -L https://mirror.openshift.com/pub/openshift-v4/$(uname -m)/clients/oc
 ####################
 FROM base AS cuda-base

-ARG CUDA_SOURCE_CODE=cuda/ubi9-python-3.11
+ARG CUDA_SOURCE_CODE=cuda

 # Install CUDA base from:
-# https://gitlab.com/nvidia/container-images/cuda/-/tree/master/dist/12.4.1/ubi9/base
+# https://gitlab.com/nvidia/container-images/cuda/-/blob/master/dist/12.6.3/ubi9/base/Dockerfile
 USER 0
 WORKDIR /opt/app-root/bin

 ENV NVARCH=x86_64
-ENV NVIDIA_REQUIRE_CUDA="cuda>=12.4 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526 brand=tesla,driver>=535,driver<536 brand=unknown,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=geforce,driver>=535,driver<536 brand=geforcertx,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=titan,driver>=535,driver<536 brand=titanrtx,driver>=535,driver<536"
-ENV NV_CUDA_CUDART_VERSION=12.4.127-1
+ENV NVIDIA_REQUIRE_CUDA="cuda>=12.6 brand=unknown,driver>=470,driver<471 brand=grid,driver>=470,driver<471 brand=tesla,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=vapps,driver>=470,driver<471 brand=vpc,driver>=470,driver<471 brand=vcs,driver>=470,driver<471 brand=vws,driver>=470,driver<471 brand=cloudgaming,driver>=470,driver<471 brand=unknown,driver>=535,driver<536 brand=grid,driver>=535,driver<536 brand=tesla,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=vapps,driver>=535,driver<536 brand=vpc,driver>=535,driver<536 brand=vcs,driver>=535,driver<536 brand=vws,driver>=535,driver<536 brand=cloudgaming,driver>=535,driver<536 brand=unknown,driver>=550,driver<551 brand=grid,driver>=550,driver<551 brand=tesla,driver>=550,driver<551 brand=nvidia,driver>=550,driver<551 brand=quadro,driver>=550,driver<551 brand=quadrortx,driver>=550,driver<551 brand=nvidiartx,driver>=550,driver<551 brand=vapps,driver>=550,driver<551 brand=vpc,driver>=550,driver<551 brand=vcs,driver>=550,driver<551 brand=vws,driver>=550,driver<551 brand=cloudgaming,driver>=550,driver<551"
+ENV NV_CUDA_CUDART_VERSION=12.6.77-1

 COPY ${CUDA_SOURCE_CODE}/cuda.repo-x86_64 /etc/yum.repos.d/cuda.repo
 COPY ${CUDA_SOURCE_CODE}/NGC-DL-CONTAINER-LICENSE /
@@ -46,12 +46,12 @@ RUN NVIDIA_GPGKEY_SUM=d0664fbbdb8c32356d45de36c5984617217b2d0bef41b93ccecd326ba3
     curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/rhel9/${NVARCH}/D42D0685.pub | sed '/^Version/d' > /etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA && \
     echo "$NVIDIA_GPGKEY_SUM /etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA" | sha256sum -c --strict -

-ENV CUDA_VERSION=12.4.1
+ENV CUDA_VERSION=12.6.3

 # For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
 RUN yum upgrade -y && yum install -y \
-    cuda-cudart-12-4-${NV_CUDA_CUDART_VERSION} \
-    cuda-compat-12-4 \
+    cuda-cudart-12-6-${NV_CUDA_CUDART_VERSION} \
+    cuda-compat-12-6 \
     && yum clean all \
     && rm -rf /var/cache/yum/*

@@ -67,55 +67,53 @@ ENV NVIDIA_VISIBLE_DEVICES=all
 ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

 # Install CUDA runtime from:
-# https://gitlab.com/nvidia/container-images/cuda/-/blob/master/dist/12.4.1/ubi9/runtime/Dockerfile
-ENV NV_CUDA_LIB_VERSION=12.4.1-1
-ENV NV_NVTX_VERSION=12.4.127-1
-ENV NV_LIBNPP_VERSION=12.2.5.30-1
-ENV NV_LIBNPP_PACKAGE=libnpp-12-4-${NV_LIBNPP_VERSION}
-ENV NV_LIBCUBLAS_VERSION=12.4.5.8-1
+# https://gitlab.com/nvidia/container-images/cuda/-/blob/master/dist/12.6.3/ubi9/runtime/Dockerfile
+ENV NV_CUDA_LIB_VERSION=12.6.3-1
+ENV NV_NVTX_VERSION=12.6.77-1
+ENV NV_LIBNPP_VERSION=12.3.1.54-1
+ENV NV_LIBNPP_PACKAGE=libnpp-12-6-${NV_LIBNPP_VERSION}
+ENV NV_LIBCUBLAS_VERSION=12.6.4.1-1
 ENV NV_LIBNCCL_PACKAGE_NAME=libnccl
-ENV NV_LIBNCCL_PACKAGE_VERSION=2.21.5-1
-ENV NV_LIBNCCL_VERSION=2.21.5
-ENV NCCL_VERSION=2.21.5
-ENV NV_LIBNCCL_PACKAGE=${NV_LIBNCCL_PACKAGE_NAME}-${NV_LIBNCCL_PACKAGE_VERSION}+cuda12.4
+ENV NV_LIBNCCL_PACKAGE_VERSION=2.23.4-1
+ENV NV_LIBNCCL_VERSION=2.23.4
+ENV NCCL_VERSION=2.23.4
+ENV NV_LIBNCCL_PACKAGE=${NV_LIBNCCL_PACKAGE_NAME}-${NV_LIBNCCL_PACKAGE_VERSION}+cuda12.6

 RUN yum install -y \
-    cuda-libraries-12-4-${NV_CUDA_LIB_VERSION} \
-    cuda-nvtx-12-4-${NV_NVTX_VERSION} \
+    cuda-libraries-12-6-${NV_CUDA_LIB_VERSION} \
+    cuda-nvtx-12-6-${NV_NVTX_VERSION} \
     ${NV_LIBNPP_PACKAGE} \
-    libcublas-12-4-${NV_LIBCUBLAS_VERSION} \
+    libcublas-12-6-${NV_LIBCUBLAS_VERSION} \
     ${NV_LIBNCCL_PACKAGE} \
     && yum clean all \
     && rm -rf /var/cache/yum/*

-# Set this flag so that libraries can find the location of CUDA
-ENV XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda
-
 # Install CUDA devel from:
-# https://gitlab.com/nvidia/container-images/cuda/-/blob/master/dist/12.4.1/ubi9/devel/Dockerfile
-ENV NV_NVPROF_VERSION=12.4.127-1
-ENV NV_NVPROF_DEV_PACKAGE=cuda-nvprof-12-4-${NV_NVPROF_VERSION}
-ENV NV_CUDA_CUDART_DEV_VERSION=12.4.127-1
-ENV NV_NVML_DEV_VERSION=12.4.127-1
-ENV NV_LIBCUBLAS_DEV_VERSION=12.4.5.8-1
-ENV NV_LIBNPP_DEV_VERSION=12.2.5.30-1
-ENV NV_LIBNPP_DEV_PACKAGE=libnpp-devel-12-4-${NV_LIBNPP_DEV_VERSION}
+# https://gitlab.com/nvidia/container-images/cuda/-/blob/master/dist/12.6.3/ubi9/devel/Dockerfile
+ENV NV_NVPROF_VERSION=12.6.80-1
+ENV NV_NVPROF_DEV_PACKAGE=cuda-nvprof-12-6-${NV_NVPROF_VERSION}
+ENV NV_CUDA_CUDART_DEV_VERSION=12.6.77-1
+ENV NV_NVML_DEV_VERSION=12.6.77-1
+ENV NV_LIBCUBLAS_DEV_VERSION=12.6.4.1-1
+ENV NV_LIBNPP_DEV_VERSION=12.3.1.54-1
+ENV NV_LIBNPP_DEV_PACKAGE=libnpp-devel-12-6-${NV_LIBNPP_DEV_VERSION}
 ENV NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-devel
-ENV NV_LIBNCCL_DEV_PACKAGE_VERSION=2.21.5-1
-ENV NV_LIBNCCL_DEV_PACKAGE=${NV_LIBNCCL_DEV_PACKAGE_NAME}-${NV_LIBNCCL_DEV_PACKAGE_VERSION}+cuda12.4
-ENV NV_CUDA_NSIGHT_COMPUTE_VERSION=12.4.1-1
-ENV NV_CUDA_NSIGHT_COMPUTE_DEV_PACKAGE=cuda-nsight-compute-12-4-${NV_CUDA_NSIGHT_COMPUTE_VERSION}
+ENV NV_LIBNCCL_DEV_PACKAGE_VERSION=2.23.4-1
+ENV NCCL_VERSION=2.23.4
+ENV NV_LIBNCCL_DEV_PACKAGE=${NV_LIBNCCL_DEV_PACKAGE_NAME}-${NV_LIBNCCL_DEV_PACKAGE_VERSION}+cuda12.6
+ENV NV_CUDA_NSIGHT_COMPUTE_VERSION=12.6.3-1
+ENV NV_CUDA_NSIGHT_COMPUTE_DEV_PACKAGE=cuda-nsight-compute-12-6-${NV_CUDA_NSIGHT_COMPUTE_VERSION}

 RUN yum install -y \
     make \
     findutils \
-    cuda-command-line-tools-12-4-${NV_CUDA_LIB_VERSION} \
-    cuda-libraries-devel-12-4-${NV_CUDA_LIB_VERSION} \
-    cuda-minimal-build-12-4-${NV_CUDA_LIB_VERSION} \
-    cuda-cudart-devel-12-4-${NV_CUDA_CUDART_DEV_VERSION} \
+    cuda-command-line-tools-12-6-${NV_CUDA_LIB_VERSION} \
+    cuda-libraries-devel-12-6-${NV_CUDA_LIB_VERSION} \
+    cuda-minimal-build-12-6-${NV_CUDA_LIB_VERSION} \
+    cuda-cudart-devel-12-6-${NV_CUDA_CUDART_DEV_VERSION} \
     ${NV_NVPROF_DEV_PACKAGE} \
-    cuda-nvml-devel-12-4-${NV_NVML_DEV_VERSION} \
-    libcublas-devel-12-4-${NV_LIBCUBLAS_DEV_VERSION} \
+    cuda-nvml-devel-12-6-${NV_NVML_DEV_VERSION} \
+    libcublas-devel-12-6-${NV_LIBCUBLAS_DEV_VERSION} \
     ${NV_LIBNPP_DEV_PACKAGE} \
     ${NV_LIBNCCL_DEV_PACKAGE} \
     ${NV_CUDA_NSIGHT_COMPUTE_DEV_PACKAGE} \
@@ -124,9 +122,9 @@ RUN yum install -y \

 ENV LIBRARY_PATH=/usr/local/cuda/lib64/stubs

-# Install CUDA devel cudnn8 from:
-# hhttps://gitlab.com/nvidia/container-images/cuda/-/blob/master/dist/12.4.1/ubi9/devel/cudnn/Dockerfile
-ENV NV_CUDNN_VERSION=9.1.0.70-1
+# Install CUDA devel cudnn9 from:
+# https://gitlab.com/nvidia/container-images/cuda/-/blob/master/dist/12.6.3/ubi9/devel/cudnn/Dockerfile
+ENV NV_CUDNN_VERSION=9.5.1.17-1
 ENV NV_CUDNN_PACKAGE=libcudnn9-cuda-12-${NV_CUDNN_VERSION}
 ENV NV_CUDNN_PACKAGE_DEV=libcudnn9-devel-cuda-12-${NV_CUDNN_VERSION}

@@ -137,6 +135,9 @@ RUN yum install -y \
     ${NV_CUDNN_PACKAGE_DEV} \
     && yum clean all \
     && rm -rf /var/cache/yum/*
+
+# Set this flag so that libraries can find the location of CUDA
+ENV XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda

 # Restore notebook user workspace
 USER 1001
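
The Dockerfile hard-codes the `12-6` suffix in every package name while `CUDA_VERSION` carries the full `12.6.3` value, so both have to be bumped together on each upgrade. The sketch below illustrates how the two relate. The helper name `cuda_pkg_suffix` is hypothetical; the Dockerfile itself does not derive the suffix this way.

```shell
#!/bin/sh
# Hypothetical helper: turn a CUDA version such as 12.6.3 into the
# major-minor suffix used in package names such as cuda-cudart-12-6.
cuda_pkg_suffix() {
    # Drop the patch component (12.6.3 -> 12.6), then swap '.' for '-'.
    printf '%s\n' "${1%.*}" | tr '.' '-'
}

# Values copied from the updated Dockerfile.
CUDA_VERSION=12.6.3
NV_CUDA_CUDART_VERSION=12.6.77-1

echo "cuda-cudart-$(cuda_pkg_suffix "${CUDA_VERSION}")-${NV_CUDA_CUDART_VERSION}"   # prints: cuda-cudart-12-6-12.6.77-1
echo "cuda-compat-$(cuda_pkg_suffix "${CUDA_VERSION}")"                             # prints: cuda-compat-12-6
```

A check along these lines could flag a future bump that updates `CUDA_VERSION` but misses one of the hard-coded package suffixes.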

jupyter/minimal/ubi9-python-3.11/Pipfile.lock

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default.

jupyter/minimal/ubi9-python-3.11/requirements.txt

Lines changed: 3 additions & 3 deletions
@@ -595,9 +595,9 @@ markupsafe==3.0.2; python_version >= '3.9' \
 matplotlib-inline==0.1.7; python_version >= '3.8' \
     --hash=sha256:8423b23ec666be3d16e16b60bdd8ac4e86e840ebd1dd11a30b9f117f2fa0ab90 \
     --hash=sha256:df192d39a4ff8f21b1895d72e6a13f5fcc5099f00fa84384e0ea28c2cc0653ca
-mistune==3.1.2; python_version >= '3.8' \
-    --hash=sha256:4b47731332315cdca99e0ded46fc0004001c1299ff773dfb48fbe1fd226de319 \
-    --hash=sha256:733bf018ba007e8b5f2d3a9eb624034f6ee26c4ea769a98ec533ee111d504dff
+mistune==3.1.3; python_version >= '3.8' \
+    --hash=sha256:1a32314113cff28aa6432e99e522677c8587fd83e3d51c29b82a52409c842bd9 \
+    --hash=sha256:a7035c21782b2becb6be62f8f25d3df81ccb4d6fa477a6525b15af06539f02a0
 multidict==6.2.0; python_version >= '3.9' \
     --hash=sha256:0085b0afb2446e57050140240a8595846ed64d1cbd26cef936bfab3192c673b8 \
     --hash=sha256:042028348dc5a1f2be6c666437042a98a5d24cee50380f4c0902215e5ec41844 \
