
Commit e4ba569

CI: Switch from PyTorch to cuda-dl-base images for unification (#924)
* CI: Switch from PyTorch to cuda-dl-base for unification
  Signed-off-by: Alexey Rivkin <[email protected]>
* Handle Meson update in build.sh
  The Meson update requires Python, which is installed in build.sh. The previous base image had Python pre-installed, but cuda-dl-base does not.
  Signed-off-by: Alexey Rivkin <[email protected]>
* Limit ninja parallelism to fix OOM in the Ubuntu 22.04 build
  Added -j${NPROC} to ninja commands to prevent out-of-memory compiler kills.
  Signed-off-by: Alexey Rivkin <[email protected]>
* Align Python version with other install procedures
  Signed-off-by: Alexey Rivkin <[email protected]>
* Switch to cuda-dl-base images with pip upgrade for Ubuntu 22.04
  The cuda-dl-base Ubuntu 22.04 image ships pip 22.0.2 without --break-system-packages support. Upgrade pip to 24.x to match the PyTorch image behavior.
  Signed-off-by: Alexey Rivkin <[email protected]>
* Add ~/.local/bin to PATH for user pip installs
  Fixes "pytest: command not found" when pip defaults to a user installation.
  Signed-off-by: Alexey Rivkin <[email protected]>
* Update to CUDA 12.9
  Signed-off-by: Alexey Rivkin <[email protected]>
* Use the latest cuda-dl-base image for CUDA 12.8
  Signed-off-by: Alexey Rivkin <[email protected]>
* Set CUDA_HOME in the build script
  Signed-off-by: Alexey Rivkin <[email protected]>
* Fix the "Permission denied" error on DOCA download
  Use /tmp to avoid "Permission denied" in non-writable directories. Also add cleanup for the DOCA install package.
  Signed-off-by: Alexey Rivkin <[email protected]>
* Make /workspace writable to resolve filesystem access failures
  Signed-off-by: Alexey Rivkin <[email protected]>
* Use cuda-dl-base 25.06 to match the rock32 node driver version
  The image comes with CUDA 12.9; verified with Ovidiu that it is supported. Resolves error 803 (cudaErrorSystemDriverMismatch) by using cuda-dl-base:25.06, which includes compat driver 575.57.08, matching the H100 nodes' driver version. The previous 25.03 image had driver 570.124.06, causing a version mismatch.
  Signed-off-by: Alexey Rivkin <[email protected]>
* Control ninja parallelism in test_python and increase timeout
  cuda-dl-base is missing large Python packages that come pre-installed with the PyTorch images. The install caused frequent OOM and/or timeouts on Ubuntu 22.04.
  Signed-off-by: Alexey Rivkin <[email protected]>
* UCX/BACKEND: Add worker_id selection support (#938)
  Signed-off-by: Michal Shalev <[email protected]>
* libfabric: Use desc-specific target offset (#883)
  This fixes a bug in multi-descriptor transfers where descriptors point to different offsets within the same registered memory region. Without this fix, RDMA reads always target offset 0; each descriptor's specific target address should be extracted instead. Also impacted: block-based transfers (iteration N would read blocks from iteration 0, etc.), partial buffer updates, etc.
  Signed-off-by: Tushar Gohad <[email protected]>
* Parallelism control for pip install
  Signed-off-by: Alexey Rivkin <[email protected]>
* Reorder Python and CPP test stages
  The Python stage has a higher failure probability, so it is better to fail fast.
  Signed-off-by: Alexey Rivkin <[email protected]>
* Fix log message when env var not defined (#914)
  Signed-off-by: Ovidiu Mara <[email protected]>
  Co-authored-by: Mikhail Brinskiy <[email protected]>
* Minor cleanup
  Signed-off-by: Alexey Rivkin <[email protected]>
* Reorder Python and CPP test stages
  Signed-off-by: Alexey Rivkin <[email protected]>
* Unify to the latest Docker tag
  Signed-off-by: Alexey Rivkin <[email protected]>
* Revert the timeout extension
  The expectation was longer build times due to switching to a base image with no Python. In practice, no test runs for more than 10 minutes, so the old 30-minute timeout is still valid.
  Signed-off-by: Alexey Rivkin <[email protected]>
* Move the /workspace chmod to the Dockerfile
  That chmod is only needed for CI use cases. Moving it to the CI-specific Dockerfiles so it does not affect other cases.
  Signed-off-by: Alexey Rivkin <[email protected]>
* Set NPROC in common.sh and reuse
  Reduce the number of places NPROC is set by providing a default fallback.
  Signed-off-by: Alexey Rivkin <[email protected]>
* Improve NPROC and CUDA_HOME handling in common.sh
  - Move CUDA_HOME setup to common.sh before the UCX build check
  - Calculate NPROC based on container memory limits (1 proc/GB, max 16)
  - Detect containers via /.dockerenv, /run/.containerenv, or KUBERNETES_SERVICE_HOST
  Signed-off-by: Alexey Rivkin <[email protected]>
* Remove hardcoded NPROC from pipelines
  NPROC is now set dynamically by common.sh instead.
  Signed-off-by: Alexey Rivkin <[email protected]>
* Limit CPU parallelism on bare-metal nodes
  Docker containers see all host CPUs, so parallelism needs to be limited on bare metal.
  Signed-off-by: Alexey Rivkin <[email protected]>

---------

Signed-off-by: Alexey Rivkin <[email protected]>
Signed-off-by: Michal Shalev <[email protected]>
Signed-off-by: Tushar Gohad <[email protected]>
Signed-off-by: Ovidiu Mara <[email protected]>
Signed-off-by: ovidiusm <[email protected]>
Co-authored-by: Michal Shalev <[email protected]>
Co-authored-by: Tushar Gohad <[email protected]>
Co-authored-by: ovidiusm <[email protected]>
Co-authored-by: Mikhail Brinskiy <[email protected]>
1 parent b21955c commit e4ba569

File tree: 17 files changed (+74, -33 lines)


.ci/dockerfiles/Dockerfile.gpu_test

Lines changed: 7 additions & 4 deletions
@@ -3,7 +3,7 @@
 # This Dockerfile creates a GPU-enabled test environment for NIXL (NVIDIA I/O eXchange Layer)
 # development and testing. It provides a containerized environment with:
 #
-# - NVIDIA PyTorch base image with CUDA support
+# - NVIDIA cuda-dl-base image with CUDA support
 # - Non-root user setup for security
 # - Sudo access for package installation and system configuration
 # - Optimized for CI/CD pipeline testing
@@ -13,7 +13,7 @@
 #   docker run --gpus all --privileged -it nixl-gpu-test
 #
 # Build arguments:
-#   BASE_IMAGE: Base NVIDIA PyTorch image (default: nvcr.io/nvidia/pytorch:25.02-py3)
+#   BASE_IMAGE: Base NVIDIA cuda-dl-base image (default: nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04)
 #   _UID: User ID for the non-root user (default: 148069)
 #   _GID: Group ID for the user (default: 30)
 #   _LOGIN: Username (default: svc-nixl)
@@ -22,7 +22,7 @@
 #   WORKSPACE: Workspace directory path
 #

-ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:25.02-py3
+ARG BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04

 FROM ${BASE_IMAGE}

@@ -41,7 +41,7 @@ LABEL version="1.0"

 # Update package list and install required packages in one layer
 RUN apt-get update && \
-    apt-get install -y sudo \
+    apt-get install -y sudo python3 python3-pip \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/*

@@ -59,6 +59,9 @@ RUN mkdir -p /etc/sudoers.d && \
     chmod 440 /etc/sudoers.d/${_LOGIN} && \
     chown root:root /etc/sudoers.d/${_LOGIN}

+# Create and set permissions for workspace directory
+RUN mkdir -p ${WORKSPACE} && chmod 777 ${WORKSPACE}
+
 # Copy workspace into container (workaround for files disappearing from workspace)
 COPY --chown="${_UID}":"${_GID}" . ${WORKSPACE}
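For reference, a minimal sketch of how this image might be built and run with the new default base, using only the build arguments and `docker run` flags documented in the Dockerfile header above; the explicit --build-arg is optional here since the cuda-dl-base tag shown is already the default.

# Assumes the repo root as the build context; values mirror the Dockerfile defaults.
docker build \
    --build-arg BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04 \
    -f .ci/dockerfiles/Dockerfile.gpu_test \
    -t nixl-gpu-test .
docker run --gpus all --privileged -it nixl-gpu-test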

.ci/docs/setup_nvidia_gpu_with_rdma_support_on_ubuntu.md

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ sudo nvidia-ctk runtime configure --runtime=docker
 sudo systemctl restart docker
 ```

-Verify GPU access in containers using `docker run --gpus all nvcr.io/nvidia/pytorch:25.02-py3 nvidia-smi`[^1_3].
+Verify GPU access in containers using `docker run --gpus all nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04 nvidia-smi`[^1_3].

 ### 9. **Validation and Troubleshooting**

.ci/jenkins/lib/build-container-matrix.yaml

Lines changed: 0 additions & 1 deletion
@@ -31,7 +31,6 @@ env:
  REGISTRY_REPO: "sw-nbu-swx-nixl-docker-local/verification"
  LOCAL_TAG_BASE: "nixl-ci:build-"
  MAIL_FROM: "[email protected]"
- NPROC: "16"

 taskName: "${BUILD_TARGET}/${arch}/${axis_index}"

.ci/jenkins/lib/build-matrix.yaml

Lines changed: 3 additions & 8 deletions
@@ -6,7 +6,7 @@
 # Key Components:
 # - Job Configuration: Defines timeout, failure behavior, and Kubernetes resources
 # - Docker Images: Specifies the container images used for different build stages
-#   - PyTorch images (24.10 and 25.02) for building and testing
+#   - cuda-dl-base images (25.06 for Ubuntu 24.04, 24.10 for Ubuntu 22.04) for building and testing
 #   - Podman image for container builds
 # - Matrix Axes: Defines build variations (currently x86_64 architecture)
 # - Build Steps: Sequential steps for building, testing, and container creation
@@ -34,8 +34,8 @@ kubernetes:
    requests: "{memory: 8Gi, cpu: 8000m}"

 runs_on_dockers:
-  - { name: "ubuntu24.04-pytorch", url: "nvcr.io/nvidia/pytorch:25.02-py3" }
-  - { name: "ubuntu22.04-pytorch", url: "nvcr.io/nvidia/pytorch:24.10-py3" }
+  - { name: "ubuntu24.04-cuda-dl-base", url: "nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04" }
+  - { name: "ubuntu22.04-cuda-dl-base", url: "nvcr.io/nvidia/cuda-dl-base:24.10-cuda12.6-devel-ubuntu22.04" }
   - { name: "podman-v5.0.2", url: "quay.io/podman/stable:v5.0.2", category: 'tool', privileged: true }

 matrix:
@@ -47,17 +47,12 @@ matrix:
 env:
  NIXL_INSTALL_DIR: /opt/nixl
  TEST_TIMEOUT: 30
- NPROC: "16"
  UCX_TLS: "^shm"

 steps:
  - name: Build
    parallel: false
    run: |
-      if [[ "${name}" == *"ubuntu22.04"* ]]; then
-        # distro's meson version is too old project requires >= 0.64.0
-        pip3 install meson
-      fi
      .gitlab/build.sh ${NIXL_INSTALL_DIR}

  - name: Test CPP
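As a usage note, a hedged sketch of reproducing the Ubuntu 24.04 build stage locally with the new base image; the image URL, the NIXL_INSTALL_DIR and UCX_TLS values, and the `.gitlab/build.sh ${NIXL_INSTALL_DIR}` invocation come from the matrix above, while the bind mount and working directory are assumptions about a local checkout.

# Sketch only: run the CI build step inside the new base image.
docker run --rm -it \
    -v "$(pwd)":/workspace -w /workspace \
    -e NIXL_INSTALL_DIR=/opt/nixl -e UCX_TLS="^shm" \
    nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04 \
    bash -c '.gitlab/build.sh "${NIXL_INSTALL_DIR}"'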

.ci/jenkins/lib/test-matrix.yaml

Lines changed: 3 additions & 2 deletions
@@ -30,7 +30,7 @@ runs_on_agents:
 matrix:
   axes:
     image:
-      - nvcr.io/nvidia/pytorch:25.02-py3
+      - nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04
     arch:
       - x86_64
     ucx_version:
@@ -42,9 +42,10 @@ taskName: "${name}/${arch}/ucx-${ucx_version}/${axis_index}"
 env:
  CONTAINER_WORKSPACE: /workspace
  INSTALL_DIR: ${CONTAINER_WORKSPACE}/nixl_install
- NPROC: "16"
  # Manual timeout - ci-demo doesn't handle docker exec
  TEST_TIMEOUT: 30
+ # NPROC for bare-metal: containers see all host CPUs, need to limit parallelism
+ NPROC: 16

 steps:
  - name: Get Environment Info

.ci/jenkins/pipeline/proj-jjb.yaml

Lines changed: 2 additions & 2 deletions
@@ -280,7 +280,7 @@
        description: "Base Docker image for the container build"
    - string:
        name: "BASE_IMAGE_TAG"
-       default: "25.03-cuda12.8-devel-ubuntu24.04"
+       default: "25.06-cuda12.9-devel-ubuntu24.04"
        description: "Tag for the base Docker image"
    - string:
        name: "TAG_SUFFIX"
@@ -294,7 +294,7 @@
        description: >
          Update the latest tag for this architecture.<br/>
          When enabled, also creates: <code>&lt;base-image-tag&gt;-&lt;arch&gt;-latest</code><br/>
-         Example: <code>25.03-cuda12.8-devel-ubuntu24.04-aarch64-latest</code><br/>
+         Example: <code>25.06-cuda12.9-devel-ubuntu24.04-aarch64-latest</code><br/>
    - string:
        name: "MAIL_TO"
        default: "[email protected]"

.ci/scripts/common.sh

Lines changed: 26 additions & 0 deletions
@@ -78,6 +78,11 @@ max_gtest_port=$((tcp_port_max + gtest_offset))
 # Check if a GPU is present
 nvidia-smi -L | grep -q '^GPU' && HAS_GPU=true || HAS_GPU=false

+# Ensure CUDA_HOME is set if CUDA is installed (cuda-dl-base images don't set it by default)
+if [ -d "/usr/local/cuda" ] && [ -z "$CUDA_HOME" ]; then
+    export CUDA_HOME=/usr/local/cuda
+fi
+
 if $HAS_GPU && test -d "$CUDA_HOME"
 then
     UCX_CUDA_BUILD_ARGS="--with-cuda=${CUDA_HOME}"
@@ -89,3 +94,24 @@ fi

 # Default to false, unless TEST_LIBFABRIC is set. AWS EFA tests must set it to true.
 export TEST_LIBFABRIC=${TEST_LIBFABRIC:-false}
+
+# Set default parallelism for make/ninja (can be overridden by NPROC env var)
+if [ -z "$NPROC" ]; then
+    # In containers, calculate based on memory limits to avoid OOM
+    if [[ -f /.dockerenv || -f /run/.containerenv || -n "${KUBERNETES_SERVICE_HOST}" ]]; then
+        if [ -f /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
+            limit=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
+        elif [ -f /sys/fs/cgroup/memory.max ]; then
+            limit=$(cat /sys/fs/cgroup/memory.max)
+        else
+            limit=$((4 * 1024 * 1024 * 1024))
+        fi
+        # Use 1 process per GB of memory, max 16
+        nproc=$((limit / (1024 * 1024 * 1024)))
+        nproc=$((nproc > 16 ? 16 : nproc))
+        nproc=$((nproc < 1 ? 1 : nproc))
+    else
+        nproc=$(nproc --all)
+    fi
+    export NPROC=$nproc
+fi
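A short usage sketch of how downstream scripts consume this: anything that sources common.sh can rely on NPROC being exported, and an explicit NPROC from the pipeline still takes precedence over the memory-based default. The echo line is illustrative only.

# Sketch: NPROC falls back to the memory-based value only when unset.
source .ci/scripts/common.sh
echo "Building with ${NPROC} jobs (CUDA_HOME=${CUDA_HOME:-unset})"
make -j"$NPROC"                       # same pattern build.sh now uses
NPROC=4 .gitlab/build.sh /opt/nixl    # explicit override still wins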

.gitlab/build.sh

Lines changed: 18 additions & 5 deletions
@@ -57,7 +57,9 @@ ARCH=$(uname -m)
 $SUDO rm -rf /usr/lib/cmake/grpc /usr/lib/cmake/protobuf

 $SUDO apt-get -qq update
-$SUDO apt-get -qq install -y curl \
+$SUDO apt-get -qq install -y python3-dev \
+    python3-pip \
+    curl \
     wget \
     libnuma-dev \
     numactl \
@@ -101,6 +103,17 @@ $SUDO apt-get -qq install -y curl \
     libhwloc-dev \
     libcurl4-openssl-dev zlib1g-dev # aws-sdk-cpp dependencies

+# Ubuntu 22.04 specific setup
+if grep -q "Ubuntu 22.04" /etc/os-release 2>/dev/null; then
+    # Upgrade pip for '--break-system-packages' support
+    $SUDO pip3 install --upgrade pip
+
+    # Upgrade meson (distro version 0.61.2 is too old, project requires >= 0.64.0)
+    $SUDO pip3 install --upgrade meson
+    # Ensure pip3's meson takes precedence over apt's version
+    export PATH="$HOME/.local/bin:/usr/local/bin:$PATH"
+fi
+
 # Add DOCA repository and install packages
 ARCH_SUFFIX=$(if [ "${ARCH}" = "aarch64" ]; then echo "arm64"; else echo "amd64"; fi)
 MELLANOX_OS="$(. /etc/lsb-release; echo ${DISTRIB_ID}${DISTRIB_RELEASE} | tr A-Z a-z | tr -d .)"
@@ -172,7 +185,7 @@ rm "libfabric-${LIBFABRIC_VERSION#v}.tar.bz2"
     cd etcd-cpp-apiv3 && \
     mkdir build && cd build && \
     cmake .. && \
-    make -j"${NPROC:-$(nproc)}" && \
+    make -j"$NPROC" && \
     $SUDO make install && \
     $SUDO ldconfig \
 )
@@ -183,7 +196,7 @@
     mkdir aws_sdk_build && \
     cd aws_sdk_build && \
     cmake ../aws-sdk-cpp/ -DCMAKE_BUILD_TYPE=Release -DBUILD_ONLY="s3" -DENABLE_TESTING=OFF -DCMAKE_INSTALL_PREFIX=/usr/local && \
-    make -j"${NPROC:-$(nproc)}" && \
+    make -j"$NPROC" && \
     $SUDO make install
 )

@@ -215,12 +228,12 @@ export UCX_TLS=^cuda_ipc

 # shellcheck disable=SC2086
 meson setup nixl_build --prefix=${INSTALL_DIR} -Ducx_path=${UCX_INSTALL_DIR} -Dbuild_docs=true -Drust=false ${EXTRA_BUILD_ARGS} -Dlibfabric_path="${LIBFABRIC_INSTALL_DIR}"
-ninja -C nixl_build && ninja -C nixl_build install
+ninja -j"$NPROC" -C nixl_build && ninja -j"$NPROC" -C nixl_build install
 mkdir -p dist && cp nixl_build/src/bindings/python/nixl-meta/nixl-*.whl dist/

 # TODO(kapila): Copy the nixl.pc file to the install directory if needed.
 # cp ${BUILD_DIR}/nixl.pc ${INSTALL_DIR}/lib/pkgconfig/nixl.pc

 cd benchmark/nixlbench
 meson setup nixlbench_build -Dnixl_path=${INSTALL_DIR} -Dprefix=${INSTALL_DIR}
-ninja -C nixlbench_build && ninja -C nixlbench_build install
+ninja -j"$NPROC" -C nixlbench_build && ninja -j"$NPROC" -C nixlbench_build install
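A quick, hedged way to check that the Ubuntu 22.04 branch above did what it intends (a pip new enough for --break-system-packages, meson >= 0.64.0, and the pip-installed meson ahead of apt's copy on PATH); the exact version strings will vary.

# Sanity-check sketch after running build.sh on Ubuntu 22.04:
pip3 --version      # expected: an upgraded pip (24.x), not the stock 22.0.2
meson --version     # expected: >= 0.64.0, meeting the project requirement
command -v meson    # expected: ~/.local/bin or /usr/local/bin rather than /usr/bin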

.gitlab/test_python.sh

Lines changed: 5 additions & 1 deletion
@@ -40,12 +40,16 @@ export NIXL_PREFIX=${INSTALL_DIR}
 # Raise exceptions for logging errors
 export NIXL_DEBUG_LOGGING=yes

-pip3 install --break-system-packages .
+# Control ninja parallelism during pip build to prevent OOM (NPROC from common.sh)
+pip3 install --break-system-packages --config-settings=compile-args="-j${NPROC}" .
 pip3 install --break-system-packages dist/nixl-*none-any.whl
 pip3 install --break-system-packages pytest
 pip3 install --break-system-packages pytest-timeout
 pip3 install --break-system-packages zmq

+# Add user pip packages to PATH
+export PATH="$HOME/.local/bin:$PATH"
+
 echo "==== Running ETCD server ===="
 etcd_port=$(get_next_tcp_port)
 etcd_peer_port=$(get_next_tcp_port)
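A minimal sketch of why the PATH export matters, assuming pip fell back to a user-level install under ~/.local (the scenario behind the earlier "pytest: command not found" failure); the resolved path in the last comment is illustrative.

# Without ~/.local/bin on PATH, a user-level pytest install is not found:
command -v pytest || echo "pytest: command not found"
# After the export added above, it resolves from the user install:
export PATH="$HOME/.local/bin:$PATH"
command -v pytest   # e.g. ~/.local/bin/pytest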

benchmark/nixlbench/README.md

Lines changed: 1 addition & 1 deletion
@@ -172,7 +172,7 @@ cd nixl/benchmark/nixlbench/contrib
 | `--ucx <path>` | Path to custom UCX source (optional) | Uses base image UCX |
 | `--build-type <type>` | Build type: `debug` or `release` | `release` |
 | `--base-image <image>` | Base Docker image | `nvcr.io/nvidia/cuda-dl-base` |
-| `--base-image-tag <tag>` | Base image tag | `25.03-cuda12.8-devel-ubuntu24.04` |
+| `--base-image-tag <tag>` | Base image tag | `25.06-cuda12.9-devel-ubuntu24.04` |
 | `--arch <arch>` | Target architecture: `x86_64` or `aarch64` | Auto-detected |
 | `--python-versions <versions>` | Python versions (comma-separated) | `3.12` |
 | `--tag <tag>` | Custom Docker image tag | Auto-generated |
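For context, a hedged example of passing the new default tag explicitly when building the nixlbench container; the script name `./build.sh` under the contrib directory is an assumption for illustration, while the option names and values are taken from the table above.

cd nixl/benchmark/nixlbench/contrib
# Hypothetical invocation; script name assumed, options from the table above.
./build.sh --base-image nvcr.io/nvidia/cuda-dl-base \
           --base-image-tag 25.06-cuda12.9-devel-ubuntu24.04 \
           --arch x86_64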
