Commit 12cac78

Fix Fabric Manager in AWS/GCP/Azure/OCI OS images (#2355)

The `cuda-drivers-535` and `nvidia-fabricmanager-535` packages used to be pinned to a specific version. However, this did not actually pin the driver version, because `cuda-drivers-535` depends on several NVIDIA packages with `>=` constraints, so newer versions of them can be installed.

```
Package: cuda-drivers-535
Version: 535.183.01-1
...
Depends: libnvidia-common-535 (>= 535.183.01), libnvidia-compute-535 (>= 535.183.01), ..., nvidia-driver-535 (>= 535.183.01), ...
```

This could lead to a version mismatch: a newer driver version gets installed while Fabric Manager stays pinned to an older version.

This commit solves the issue by removing the pin so that both the driver and Fabric Manager are installed at the same latest `535.*` version.

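One way to catch the mismatch described above after an image build is to compare the installed versions of the two packages. The sketch below is not part of this commit. The helper names (`upstream_version`, `installed_version`, `versions_match`, `check_fabricmanager_matches_driver`) are hypothetical, and it assumes a Debian-based image with `dpkg-query` available. It also assumes that only the upstream part of the version needs to agree, since the Debian revision suffix may differ between packages.

```shell
# Hypothetical post-install check; not part of this commit.

# Strip an optional epoch ("1:") and the Debian revision ("-1") from a
# package version, leaving only the upstream version.
upstream_version() {
    set -- "${1#*:}"
    printf '%s\n' "${1%-*}"
}

# Print the installed version of a package (empty if it is not installed).
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null || true
}

# Succeed only if the first version is non-empty and both upstream
# versions are identical.
versions_match() {
    [ -n "$1" ] && [ "$(upstream_version "$1")" = "$(upstream_version "$2")" ]
}

# Compare the driver and Fabric Manager packages of one branch, e.g. 535.
check_fabricmanager_matches_driver() {
    driver=$(installed_version "nvidia-driver-$1")
    fabricmanager=$(installed_version "nvidia-fabricmanager-$1")
    if versions_match "$driver" "$fabricmanager"; then
        echo "OK: driver and Fabric Manager are both at $(upstream_version "$driver")"
    else
        echo "MISMATCH: driver='$driver' fabricmanager='$fabricmanager'" >&2
        return 1
    fi
}
```

Calling `check_fabricmanager_matches_driver 535` at the end of provisioning would make the build fail early if the two packages ever drift apart again.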
1 parent e05f407 commit 12cac78

2 files changed: 3 additions, 4 deletions

scripts/packer/provisioners/cuda.sh

Lines changed: 2 additions & 3 deletions
@@ -14,9 +14,8 @@ wget https://developer.download.nvidia.com/compute/cuda/repos/$CUDA_DISTRO/$ARCH
 sudo dpkg -i cuda-keyring_1.0-1_all.deb
 rm cuda-keyring_1.0-1_all.deb
 
-CUDA_BRANCH=$(cut -d '.' -f 1 <<< "$CUDA_DRIVERS_VERSION")
 sudo apt-get update
 sudo DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
-cuda-drivers-$CUDA_BRANCH=$CUDA_DRIVERS_VERSION \
-nvidia-fabricmanager-$CUDA_BRANCH=$CUDA_DRIVERS_VERSION
+cuda-drivers-$CUDA_DRIVERS_VERSION \
+nvidia-fabricmanager-$CUDA_DRIVERS_VERSION
 sudo systemctl enable nvidia-fabricmanager
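
Because `CUDA_DRIVERS_VERSION` is now interpolated directly into the package names, a full version string such as the old `535.183.01-1` would no longer name the intended packages. A minimal sketch of a guard follows. It is not part of this commit, and `driver_packages` is a hypothetical helper. It assumes the value should always be a bare branch number such as `535`.

```shell
# Hypothetical guard; not part of this commit.
# Print the two package names for a driver branch, one per line, or fail
# if the value is empty or contains anything other than digits.
driver_packages() {
    case "$1" in
        ''|*[!0-9]*)
            echo "expected a driver branch like 535, got '$1'" >&2
            return 1
            ;;
    esac
    printf '%s\n' "cuda-drivers-$1" "nvidia-fabricmanager-$1"
}
```

With this helper, `driver_packages 535` prints `cuda-drivers-535` and `nvidia-fabricmanager-535`, while `driver_packages 535.183.01-1` returns a non-zero status before `apt-get` is ever invoked.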

scripts/packer/versions.json

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 {
 "docker_version": "27.1.1",
-"cuda_drivers_version": "535.183.01-1"
+"cuda_drivers_version": "535"
 }
