fix(templates): dynamic CUDA repo arch for arm64 support #662
Pull request overview
This PR updates the NVIDIA driver provisioning template to support ARM64 by making the CUDA APT repository URL architecture dynamic at runtime, and adds unit coverage to prevent regressions in that area.
Changes:
- Replace hardcoded x86_64 in the CUDA repo URL with a runtime uname -m-based architecture variable.
- Map aarch64 to sbsa to match NVIDIA’s ARM server CUDA repo naming.
- Add/extend unit tests to assert the template no longer hardcodes x86_64 and includes the new arch handling logic.
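
The first two changes can be sketched roughly as below. This is an illustration rather than the template's actual code: the helper names `cuda_repo_arch` and `cuda_keyring_url` are hypothetical, and the URL layout, the `ubuntu2204` distro segment in the usage line, and the keyring filename are assumptions, since the PR text does not show them.

```shell
# Hypothetical helper: translate a `uname -m` value into the directory
# name used under the CUDA repo. With no argument it inspects the host.
cuda_repo_arch() {
  set -- "${1:-$(uname -m)}"
  case "$1" in
    aarch64) echo "sbsa" ;;   # NVIDIA's name for arm64 server repos
    *)       echo "$1" ;;     # e.g. x86_64 passes through unchanged
  esac
}

# Hypothetical helper: build a keyring URL for a distro ($1) and an
# optional machine name ($2). URL layout and filename are assumed.
cuda_keyring_url() {
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/%s/cuda-keyring_1.1-1_all.deb\n' \
    "$1" "$(cuda_repo_arch "${2:-}")"
}
```

Calling `cuda_keyring_url ubuntu2204` on the target node would then pick the architecture segment at runtime instead of baking x86_64 into the script.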
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/provisioner/templates/nv-driver.go | Adds runtime CUDA repo arch detection and uses it in the NVIDIA CUDA keyring download URL. |
| pkg/provisioner/templates/nv-driver_test.go | Adds assertions to ensure the template doesn’t hardcode x86_64 and validates the new arch logic. |
    if [[ "$CUDA_ARCH" == "aarch64" ]]; then
        CUDA_ARCH="sbsa"
    fi
The CUDA repo arch mapping only handles aarch64 -> sbsa. Other templates in this codebase treat uname -m as potentially returning arm64 as well (e.g., the container-toolkit template maps aarch64|arm64). If uname -m returns arm64 here, the URL becomes .../arm64/..., which NVIDIA's CUDA repos typically don't provide for server ARM (they use sbsa). The result is a 404 that breaks driver install on some arm64 environments. Consider normalizing with a case statement that maps both aarch64 and arm64 to sbsa, and update the new unit test accordingly.
Suggested change:

    - if [[ "$CUDA_ARCH" == "aarch64" ]]; then
    -     CUDA_ARCH="sbsa"
    - fi
    + case "$CUDA_ARCH" in
    +     aarch64|arm64)
    +         CUDA_ARCH="sbsa"
    +         ;;
    + esac

/retest
1c536d8 to 1a4c795
Replace the hardcoded x86_64 in the CUDA repository URL with runtime architecture detection via uname -m. Map aarch64 to "sbsa", which is the NVIDIA convention for arm64 server CUDA repositories.

Previously, arm64 nodes would fail to download the CUDA keyring package because the x86_64 repo URL does not exist for arm64.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
1a4c795 to 13159cd
Pull Request Test Coverage Report for Build 21986731653
💛 - Coveralls
Summary
- Replace hardcoded x86_64 in CUDA repository URL with runtime uname -m detection
- Map aarch64 to sbsa (NVIDIA's arm64 server CUDA repo convention)

Test plan
- Unit test (TestNVDriverTemplate_CUDARepoArch)
- go test ./pkg/provisioner/templates/... passes
- golangci-lint run ./... passes (0 issues)
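
The regression the unit test guards against can also be spot-checked from a shell. The sketch below is not the PR's Go test, whose body is not shown here; `template_arch_is_dynamic` is a hypothetical helper that reads rendered template text on stdin and assumes a hardcoded architecture would appear as an `/x86_64/` path segment.

```shell
# Hypothetical check: succeed only if the text on stdin has no literal
# /x86_64/ path segment and does call `uname -m` somewhere.
template_arch_is_dynamic() (
  text="$(cat)"   # buffer stdin so it can be scanned twice
  if printf '%s\n' "$text" | grep -q '/x86_64/'; then
    exit 1        # architecture is still hardcoded
  fi
  printf '%s\n' "$text" | grep -q 'uname -m'
)
```

A rendered script could be piped in, for example `template_arch_is_dynamic < rendered.sh && echo ok`, where `rendered.sh` is a placeholder file name.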