fix(templates): dynamic CUDA repo arch for arm64 support #667

Closed
ArangoGutierrez wants to merge 1 commit into NVIDIA:main from ArangoGutierrez:fix/arm64-nvidia-driver-cuda-repo

@ArangoGutierrez
Collaborator

Summary

  • Replace the hardcoded x86_64 in the CUDA repository URL with runtime detection via uname -m
  • Map aarch64 → sbsa (NVIDIA's naming convention for arm64 server CUDA repositories)
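As a rough illustration of the two summary points, the shell function below builds a CUDA keyring URL from a runtime-detected architecture. It is a minimal sketch, not the template code from this PR: the function name cuda_keyring_url, the ubuntu2204 distro segment used in the example, and the cuda-keyring_1.1-1_all.deb filename are assumptions; only the compute/cuda/repos/<distro>/<arch>/ layout follows NVIDIA's public download server.

```bash
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): build the CUDA keyring URL for a distro,
# using the runtime architecture unless one is passed explicitly (handy for tests).
cuda_keyring_url() {
  local distro="$1"
  local arch="${2:-$(uname -m)}"
  # NVIDIA publishes arm64 server packages under "sbsa", not "aarch64".
  if [ "$arch" = "aarch64" ]; then
    arch="sbsa"
  fi
  # Keyring filename is an assumption; other architectures pass through unchanged.
  echo "https://developer.download.nvidia.com/compute/cuda/repos/${distro}/${arch}/cuda-keyring_1.1-1_all.deb"
}
```

For example, cuda_keyring_url ubuntu2204 aarch64 prints the sbsa variant of the URL, while calling it with only a distro falls back to uname -m. Like the if statement in this PR, any architecture other than aarch64 is passed through unchanged.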

Test plan

  • Unit test verifies template output contains arch detection, not hardcoded x86_64
  • go test ./pkg/provisioner/templates/... -v passes
  • CI validation pending
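The first test-plan item can be approximated with a plain text check over a rendered script. This is a hypothetical sketch: check_rendered_template and the sample inputs used with it are illustrative, and the real assertions live in nv-driver_test.go, which this page does not show.

```bash
#!/usr/bin/env bash
# Hypothetical check mirroring the unit-test intent: the rendered script must
# detect the architecture at runtime and must not hardcode an x86_64 repo path.
check_rendered_template() {
  local rendered="$1"
  # Runtime detection must be present.
  if ! printf '%s\n' "$rendered" | grep -q 'uname -m'; then
    return 1
  fi
  # The CUDA repo URL must not contain a literal x86_64 path segment.
  if printf '%s\n' "$rendered" | grep -q 'compute/cuda/repos/[^/]*/x86_64/'; then
    return 1
  fi
  return 0
}
```

A script that interpolates the detected architecture into the URL passes; one that still spells out /x86_64/ in the repo path fails, even if it also calls uname -m elsewhere.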

Replace hardcoded x86_64 in the CUDA repository URL with runtime
architecture detection via uname -m. Maps aarch64 to "sbsa", which
is the NVIDIA convention for arm64 server CUDA repositories.

Previously, arm64 nodes would fail to download the CUDA keyring
package because the x86_64 repo URL does not exist for arm64.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Copilot AI review requested due to automatic review settings February 14, 2026 17:57

Copilot AI left a comment

Pull request overview

This PR adds dynamic architecture detection for NVIDIA CUDA repository URLs to enable arm64 (aarch64) support. Previously, the CUDA repository URL was hardcoded to use x86_64, preventing the NVIDIA driver template from working on arm64 systems. The change implements runtime architecture detection using uname -m and maps aarch64 to sbsa (NVIDIA's naming convention for arm64 server CUDA repositories).

Changes:

  • Replace the hardcoded x86_64 in the CUDA repository URL with the runtime-detected architecture
  • Add an aarch64 → sbsa mapping to match NVIDIA's arm64 server repository naming
  • Add a unit test verifying the architecture detection logic

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files:
  • pkg/provisioner/templates/nv-driver.go: Implements dynamic CUDA repo architecture detection with aarch64→sbsa mapping
  • pkg/provisioner/templates/nv-driver_test.go: Adds test verifying runtime arch detection and absence of hardcoded x86_64

Comment on lines +118 to +120
if [[ "$CUDA_ARCH" == "aarch64" ]]; then
  CUDA_ARCH="sbsa"
fi

Copilot AI Feb 14, 2026

The architecture detection logic only handles x86_64 (implicitly) and aarch64→sbsa mapping, but doesn't provide a fallback or error handling for unsupported architectures. Other templates in the codebase (e.g., container-toolkit.go:243-253) use case statements with explicit error handling for unsupported architectures. Consider adding an else clause to handle unexpected architecture values or at least add a comment explaining that x86_64 is used as-is.

Suggested change
-if [[ "$CUDA_ARCH" == "aarch64" ]]; then
-  CUDA_ARCH="sbsa"
-fi
+case "$CUDA_ARCH" in
+  aarch64)
+    CUDA_ARCH="sbsa"
+    ;;
+  x86_64)
+    # use x86_64 as-is
+    ;;
+  *)
+    holodeck_log "ERROR" "$COMPONENT" "unsupported architecture for CUDA repository: $CUDA_ARCH"
+    exit 1
+    ;;
+esac
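The suggested case block calls exit 1, which makes it awkward to exercise in isolation. One way to test the same mapping is to wrap it in a function that returns non-zero instead. In this sketch holodeck_log is a stub that writes to stderr, because its real definition is not part of this PR; resolve_cuda_arch and the COMPONENT value are hypothetical names.

```bash
#!/usr/bin/env bash
# Stub standing in for the templates' logging helper (assumed argument order:
# level, component, message). Writes to stderr so stdout carries only the result.
holodeck_log() {
  printf '[%s] [%s] %s\n' "$1" "$2" "$3" >&2
}

COMPONENT="nvidia-driver"   # hypothetical component label

# Same mapping as the suggested change, but returning instead of exiting.
resolve_cuda_arch() {
  local arch="$1"
  case "$arch" in
    aarch64)
      arch="sbsa"
      ;;
    x86_64)
      # use x86_64 as-is
      ;;
    *)
      holodeck_log "ERROR" "$COMPONENT" "unsupported architecture for CUDA repository: $arch"
      return 1
      ;;
  esac
  echo "$arch"
}
```

Inside a template the caller can keep the fail-fast behaviour, for example CUDA_ARCH="$(resolve_cuda_arch "$(uname -m)")" || exit 1, since the assignment takes the exit status of the command substitution.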

@ArangoGutierrez
Collaborator Author

Closing as superseded. The equivalent fixes were already merged into main via PRs #661-664:

Additionally, these fixes address downstream provisioning issues but do not resolve the actual EC2 RunInstances failure (Unsupported: The requested configuration is currently not supported) reported in https://github.com/NVIDIA/gpu-driver-container/actions/runs/22012665274/job/63611032634. The root cause is missing architecture inference from the instance type: when image.architecture is unset, holodeck defaults to x86_64 regardless of the instance type. A new PR will address this.
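A stopgap for the missing inference could key off the EC2 instance family name, since AWS Graviton families carry a g in the suffix after the generation number (for example t4g, g5g, c7gn, m6gd). The function below is a hypothetical heuristic, not the fix from the follow-up PR. It returns EC2's AMI architecture names (x86_64, arm64) and will misclassify some families (Apple silicon mac instances, for instance), so a robust implementation would more likely query the EC2 DescribeInstanceTypes API, whose ProcessorInfo.SupportedArchitectures field lists the architectures each instance type supports.

```bash
#!/usr/bin/env bash
# Hypothetical heuristic: infer the AMI architecture from an EC2 instance type
# when image.architecture is unset. Not authoritative; see DescribeInstanceTypes.
infer_arch_from_instance_type() {
  local family="${1%%.*}"   # "g5g.xlarge" -> "g5g"
  local suffix
  # Drop the leading class letters and generation digits: "c7gn" -> "gn".
  suffix="$(printf '%s' "$family" | sed -E 's/^[a-z]+[0-9]+//')"
  case "$suffix" in
    *g*) echo "arm64" ;;    # "g" in the suffix marks AWS Graviton (arm64)
    *)   echo "x86_64" ;;   # everything else treated as x86_64
  esac
}
```

With this, g5g.xlarge maps to arm64 while g4dn.xlarge and p4d.24xlarge stay x86_64; the family prefix "g" (GPU class) is stripped before the check, so it does not trigger a false arm64 result.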
