Description
The driver being used here is pretty old by now
| DEBIAN_FRONTEND=noninteractive sudo apt install nvidia-driver-525 -y |
and despite the expanded CUDA minor-version compatibility, some issues arise: e.g. CUDA 12.8 and 12.9 produce PTX that is incompatible with the current driver. Concretely, I'm getting
RuntimeError: module load failed with status code 222: CUDA_ERROR_UNSUPPORTED_PTX_VERSION
in conda-forge/tinygrad-feedstock#12.
Furthermore, AFAICT, the build here is actually installing the Debian-based CUDA drivers (named nvidia-driver-XXX), even though we're in an Ubuntu image (which has different naming for its native CUDA packaging, e.g. nvidia-graphics-driver-XXX[-server]):
Line 7 in b83117b
| IMAGE_NAME := ubuntu-2404-$(IMAGE_TYPE)-$(TIMESTAMP) |
open-gpu-server/vm-images/build-image.sh
Line 11 in b83117b
| export DIB_CLOUD_IMAGES=https://cloud-images.ubuntu.com/noble/20251026/ |
Ideally, could we update the driver and switch to the Ubuntu-native packaging for the CUDA drivers? Ubuntu already has new enough drivers, whereas Debian is currently stuck on 550, which is apparently not yet compatible with the PTX produced by CUDA 12.9.
AFAICT, this would work as follows:
sudo apt update
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install nvidia:580
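
To sanity-check the result after the install (and a reboot or module reload), something like the following might help. This is only a rough sketch, not tested on the actual image: `driver_major` and `MIN_MAJOR` are hypothetical names I made up, and the 580 threshold is simply the branch proposed above, not a verified minimum for CUDA 12.9.

```shell
#!/bin/sh
# Hypothetical helper: print the major part of a driver version
# string such as "580.95.05" (everything before the first dot).
driver_major() {
  printf '%s\n' "${1%%.*}"
}

# Assumption: the branch proposed above (nvidia:580); adjust as needed.
MIN_MAJOR=580

if command -v nvidia-smi >/dev/null 2>&1; then
  # Ask the driver for its version; take the first GPU only.
  version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
  if [ -z "$version" ]; then
    echo "nvidia-smi is present but returned no driver version"
  elif [ "$(driver_major "$version")" -ge "$MIN_MAJOR" ]; then
    echo "driver $version is at least $MIN_MAJOR"
  else
    echo "driver $version is older than $MIN_MAJOR"
  fi
else
  echo "nvidia-smi not found; driver not installed or not on PATH"
fi
```

Running `ubuntu-drivers list` beforehand should also show which driver packages Ubuntu offers for the detected hardware.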