I took the example Rocky Linux 9 NVIDIA Containerfile, converted it to an Apptainer definition (.def) file, and tried to build it.
Example file: https://github.com/warewulf/warewulf-node-images/blob/main/examples/rockylinux-9-nvidia/Containerfile
Def:
Bootstrap: docker
From: ghcr.io/warewulf/warewulf-rockylinux:9

%post
    dnf -y install dnf-plugins-core epel-release kernel-headers \
    && dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(arch)/cuda-rhel9.repo \
    && dnf -y module install nvidia-driver:latest-dkms \
    && dnf -y install datacenter-gpu-manager \
    && dnf clean all \
    && for dir in /usr/src/kernels/*; do dkms autoinstall --kernelver $(basename $dir); done \
    && dkms status
Build command:
apptainer build test.sif test.def
Error while building:
+ dkms autoinstall --kernelver 5.14.0-503.22.1.el9_5.x86_64
Autoinstall of module nvidia/570.86.15 for kernel 5.14.0-503.22.1.el9_5.x86_64 (x86_64)
Sign command: /lib/modules/5.14.0-503.22.1.el9_5.x86_64/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub
Cleaning build area...(bad exit status: 2)
Failed command:
'make' clean
Building module(s)...(bad exit status: 2)
Failed command:
'make' -j2 modules
Error! Bad return status for module build on kernel: 5.14.0-503.22.1.el9_5.x86_64 (x86_64)
Consult /var/lib/dkms/nvidia/570.86.15/build/make.log for more information.
Autoinstall on 5.14.0-503.22.1.el9_5.x86_64 failed for module(s) nvidia(10).
Error! One or more modules failed to install during autoinstall.
Refer to previous errors for more information.
FATAL: While performing build: while running engine: while running %post section: exit status 11
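The error output above points to make.log for the actual compiler failure, but that file lives inside the build's temporary root filesystem and may be hard to get at once the build has aborted. A minimal sketch of how the log could be printed from within %post before the build exits; `dump_dkms_logs` is a hypothetical helper name, and the `<tree>/<module>/<version>/build/make.log` layout is assumed from the path shown in the error message:

```shell
# Hypothetical helper (not part of dkms or Apptainer): print every DKMS
# make.log found under a DKMS tree. The default tree and the
# <module>/<version>/build/make.log layout are taken from the path
# reported in the error output above.
dump_dkms_logs() {
    dkms_tree="${1:-/var/lib/dkms}"
    for log in "$dkms_tree"/*/*/build/make.log; do
        # Skip the unexpanded glob pattern when no log exists.
        [ -f "$log" ] || continue
        echo "===== $log ====="
        cat "$log"
    done
}

# Possible use inside %post (left commented out so the helper can be
# tried on a machine without dkms):
# for dir in /usr/src/kernels/*; do
#     dkms autoinstall --kernelver "$(basename "$dir")" || { dump_dkms_logs; exit 1; }
# done
```

With this in place, the contents of make.log would appear in the apptainer build output just before the FATAL line, which should show which part of the 570.86.15 module build fails.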
If I pin the NVIDIA driver module stream to 565 instead of latest, the apptainer build finishes successfully. The only change is from
&& dnf -y module install nvidia-driver:latest-dkms \
to
&& dnf -y module install nvidia-driver:565-dkms \
New output:
Complete!
+ dnf clean all
49 files removed
+ for dir in /usr/src/kernels/*
++ basename /usr/src/kernels/5.14.0-503.22.1.el9_5.x86_64
+ dkms autoinstall --kernelver 5.14.0-503.22.1.el9_5.x86_64
+ dkms status
nvidia/565.57.01, 5.14.0-503.22.1.el9_5.x86_64, x86_64: installed
INFO: Creating SIF file...
INFO: Build complete: test.sif