Add support for rhel10.0 and rhel10.1 #496
base: main
9a3e0d9 to
3820eed
Compare
38cebfe to
56f469c
Compare
shivakunv
left a comment
Help needed here.
Please check which base image and driver version to use for RHEL 10 support.
@rajathagasthya @tariq1890 @cdesiniotis
70e7101 to
6e35a8e
Compare
Please find additional information regarding extra package and driver version availability:
- used gunzip instead of
5de7278 to
b460e52
Compare
b460e52 to
e5ecf60
Compare
d38fdc5 to
386913d
Compare
64c1151 to
be8a570
Compare
shivakunv
left a comment
I have added comments regarding some minor changes that differ from rhel9.
rhel10/install.sh
Outdated
extra_pkgs_install() {
  if [ "$DRIVER_TYPE" != "vgpu" ]; then
    if dnf module list nvidia-driver:${DRIVER_BRANCH}-dkms 2>/dev/null | grep -q "nvidia-driver"; then
I added a condition to check for the availability of dkms, as it is currently not available for 580 driver
I prefer a check like this for these conditional blocks: if [ "$DRIVER_BRANCH" -ge "580" ]; then. Can you find out which driver branches don't have this particular dnf module?
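A minimal sketch of the numeric branch check suggested above. The helper name `branch_at_least` is hypothetical, and the default cutoff of 580 is only the value mentioned in this thread, not a verified list of branches that lack the dkms module.

```shell
# Hypothetical helper: succeeds when the driver branch is at or above a cutoff.
# The default cutoff (580) comes from this thread and is an assumption.
branch_at_least() {
    # $1 = driver branch (e.g. 580), $2 = optional cutoff
    case "$1" in
        ''|*[!0-9]*) return 2 ;;  # not a plain integer, so -ge would error out
    esac
    [ "$1" -ge "${2:-580}" ]
}

DRIVER_BRANCH="${DRIVER_BRANCH:-580}"
if branch_at_least "$DRIVER_BRANCH"; then
    echo "branch ${DRIVER_BRANCH}: skipping the dkms module stream"
else
    echo "branch ${DRIVER_BRANCH}: enabling the dkms module stream"
fi
```

Guarding against non-numeric values matters because `[ ... -ge ... ]` fails with an error, rather than returning false, when either operand is not an integer.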
Needed.
I commented out the following line:
dnf module enable -y nvidia-driver:${DRIVER_BRANCH}-dkms
[RHEL 10 is NOT listed in the module stream instructions](https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/red-hat-enterprise-linux.html#dnf-module-enablement)
In previous RHEL major versions, some Application Streams were available as modules as an extension to the RPM format. In RHEL 10, Red Hat does not intend to provide any Application Streams that use modularity as the packaging technology and, therefore, no modular content is being distributed with RHEL 10.
redhat docs link
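Since RHEL 10 ships no modular content, one way to express this is to gate the module step on the major version. This is only a sketch: `rhel_major` and `uses_module_streams` are hypothetical helpers, and treating every release below 10 as modular is an assumption. The script only prints what it would do.

```shell
# Hypothetical helper: print the major version from a VERSION_ID such as "10.0".
rhel_major() {
    printf '%s\n' "${1%%.*}"
}

# Succeeds when modular streams are expected to exist (assumed: majors below 10).
uses_module_streams() {
    [ "$(rhel_major "$1")" -lt 10 ]
}

VERSION_ID="${VERSION_ID:-10.0}"        # normally read from /etc/os-release
DRIVER_BRANCH="${DRIVER_BRANCH:-580}"
if uses_module_streams "$VERSION_ID"; then
    echo "would run: dnf module enable -y nvidia-driver:${DRIVER_BRANCH}-dkms"
else
    echo "RHEL ${VERSION_ID}: no modular content, skipping dnf module enable"
fi
```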
rhel10/nvidia-driver
Outdated
echo "Installing Linux kernel headers..."
# Check if kernel headers are already available (mounted from host)
if [ -d "/usr/src/kernels/${KERNEL_VERSION}" ]; then
  echo "Kernel headers for ${KERNEL_VERSION} already present at /usr/src/kernels/${KERNEL_VERSION}"
I have added a condition for installing the kernel headers.
Is this change strictly needed for RHEL 10? If this is a cleanup or enhancement, let's do it in a follow-up instead
This was an enhancement; removed.
rhel10/nvidia-driver
Outdated
ln -s /usr/src/kernels/${KERNEL_VERSION} /lib/modules/${KERNEL_VERSION}/build

echo "Installing Linux kernel module files..."
if ! dnf -q -y --releasever=${DNF_RELEASEVER} install kernel-${KERNEL_VERSION} > /dev/null 2>&1; then
I have added a condition for installing the kernel headers.
Is this change strictly needed for RHEL 10? If this is a cleanup or enhancement, let's do it in a follow-up instead
Needed.
Errors from the daemonset logs:
ERROR: Failure creating directory '/lib/firmware/nvidia' : (No such file or directory)
ERROR: Failure creating directory '/lib/firmware/nvidia' : (No such file or directory)
Error: No matching repo to modify: rhel-10-for-x86_64-baseos-eus-rpms.
Error unpacking rpm package amd-ucode-firmware-20250314-15.el10.noarch
Error unpacking rpm package atheros-firmware-20250314-15.el10.noarch
Error unpacking rpm package brcmfmac-firmware-20250314-15.el10.noarch
Error unpacking rpm package cirrus-audio-firmware-20250314-15.el10.noarch
Error unpacking rpm package intel-audio-firmware-20250314-15.el10.noarch
Error unpacking rpm package mt7xxx-firmware-20250314-15.el10.noarch
Error unpacking rpm package nxpwireless-firmware-20250314-15.el10.noarch
Error unpacking rpm package realtek-firmware-20250314-15.el10.noarch
Error unpacking rpm package tiwilink-firmware-20250314-15.el10.noarch
Error unpacking rpm package amd-gpu-firmware-20250314-15.el10.noarch
Error unpacking rpm package intel-gpu-firmware-20250314-15.el10.noarch
Error unpacking rpm package nvidia-gpu-firmware-20250314-15.el10.noarch
Error unpacking rpm package linux-firmware-20250314-15.el10.noarch
Error: Transaction failed
In RHEL 10, the kernel package has stronger dependencies on firmware packages.
--setopt=install_weak_deps=False tells dnf to skip weak (Recommends) dependencies, that is, packages that are nice to have but not required.
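For illustration, a sketch of how the kernel install command might look with weak dependencies disabled. `kernel_install_cmd` is a hypothetical helper that only prints the command, and the kernel version shown is a made-up example; `--setopt=install_weak_deps=False` is a standard dnf option.

```shell
# Hypothetical helper: print the dnf command line for installing a kernel
# package without weak dependencies. It only prints; nothing is installed.
kernel_install_cmd() {
    # $1 = release version (e.g. 10.0), $2 = kernel version
    printf 'dnf -q -y --releasever=%s --setopt=install_weak_deps=False install kernel-%s\n' "$1" "$2"
}

# Example values; the kernel version here is an assumption for illustration.
kernel_install_cmd "10.0" "6.12.0-55.el10.x86_64"
```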
rhel10/nvidia-driver
Outdated
echo "kernel requires gcc version: 'gcc-${gcc_version}', current gcc version is '${current_gcc}'"

if ! [[ "${current_gcc}" =~ "gcc-${gcc_version}"-.* ]]; then
  echo "WARNING: GCC version mismatch detected, but attempting to continue..."
gcc mismatch handled gracefully
Is this change strictly needed for RHEL 10? If this is a cleanup or enhancement, let's do it in a follow-up instead
I made the mandatory change and removed the enhancement.
Error:
++ rpm -qa gcc
+ local current_gcc=gcc-14.3.1-2.1.el10.x86_64
+ echo 'kernel requires gcc version: '\''gcc-14.2.1
14.2.1'\'', current gcc version is '\''gcc-14.3.1-2.1.el10.x86_64'\'''
+ [[ gcc-14.3.1-2.1.el10.x86_64 =~ gcc-14\.2\.1
14\.2\.1-.* ]]
kernel requires gcc version: 'gcc-14.2.1
14.2.1', current gcc version is 'gcc-14.3.1-2.1.el10.x86_64'
+ dnf install -q -y --releasever=10.0 'gcc-14.2.1
14.2.1'
Error: Unable to find a match: gcc-14.2.1
Issue:
The regex in grep -Eo "([0-9\.]+)" matches two occurrences of the version number in
local gcc_version=$(cat /lib/modules/${KERNEL_VERSION}/proc/version | grep -Eo "gcc \(GCC\) ([0-9\.]+)" | grep -Eo "([0-9\.]+)")
resulting in a string with an embedded newline.
Solution:
Added | head -1 to take only the first match, ensuring gcc_version contains only 14.2.1.
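The fix can be reproduced outside the container with a small sketch. The sample text below is hypothetical: it simply contains the compiler string twice so that the pipeline yields two matches. `extract_gcc_version` is an illustrative name, not a function from the driver script.

```shell
# Hypothetical sample: version text in which the compiler string occurs twice.
sample='Linux version 6.12.0 (gcc (GCC) 14.2.1 20250110)
Linux version 6.12.0 (gcc (GCC) 14.2.1 20250110)'

# Reads version text on stdin and prints only the first gcc version found.
extract_gcc_version() {
    grep -Eo 'gcc \(GCC\) [0-9.]+' | grep -Eo '[0-9.]+' | head -n 1
}

gcc_version=$(printf '%s\n' "$sample" | extract_gcc_version)
echo "gcc-${gcc_version}"   # prints gcc-14.2.1
```

Without the final `head -n 1`, the same pipeline prints `14.2.1` on two lines, which is what produced the broken package name in the trace above.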
fb9ac92 to
8baa52b
Compare
rhel10/install.sh
Outdated
# Download unzboot as kernel images are compressed in the zboot format on RHEL 9 arm64
# unzboot is only available on the EPEL RPM repo
rpm --import https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-9
dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
We should be using EPEL 10, not 9. If dnf install -y unzboot doesn't work in EPEL 10, then there is probably another command to install unzboot in RHEL 10 + EPEL 10 environments.
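As a sketch of what switching to EPEL 10 could look like, the URLs can be derived from the major version instead of hard-coding 9. The helper names are hypothetical, and the URL pattern is extended from the EPEL 9 lines in the diff, so the EPEL 10 paths are an assumption to verify. The script only prints the commands.

```shell
# Hypothetical helpers: build EPEL URLs for a given RHEL major version.
# The pattern is copied from the EPEL 9 lines in the diff (assumed to hold for 10).
epel_release_url() {
    printf 'https://dl.fedoraproject.org/pub/epel/epel-release-latest-%s.noarch.rpm\n' "$1"
}

epel_gpg_key_url() {
    printf 'https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-%s\n' "$1"
}

RHEL_MAJOR=10
echo "rpm --import $(epel_gpg_key_url "$RHEL_MAJOR")"
echo "dnf install -y $(epel_release_url "$RHEL_MAJOR")"
```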
Using references like this will help you find out the correct package repos and installation commands
As you suggested, I checked the link and found that unzboot is available only in EPEL 10.2:
unzboot-0.1~git.20250530.3ccaa1a-2.el10_2 in Fedora EPEL 10.2
I have removed the RHEL 9 unzboot installation and added steps to download the source, build it, and install it.
806548b to
eae2faf
Compare
Signed-off-by: Shiva Kumar (SW-CLOUD) <[email protected]>
eae2faf to
f09dcae
Compare
No description provided.