@shivakunv
Contributor

No description provided.

@shivakunv shivakunv self-assigned this Dec 2, 2025
@shivakunv shivakunv force-pushed the supportrhel10 branch 2 times, most recently from 9a3e0d9 to 3820eed Compare December 12, 2025 12:25
@shivakunv shivakunv force-pushed the supportrhel10 branch 6 times, most recently from 38cebfe to 56f469c Compare December 15, 2025 07:38
Contributor Author

@shivakunv shivakunv left a comment

Help needed here.
Please check which base image and driver version to use for RHEL 10 support.
@rajathagasthya @tariq1890 @cdesiniotis

@shivakunv shivakunv force-pushed the supportrhel10 branch 4 times, most recently from 70e7101 to 6e35a8e Compare December 16, 2025 16:09
@shivakunv
Contributor Author

Please find additional information on extra-package and driver-version availability:

$ dnf list available nvidia-driver-580*
nvidia-driver.aarch64                3:580.105.08-1.el10  cuda
nvidia-driver.x86_64                 3:580.105.08-1.el10

$ dnf list available nvidia-driver-590*
nvidia-driver-assistant.noarch       0.23.44.01-1         cuda
nvidia-driver-assistant.noarch       0.23.44.01-1         cuda-rhel10-x86_64
nvidia-driver-cuda.aarch64           3:590.44.01-1.el10   cuda
nvidia-driver-cuda.x86_64            3:590.44.01-1.el10   cuda-rhel10-x86_64
nvidia-driver-cuda-libs.aarch64      3:590.44.01-1.el10   cuda
nvidia-driver-cuda-libs.x86_64       3:590.44.01-1.el10   cuda-rhel10-x86_64
nvidia-driver-libs.aarch64           3:590.44.01-1.el10   cuda
nvidia-driver-libs.x86_64            3:590.44.01-1.el10   cuda-rhel10-x86_64

$ dnf list available nvidia-fabric*
Available Packages
nvidia-fabric-manager-devel.aarch64  590.44.01-1.el10     cuda
nvidia-fabric-manager-devel.x86_64   590.44.01-1.el10     cuda-rhel10-x86_64
nvidia-fabricmanager.aarch64         590.44.01-1.el10     cuda
nvidia-fabricmanager.x86_64          590.44.01-1.el10     cuda-rhel10-x86_64
nvidia-fabricmanager-devel.aarch64   580.65.06-1          cuda
nvidia-fabricmanager-devel.x86_64    580.65.06-1          cuda-rhel10-x86_64

$ dnf list available libnvidia-nscq*
Available Packages
libnvidia-nscq.aarch64               590.44.01-1.el10     cuda
libnvidia-nscq.x86_64                590.44.01-1.el10     cuda-rhel10-x86_64

$ dnf list available libnvsdm*
Available Packages
libnvsdm.x86_64                      590.44.01-1.el10     cuda-rhel10-x86_64
libnvsdm-devel.x86_64                590.44.01-1.el10     cuda-rhel10-x86_64

$ dnf list available infiniband-diags*
Available Packages
infiniband-diags.aarch64             57.0-2.el10          ubi-10-for-aarch64-appstream-rpms

$ dnf list available nvidia-imex*
Available Packages
nvidia-imex.aarch64                  590.44.01-1.el10     cuda
nvidia-imex.x86_64                   590.44.01-1.el10

Used gunzip instead of unzboot, as unzboot is not available for RHEL 10.

@shivakunv shivakunv force-pushed the supportrhel10 branch 4 times, most recently from 5de7278 to b460e52 Compare January 14, 2026 12:47

@shivakunv shivakunv left a comment

I have added comments regarding some minor changes that differ from rhel9.


extra_pkgs_install() {
if [ "$DRIVER_TYPE" != "vgpu" ]; then
if dnf module list nvidia-driver:${DRIVER_BRANCH}-dkms 2>/dev/null | grep -q "nvidia-driver"; then
Contributor Author

I added a condition to check whether the dkms module is available, as it is currently not available for the 580 driver.

Contributor

I prefer a check like `if [ "$DRIVER_BRANCH" -ge "580" ]; then` for these conditional blocks. Can you find out which driver branches don't have this particular dnf module?
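
The numeric gate suggested here can be sketched as a small helper. This is only an illustration: the function name `branch_at_least` is hypothetical, the threshold 580 is taken from the comment, and which branches actually lack the dnf module is still the open question in this thread.

```shell
# Hypothetical helper: succeeds when the branch is numeric and >= the minimum.
branch_at_least() {
    local branch="$1" minimum="$2"
    # Reject empty or non-numeric values so that -ge cannot fail with an error.
    case "$branch" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$branch" -ge "$minimum" ]
}

DRIVER_BRANCH="${DRIVER_BRANCH:-590}"   # sample default, for the demo only
if branch_at_least "$DRIVER_BRANCH" 580; then
    echo "branch ${DRIVER_BRANCH}: taking the >= 580 path"
else
    echo "branch ${DRIVER_BRANCH}: taking the pre-580 path"
fi
```

Validating the value first keeps a non-numeric branch from producing an "integer expression expected" error inside the test.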

Contributor Author

@shivakunv shivakunv Jan 18, 2026

Needed.
I commented out this line:
dnf module enable -y nvidia-driver:${DRIVER_BRANCH}-dkms
[RHEL 10 is NOT listed in the module stream instructions](https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/red-hat-enterprise-linux.html#dnf-module-enablement)
The Red Hat documentation also states: "In previous RHEL major versions, some Application Streams were available as modules as an extension to the RPM format. In RHEL 10, Red Hat does not intend to provide any Application Streams that use modularity as the packaging technology and, therefore, no modular content is being distributed with RHEL 10."
(redhat docs link)

echo "Installing Linux kernel headers..."
# Check if kernel headers are already available (mounted from host)
if [ -d "/usr/src/kernels/${KERNEL_VERSION}" ]; then
echo "Kernel headers for ${KERNEL_VERSION} already present at /usr/src/kernels/${KERNEL_VERSION}"
Contributor Author

I have added a condition for installing the kernel headers.

Contributor

Is this change strictly needed for RHEL 10? If this is a cleanup or enhancement, let's do it in a follow-up instead

Contributor Author

This was an enhancement; removed.

ln -s /usr/src/kernels/${KERNEL_VERSION} /lib/modules/${KERNEL_VERSION}/build

echo "Installing Linux kernel module files..."
if ! dnf -q -y --releasever=${DNF_RELEASEVER} install kernel-${KERNEL_VERSION} > /dev/null 2>&1; then
Contributor Author

I have added a condition for installing the kernel headers.

Contributor

Is this change strictly needed for RHEL 10? If this is a cleanup or enhancement, let's do it in a follow-up instead

Contributor Author

Needed.

Errors from the daemonset logs:

ERROR: Failure creating directory '/lib/firmware/nvidia' : (No such file or directory)
ERROR: Failure creating directory '/lib/firmware/nvidia' : (No such file or directory)

Error: No matching repo to modify: rhel-10-for-x86_64-baseos-eus-rpms.
Error unpacking rpm package amd-ucode-firmware-20250314-15.el10.noarch
Error unpacking rpm package atheros-firmware-20250314-15.el10.noarch
Error unpacking rpm package brcmfmac-firmware-20250314-15.el10.noarch
Error unpacking rpm package cirrus-audio-firmware-20250314-15.el10.noarch
Error unpacking rpm package intel-audio-firmware-20250314-15.el10.noarch
Error unpacking rpm package mt7xxx-firmware-20250314-15.el10.noarch
Error unpacking rpm package nxpwireless-firmware-20250314-15.el10.noarch
Error unpacking rpm package realtek-firmware-20250314-15.el10.noarch
Error unpacking rpm package tiwilink-firmware-20250314-15.el10.noarch
Error unpacking rpm package amd-gpu-firmware-20250314-15.el10.noarch
Error unpacking rpm package intel-gpu-firmware-20250314-15.el10.noarch
Error unpacking rpm package nvidia-gpu-firmware-20250314-15.el10.noarch
Error unpacking rpm package linux-firmware-20250314-15.el10.noarch
Error: Transaction failed

In RHEL 10, the kernel package has stronger dependencies on firmware packages.
--setopt=install_weak_deps=False tells dnf to skip Recommends dependencies. Weak dependencies are packages that are nice to have but not required.
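
For illustration, the kernel install line with that option could look like the sketch below. The wrapper function, the `DNF_BIN` override, the `dry_run` helper and the sample version strings are hypothetical and exist only so the command can be printed without dnf. `--setopt=install_weak_deps=False` is the option named in the comment above.

```shell
# Hypothetical dry-run helper: prints its arguments instead of running dnf.
dry_run() {
    printf '%s\n' "$*"
}

# Hypothetical wrapper around the kernel install command from the diff.
# DNF_BIN is an assumed override; when unset, the real dnf binary is used.
install_kernel_package() {
    local kernel_version="$1" releasever="$2"
    "${DNF_BIN:-dnf}" -q -y \
        --releasever="${releasever}" \
        --setopt=install_weak_deps=False \
        install "kernel-${kernel_version}"
}

# Dry run with made-up sample values: show the command line that would be used.
DNF_BIN=dry_run install_kernel_package "6.12.0-55.el10.x86_64" "10.0"
```

With weak dependencies disabled, dnf should no longer pull in the Recommends-only firmware packages whose unpacking failed in the log above.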

echo "kernel requires gcc version: 'gcc-${gcc_version}', current gcc version is '${current_gcc}'"

if ! [[ "${current_gcc}" =~ "gcc-${gcc_version}"-.* ]]; then
echo "WARNING: GCC version mismatch detected, but attempting to continue..."
Contributor Author

A GCC version mismatch is now handled gracefully.

Contributor

Is this change strictly needed for RHEL 10? If this is a cleanup or enhancement, let's do it in a follow-up instead

Contributor Author

I made the mandatory change and removed the enhancement.

Error:

++ rpm -qa gcc
+ local current_gcc=gcc-14.3.1-2.1.el10.x86_64
+ echo 'kernel requires gcc version: '\''gcc-14.2.1
14.2.1'\'', current gcc version is '\''gcc-14.3.1-2.1.el10.x86_64'\'''
+ [[ gcc-14.3.1-2.1.el10.x86_64 =~ gcc-14\.2\.1
14\.2\.1-.* ]]
kernel requires gcc version: 'gcc-14.2.1
14.2.1', current gcc version is 'gcc-14.3.1-2.1.el10.x86_64'
+ dnf install -q -y --releasever=10.0 'gcc-14.2.1
14.2.1'
Error: Unable to find a match: gcc-14.2.1

Issue:
The regex in grep -Eo "([0-9\.]+)" matches TWO occurrences of the version number in
local gcc_version=$(cat /lib/modules/${KERNEL_VERSION}/proc/version | grep -Eo "gcc \(GCC\) ([0-9\.]+)" | grep -Eo "([0-9\.]+)")
so gcc_version becomes a two-line string with an embedded newline.

Solution:
Added | head -1 to take only the first match, ensuring gcc_version contains only 14.2.1.
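
A minimal sketch of the fixed extraction, assuming the same two-stage grep as in the script. The function name `extract_gcc_version` and the sample input are made up for illustration. The sample simply repeats the compiler string to reproduce the two-match symptom; the actual contents of proc/version on RHEL 10 are not shown in this thread.

```shell
# Reads /proc/version-style text on stdin and prints only the first GCC version.
extract_gcc_version() {
    grep -Eo 'gcc \(GCC\) [0-9.]+' | grep -Eo '[0-9.]+' | head -n 1
}

# Made-up sample input in which the compiler string appears twice.
sample='Linux version 6.12.0 (gcc (GCC) 14.2.1 20250110)
Linux version 6.12.0 (gcc (GCC) 14.2.1 20250110)'

gcc_version="$(printf '%s\n' "$sample" | extract_gcc_version)"
echo "kernel requires gcc version: 'gcc-${gcc_version}'"
```

In the driver script the input would come from /lib/modules/${KERNEL_VERSION}/proc/version instead of the sample string. Without `head -n 1`, the sample yields two lines, which is the shape of the broken package name seen in the error log.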

@shivakunv shivakunv force-pushed the supportrhel10 branch 2 times, most recently from fb9ac92 to 8baa52b Compare January 16, 2026 03:02

# Download unzboot as kernel images are compressed in the zboot format on RHEL 9 arm64
# unzboot is only available on the EPEL RPM repo
rpm --import https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-9
dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
Contributor

We should be using EPEL 10, not 9. If dnf install -y unzboot does not work in EPEL 10, then there is probably another command to install unzboot in RHEL 10 + EPEL 10 envs.
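
One way to avoid hard-coding the EPEL release is to derive both URLs from the RHEL major version. The sketch below only prints the commands. The helper names are hypothetical, and it assumes the EPEL 10 key and release RPM follow the same naming pattern as the EPEL 9 URLs in the diff, which is not confirmed in this thread.

```shell
# Hypothetical helpers: build EPEL URLs from a RHEL major version ($1).
epel_gpg_key_url() {
    echo "https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-$1"
}

epel_release_url() {
    echo "https://dl.fedoraproject.org/pub/epel/epel-release-latest-$1.noarch.rpm"
}

rhel_major=10   # assumed value; could be derived from VERSION_ID in /etc/os-release

# Print the commands instead of running them, so the sketch has no side effects.
echo "rpm --import $(epel_gpg_key_url "$rhel_major")"
echo "dnf install -y $(epel_release_url "$rhel_major")"
```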

Contributor

Using references like this will help you find out the correct package repos and installation commands

Contributor Author

As you suggested, I checked the link and found that unzboot is available only in EPEL 10.2:
unzboot-0.1~git.20250530.3ccaa1a-2.el10_2 in Fedora EPEL 10.2
I have removed the RHEL 9 unzboot installation and added steps to download the unzboot source, build it, and install it.

@shivakunv shivakunv force-pushed the supportrhel10 branch 5 times, most recently from 806548b to eae2faf Compare January 18, 2026 07:23
Signed-off-by: Shiva Kumar (SW-CLOUD) <[email protected]>