@jovial jovial commented Jun 10, 2025

  • Bumps EPEL snapshots to get a newer version of dkms. Fixes error:
    ==> openstack.openhpc:  Problem 1: cannot install the best candidate for the job
    ==> openstack.openhpc:   - nothing provides dkms >= 3.1.8 needed by kmod-nvidia-open-dkms-3:575.57.08-1.el9.noarch from cuda-rhel9-x86_64
    
  • Pins the nvidia driver package (to nvidia-open-3:575.57.08-1)
  • Bumps CUDA from 12.9.0 to 12.9.1
  • Changes the installation so it no longer requires OFED/DOCA, because the current packages don't depend on it
  • Changes the installation to match the NVIDIA instructions more closely
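
For anyone unfamiliar with the pin format, here is a minimal sketch of how a spec such as `nvidia-open-3:575.57.08-1` breaks down, assuming the usual RPM `name-epoch:version-release` layout. `parse_pin` is a hypothetical helper for illustration only and is not part of this PR.

```python
def parse_pin(spec):
    """Split an RPM-style 'name-[epoch:]version-release' string into its parts.

    Hypothetical helper: assumes neither the version nor the release
    contains a '-' character, which holds for the RPM naming convention.
    """
    # The release is everything after the last '-'.
    rest, release = spec.rsplit("-", 1)
    # The (optional) epoch and the version follow the next '-' from the right.
    name, evr = rest.rsplit("-", 1)
    epoch, sep, version = evr.partition(":")
    if not sep:
        # No explicit epoch: RPM treats a missing epoch as 0.
        epoch, version = "0", evr
    return {
        "name": name,
        "epoch": int(epoch),
        "version": version,
        "release": release,
    }


pin = parse_pin("nvidia-open-3:575.57.08-1")
print(pin)
```

The `3:` prefix is the epoch, so the pinned driver version itself is 575.57.08 at release 1.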

Need a newer version of dkms than is available in the current snapshots:
```
==> openstack.openhpc:  Problem 1: cannot install the best candidate for the job
==> openstack.openhpc:   - nothing provides dkms >= 3.1.8 needed by kmod-nvidia-open-dkms-3:575.57.08-1.el9.noarch from cuda-rhel9-x86_64
```
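
To make the failure above concrete, the following is a rough sketch of pulling the unmet requirement out of the log line and testing candidate versions against it. The helper names and the candidate versions are hypothetical, and the comparison is a simplification that only handles dotted numeric versions; it is not RPM's full version-comparison algorithm.

```python
import re

# Matches the "nothing provides ..." line shown in the build log above.
NOTHING_PROVIDES = re.compile(
    r"nothing provides (?P<name>\S+) (?P<op>>=|<=|=|>|<) (?P<version>\S+)"
    r" needed by (?P<needed_by>\S+)"
)


def parse_missing_dep(line):
    """Return the unmet requirement as a dict, or None if the line does not match."""
    match = NOTHING_PROVIDES.search(line)
    return match.groupdict() if match else None


def version_tuple(version):
    # Simplification: dotted numeric versions only (e.g. "3.1.8").
    return tuple(int(part) for part in version.split("."))


def satisfies(available, op, required):
    """Check whether an available version meets the requirement."""
    a, r = version_tuple(available), version_tuple(required)
    return {">=": a >= r, "<=": a <= r, "=": a == r, ">": a > r, "<": a < r}[op]


line = (
    "==> openstack.openhpc:   - nothing provides dkms >= 3.1.8 needed by "
    "kmod-nvidia-open-dkms-3:575.57.08-1.el9.noarch from cuda-rhel9-x86_64"
)
dep = parse_missing_dep(line)

# Candidate versions below are made up for illustration; they are not the
# versions actually present in any EPEL snapshot.
for candidate in ("3.1.0", "3.1.8"):
    print(candidate, satisfies(candidate, dep["op"], dep["version"]))
```

Any snapshot whose dkms fails this check produces the solver error above, which is why the EPEL snapshots are bumped.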
@jovial jovial requested a review from a team as a code owner June 10, 2025 08:42

jovial commented Jun 10, 2025

The last build failed on EPEL 8.

jovial commented Jun 10, 2025

Failed on Rocky 8 build, trying new snapshots here: https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/15555013913

@sjpb sjpb changed the title from "Bump EPEL versions to fix nvidia build" to "Fix nvidia build" Jun 10, 2025

@jovial jovial left a comment

Approved, but Pull request authors can’t approve their own pull request.

@sjpb sjpb marked this pull request as draft June 10, 2025 10:53
@sjpb sjpb marked this pull request as ready for review June 10, 2025 11:32
@sjpb sjpb changed the title from "Fix nvidia build" to "Fix nvidia build at driver version 575.57 with cuda 12.9.1" Jun 10, 2025
@sjpb sjpb changed the title from "Fix nvidia build at driver version 575.57 with cuda 12.9.1" to "Fix nvidia build at open driver version 575.57 with cuda 12.9.1" Jun 10, 2025
@sjpb sjpb changed the title from "Fix nvidia build at open driver version 575.57 with cuda 12.9.1" to "Fix nvidia build at open driver version 575.57.08 with cuda 12.9.1" Jun 10, 2025

@jovial jovial left a comment

Looks good - nice catch.

@sjpb sjpb merged commit 9da1cd7 into main Jun 10, 2025
7 of 10 checks passed
@sjpb sjpb deleted the bump-epel branch June 10, 2025 14:22
@sjpb sjpb mentioned this pull request Jun 24, 2025
2 participants