avoid incorrect symlinks for NVHPC by force-setting selected CUDA version in install script #4024
|
@boegelbot please test @ jsc-zen3 |
|
@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 3656204638 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @boegelbot Overview of tested easyconfigs (in order)
Build succeeded for 8 out of 8 (total: 1 hour 55 mins 37 secs) (8 easyconfigs in total) |
|
@boegelbot please test @ jsc-zen3 |
|
@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 3657282745 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @boegelbot Overview of tested easyconfigs (in order)
Build succeeded for 3 out of 4 (total: 35 mins 24 secs) (4 easyconfigs in total) |
|
NVHPC 21.9 failure is unrelated to this PR. |
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
|
@boegelbot please test @ jsc-zen3-a100 |
|
@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 3674612988 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @boegelbot Overview of tested easyconfigs (in order)
Build succeeded for 3 out of 3 (total: 35 mins 41 secs) (2 easyconfigs in total) |
|
Test report by @SebastianAchilles Overview of tested easyconfigs (in order)
Build succeeded for 11 out of 11 (total: 50 mins 55 secs) (11 easyconfigs in total) |
lexming
left a comment
Thanks for the fix. I just have a minor comment to simplify the approach.
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
|
@boegelbot please test @ jsc-zen3-a100 |
lexming
left a comment
LGTM
|
@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 3767634179 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @boegelbot Overview of tested easyconfigs (in order)
Build succeeded for 2 out of 2 (total: 20 mins 38 secs) (2 easyconfigs in total) |
|
Merging, thanks @Thyre ! |
(created using `eb --new-pr`)

Follow-up to #3788, to properly handle NVHPC versions where the older CUDA version is used (in particular easybuilders/easybuild-easyconfigs#23989).

To correctly set the default CUDA version for the symlinks created by an NVHPC installation, we need to change the command in `install_components/install`. This is especially important with CUDA 13 (NVHPC 25.9+).

By default, the installer runs:
which determines e.g. the paths for the symlinks created by NVHPC.

Without a GPU present, this results in the maximum CUDA version supported by the installed CUDA driver, or no CUDA version at all. To ensure that the correct CUDA version is used, replace the command with the `default_cuda_version` we set in the EasyBlock.

To further make sure that the symlinks are correct, add a sanity check for NCCL, NVSHMEM and the math libraries, comparing the symlink paths to the paths we would expect from the selected CUDA version.
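
For illustration only, the idea behind such a symlink check could be sketched as below. This is not the code from this PR: the function name `find_mismatched_symlinks` and the rule "the CUDA version must appear as a directory component of the resolved target" are assumptions made for the example, and the actual NVHPC directory layout may differ.

```python
import os


def find_mismatched_symlinks(symlinks, cuda_version):
    """Return (link, target) pairs whose resolved target does not
    contain cuda_version as a directory component.

    Hypothetical helper: it assumes that a correct symlink resolves
    into a directory named exactly after the selected CUDA version.
    """
    mismatched = []
    for link in symlinks:
        # realpath follows the whole symlink chain to the final target
        target = os.path.realpath(link)
        if cuda_version not in target.split(os.sep):
            mismatched.append((link, target))
    return mismatched
```

An empty result would mean every checked symlink resolves into a directory of the selected CUDA version; any returned pair names a link that points elsewhere, which a sanity check could then report as a failure.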