@Thyre
Collaborator

@Thyre Thyre commented Dec 15, 2025

(created using eb --new-pr)

Follow-up to #3788, to properly handle NVHPC installations where the older CUDA version is used (see in particular easybuilders/easybuild-easyconfigs#23989).


To correctly set the default CUDA version for the symlinks created by an NVHPC installation, we need to change the command in install_components/install. This is especially important with CUDA 13 (NVHPC 25.9+).

By default, the installer runs:

    CC=$INSTALL_DIR/$arch/$release/compilers/bin/nvc
    DESIREDCUDA=$($CC -printcudaversion 2>&1 | grep -i "selected cuda version" | cut -d'=' -f2)

which determines, among other things, the paths of the symlinks created by NVHPC.
Without a GPU present, this results in the maximum CUDA version supported by the installed CUDA driver, or no CUDA version at all. To ensure that the correct CUDA version is used, replace the command with the default_cuda_version we set in the EasyBlock.
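
The replacement described above could look roughly like the following. This is only a sketch, not the code in this PR: `pin_desired_cuda` is a hypothetical helper that works on the text of install_components/install, and it assumes that `default_cuda_version` is a plain version string such as `12.8`.

```python
import re

# Hypothetical helper, not the EasyBlock's actual code: pins DESIREDCUDA in the
# text of install_components/install instead of letting nvc probe for it.
_PROBE = re.compile(
    r"^([ \t]*)DESIREDCUDA=\$\(.*-printcudaversion.*\)[ \t]*$",
    re.MULTILINE,
)


def pin_desired_cuda(install_script: str, default_cuda_version: str) -> str:
    """Return the script text with the DESIREDCUDA probe replaced by a fixed value."""
    # A function replacement inserts the version string literally,
    # so no backslash or group-reference processing is applied to it.
    patched, count = _PROBE.subn(
        lambda m: f"{m.group(1)}DESIREDCUDA={default_cuda_version}",
        install_script,
    )
    # Fail loudly if the installer layout changed and the probe is missing.
    if count != 1:
        raise ValueError(f"expected exactly one DESIREDCUDA probe, found {count}")
    return patched
```

Raising when the probe line is not found exactly once is a deliberate choice in this sketch: a silently unpatched installer would fall back to the GPU-dependent behaviour described above.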

To further ensure that the symlinks are correct, add a sanity check for NCCL, NVSHMEM and the math libraries that compares each symlink's target with the path we would expect for the selected CUDA version.
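
A minimal sketch of such a symlink comparison is shown below. It is an illustration under assumptions rather than the check added in this PR: `symlink_points_to` is a hypothetical name, and the caller is expected to supply both the symlink path and the directory expected for the selected CUDA version.

```python
import os


def symlink_points_to(link_path: str, expected_target: str) -> bool:
    """Return True if link_path is a symlink that resolves to expected_target."""
    # A regular directory (or a missing path) at link_path is not a valid symlink.
    if not os.path.islink(link_path):
        return False
    # Resolve both sides so relative link targets and symlinked parent
    # directories compare equal when they refer to the same location.
    return os.path.realpath(link_path) == os.path.realpath(expected_target)
```

In a sanity check, a `False` result for any of the NCCL, NVSHMEM or math library symlinks would then be reported as a failure, naming the symlink and the path that was expected.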

@Thyre Thyre marked this pull request as draft December 15, 2025 11:00
@Thyre Thyre added this to the release after 5.2.0 milestone Dec 15, 2025
@Thyre
Collaborator Author

Thyre commented Dec 15, 2025

@boegelbot please test @ jsc-zen3
EB_ARGS="nvidia-compilers-25.3-CUDA-12.8.0.eb NVHPC-25.3-CUDA-12.8.0.eb nvidia-compilers-25.1-CUDA-12.6.0.eb NVHPC-25.1-CUDA-12.6.0.eb nvidia-compilers-25.1.eb NVHPC-25.1.eb nvidia-compilers-25.3.eb NVHPC-25.3.eb --installpath /tmp/$USER/ebpr-4024"

@boegelbot

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=4024 EB_ARGS="nvidia-compilers-25.3-CUDA-12.8.0.eb NVHPC-25.3-CUDA-12.8.0.eb nvidia-compilers-25.1-CUDA-12.6.0.eb NVHPC-25.1-CUDA-12.6.0.eb nvidia-compilers-25.1.eb NVHPC-25.1.eb nvidia-compilers-25.3.eb NVHPC-25.3.eb --installpath /tmp/$USER/ebpr-4024" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_4024 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9173

Test results coming soon (I hope)...

Details

- notification for comment with ID 3656204638 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS nvidia-compilers-25.3-CUDA-12.8.0.eb

  • SUCCESS NVHPC-25.3-CUDA-12.8.0.eb

  • SUCCESS nvidia-compilers-25.1-CUDA-12.6.0.eb

  • SUCCESS NVHPC-25.1-CUDA-12.6.0.eb

  • SUCCESS nvidia-compilers-25.1.eb

  • SUCCESS NVHPC-25.1.eb

  • SUCCESS nvidia-compilers-25.3.eb

  • SUCCESS NVHPC-25.3.eb

Build succeeded for 8 out of 8 (total: 1 hour 55 mins 37 secs) (8 easyconfigs in total)
jsczen3c3.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/85a5edeae8526ae307ed63397487593c for a full test report.

@Thyre
Collaborator Author

Thyre commented Dec 15, 2025

@boegelbot please test @ jsc-zen3
EB_ARGS="NVHPC-21.9.eb NVHPC-22.11-CUDA-11.7.0.eb NVHPC-23.1-CUDA-12.0.0.eb NVHPC-24.9-CUDA-12.6.0.eb --installpath /tmp/$USER/ebpr-4024"

@boegelbot

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=4024 EB_ARGS="NVHPC-21.9.eb NVHPC-22.11-CUDA-11.7.0.eb NVHPC-23.1-CUDA-12.0.0.eb NVHPC-24.9-CUDA-12.6.0.eb --installpath /tmp/$USER/ebpr-4024" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_4024 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9188

Test results coming soon (I hope)...

Details

- notification for comment with ID 3657282745 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

Build succeeded for 3 out of 4 (total: 35 mins 24 secs) (4 easyconfigs in total)
jsczen3c3.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/4f14387cb740f07a4d81b547362a9490 for a full test report.

@Thyre
Collaborator Author

Thyre commented Dec 15, 2025

NVHPC 21.9 failure is unrelated to this PR.

@Thyre Thyre requested review from boegel and lexming December 15, 2025 20:37
@Thyre Thyre marked this pull request as ready for review December 15, 2025 20:38
@Thyre Thyre added the bug fix label Dec 16, 2025
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
@Thyre
Collaborator Author

Thyre commented Dec 19, 2025

@boegelbot please test @ jsc-zen3-a100
EB_ARGS="NVHPC-22.11-CUDA-11.7.0.eb NVHPC-25.3.eb --installpath /tmp/$USER/ebpr-4024"

@boegelbot

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=4024 EB_ARGS="NVHPC-22.11-CUDA-11.7.0.eb NVHPC-25.3.eb --installpath /tmp/$USER/ebpr-4024" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_4024 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9257

Test results coming soon (I hope)...

Details

- notification for comment with ID 3674612988 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS NVHPC-22.11-CUDA-11.7.0.eb

  • SUCCESS nvidia-compilers-25.3.eb

  • SUCCESS NVHPC-25.3.eb

Build succeeded for 3 out of 3 (total: 35 mins 41 secs) (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 580.95.05, Python 3.9.21
See https://gist.github.com/boegelbot/45ad6e127338ccde8a7aa752ccebbf16 for a full test report.

@SebastianAchilles
Member

Test report by @SebastianAchilles

Overview of tested easyconfigs (in order)

  • SUCCESS nvidia-compilers-25.3-CUDA-12.8.0.eb

  • SUCCESS NVHPC-25.3-CUDA-12.8.0.eb

  • SUCCESS nvidia-compilers-25.1-CUDA-12.6.0.eb

  • SUCCESS NVHPC-25.1-CUDA-12.6.0.eb

  • SUCCESS nvidia-compilers-25.1.eb

  • SUCCESS NVHPC-25.1.eb

  • SUCCESS nvidia-compilers-25.3.eb

  • SUCCESS NVHPC-25.3.eb

  • SUCCESS NVHPC-22.11-CUDA-11.7.0.eb

  • SUCCESS NVHPC-23.1-CUDA-12.0.0.eb

  • SUCCESS NVHPC-24.9-CUDA-12.6.0.eb

Build succeeded for 11 out of 11 (total: 50 mins 55 secs) (11 easyconfigs in total)
skx-rockylinux-97 - Linux Rocky Linux 9.7 (Blue Onyx), x86_64, Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz (skylake_avx512), 4 x NVIDIA Tesla V100-SXM2-32GB, 580.105.08, Python 3.9.25
See https://gist.github.com/SebastianAchilles/eadab246dfac309873b573e4c11db055 for a full test report.

Contributor

@lexming lexming left a comment

Thanks for the fix. I just have a minor comment to simplify the approach.

Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
@Thyre
Collaborator Author

Thyre commented Jan 19, 2026

@boegelbot please test @ jsc-zen3-a100
EB_ARGS="NVHPC-22.11-CUDA-11.7.0.eb NVHPC-25.3.eb --installpath /tmp/$USER/ebpr-4024"

Contributor

@lexming lexming left a comment

LGTM

@boegelbot

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=4024 EB_ARGS="NVHPC-22.11-CUDA-11.7.0.eb NVHPC-25.3.eb --installpath /tmp/$USER/ebpr-4024" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_4024 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9420

Test results coming soon (I hope)...

Details

- notification for comment with ID 3767634179 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS NVHPC-22.11-CUDA-11.7.0.eb

  • SUCCESS NVHPC-25.3.eb

Build succeeded for 2 out of 2 (total: 20 mins 38 secs) (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 590.44.01, Python 3.9.23
See https://gist.github.com/boegelbot/0c1a07d4b6c0520595cfd7af61eda4ba for a full test report.

@lexming
Contributor

lexming commented Jan 19, 2026

Merging, thanks @Thyre!

@lexming lexming merged commit 24ada85 into easybuilders:develop Jan 19, 2026
22 checks passed