prgenv-gnu with ROCm 7 #273
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
ROCm is a menace: https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/551234120955960/1440398897047560/-/jobs/12201564988#L3724. hipblaslt seems to be picking up amdclang++ from the system... needs further investigation. |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
Hi, |
I'll try it out; hopefully there are no bigger issues (though note that, unfortunately, I have issues with other packages to sort out before this is usable). |
|
spack/spack-packages#2287 to add rocm 7.1.0 is also currently open. It may help, it may make things worse... The PR description does mention a change to hipblaslt, which may change something. |
Yes, it looks like an older version of ROCm is installed on the system and it's choosing the incorrect version of amdclang++. I'll see if I can reproduce the issue and put in a fix. |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
…linked to rocsparse
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
Issue is:
ImportError: /user-environment/env/default/lib/libhipsparse.so.4: undefined symbol: rocsparse_destroy_spmv_descr
which is strange because:
ldd /user-environment/env/default/lib/libhipsparse.so.4
linux-vdso.so.1 (0x00007ffddc1c1000)
/user-environment/env/default/lib/librocsparse.so (0x00007f6af9a27000)
nm /user-environment/linux-zen3/rocsparse-7.1.0-qfflrfxfu2vaqu52hitvwyukdwq767rf/lib/librocsparse.so | grep rocsparse_destroy_spmv_descr
0000000002f33fc0 T rocsparse_destroy_spmv_descr
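One thing worth ruling out here: the `ldd` output resolves `librocsparse.so` through `/user-environment/env/default/lib`, while the `nm` check was run on the copy under `/user-environment/linux-zen3/`, and plain `nm` reads the regular symbol table rather than the dynamic one the loader uses. The sketch below is a hypothetical helper (the function names are illustrative, not part of ROCm or Spack) that parses `nm -D --defined-only` output and reports whether a symbol is exported, so the same check can be run on the exact path that `ldd` reports. It assumes GNU binutils `nm` is on `PATH`.

```python
import subprocess


def exported_symbols(nm_output: str) -> set[str]:
    """Return the names of defined symbols found in nm-style output text."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols look like "<address> <type> <name>";
        # undefined ones have no address ("U <name>") and are skipped.
        if len(parts) != 3:
            continue
        _addr, sym_type, name = parts
        if sym_type in ("U", "w", "v"):
            continue
        # Newer binutils may append "@VERSION" or "@@VERSION" to the name.
        names.add(name.split("@", 1)[0])
    return names


def library_exports(lib_path: str, symbol: str) -> bool:
    """Run `nm -D --defined-only` on lib_path and look for symbol."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", lib_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return symbol in exported_symbols(result.stdout)


if __name__ == "__main__":
    # Sample line copied from the nm output quoted in this comment.
    sample = (
        "0000000002f33fc0 T rocsparse_destroy_spmv_descr\n"
        "                 U malloc\n"
    )
    print("rocsparse_destroy_spmv_descr" in exported_symbols(sample))
```

Calling `library_exports("/user-environment/env/default/lib/librocsparse.so", "rocsparse_destroy_spmv_descr")` on the path from the `ldd` output would show whether that particular file exports the symbol dynamically.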
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
Thank you very much for the help 👍 This is something to check in a follow-up PR, perhaps together with the update to ROCm 7.1.1. |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
depends_on("py-textual-plotext", when="@7.0:", type=("build", "run"))
depends_on("py-sqlalchemy@2.0.42:", when="@7.1:", type=("build", "run"))

patch("keep_ld_preload.patch", when="@7.1.0")
For some reason, hipsparse.so fails to resolve some symbols from rocsparse.so, and copy fails when it gets imported unless something like export LD_PRELOAD=/user-environment/env/default/lib/librocsparse.so:$LD_PRELOAD is set.
I tried to fix the issue in hipsparse/rocsparse, but I don't understand why the linking goes wrong. I will probably have to keep the LD_PRELOAD workaround and report the issue to AMD.
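A minimal sketch of the LD_PRELOAD workaround, written in Python so it can wrap a test command. `LD_PRELOAD` is read by the dynamic loader at process start, so it has to be set in the environment of a new process rather than inside one that is already running. `with_preload` and `run_with_preload` are hypothetical helper names; the library path is the one quoted in this comment.

```python
import os
import subprocess

# Path taken from the workaround quoted in this comment.
ROCSPARSE = "/user-environment/env/default/lib/librocsparse.so"


def with_preload(lib, env=None):
    """Return a copy of env (default: os.environ) with lib prepended to LD_PRELOAD."""
    new_env = dict(os.environ if env is None else env)
    existing = new_env.get("LD_PRELOAD", "")
    # Avoid a trailing ":" when LD_PRELOAD was previously unset or empty.
    new_env["LD_PRELOAD"] = f"{lib}:{existing}" if existing else lib
    return new_env


def run_with_preload(cmd, lib=ROCSPARSE):
    """Run cmd in a child process that has lib preloaded."""
    return subprocess.run(cmd, env=with_preload(lib), check=False)
```

For example, `run_with_preload(["python", "my_test.py"])` would run a (hypothetical) script `my_test.py` that triggers the failing import, with librocsparse loaded first.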
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
Just for testing builds; I don't know if this will work.