MLPoisson on AMD GPUs crashes - kernel not starting (?) #4336
-
Hi all,

I have been using the MLPoisson functionality to solve Poisson's equation, and it runs with no issues on CPUs and on GPUs with the CUDA backend. Having migrated to a new system (LUMI), I need to run the same code on AMD GPUs, and there it crashes at runtime (MRE attached).

AMReX was compiled and installed with CMake (the preferred approach at LUMI) using the amdclang compiler and the HIP backend, and has been working without problems thus far: I have had no issues setting up MultiFabs or using ParallelFor.

In the attached MRE, the error occurs in line 96:

It is my understanding (I am not an expert in reading traces...) that the kernel has not been created and the template instantiation has not worked. I usually compile AMReX code with amdclang for uniformity, but crayclang was also tried. I contacted support at LUMI, who also tried different versions of ROCm (6.0 and 6.2) and two AMReX versions (devel and v2024).

Any advice on this issue is massively appreciated! Thanks so much in advance for your time.

Backtrace:
Edit: put line 96 into |
Replies: 2 comments 6 replies
-
Could you try |
-
The fix for this was to recompile AMReX without OpenMP, so without USE_OMP=TRUE. Then I can run both the tests and my MRE without problems. Why this fixed the problem is somewhat beyond me, but this worked for me, in any case. :)
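For anyone wanting to check whether their own build combines the HIP backend with OpenMP, here is a minimal sketch. It assumes that AMReX's generated `AMReX_Config.H` header defines `AMREX_USE_HIP` and `AMREX_USE_OMP` for HIP and OpenMP builds respectively. The helper names `backend_summary` and `hip_with_openmp` are hypothetical and not part of AMReX. It only reports the configuration; it does not explain why the combination crashed here.

```cpp
// Sketch: report which AMReX backend macros were set when this file was compiled.
// Assumption: AMReX's generated config header defines AMREX_USE_HIP and
// AMREX_USE_OMP. If AMReX is not on the include path, neither macro is
// defined and the helpers report a build with neither backend enabled.
#include <string>

#if defined(__has_include)
#  if __has_include(<AMReX_Config.H>)
#    include <AMReX_Config.H>
#  endif
#endif

// Hypothetical helper (not an AMReX function): human-readable summary.
std::string backend_summary()
{
    std::string s;
#ifdef AMREX_USE_HIP
    s += "HIP";
#else
    s += "no-HIP";
#endif
    s += ", ";
#ifdef AMREX_USE_OMP
    s += "OpenMP";
#else
    s += "no-OpenMP";
#endif
    return s;
}

// Hypothetical helper: true only for the combination reported in this thread
// (HIP backend compiled together with OpenMP).
bool hip_with_openmp()
{
#if defined(AMREX_USE_HIP) && defined(AMREX_USE_OMP)
    return true;
#else
    return false;
#endif
}
```

Printing `backend_summary()` at the start of the MRE would show whether the installed AMReX still has OpenMP enabled after a rebuild; if `hip_with_openmp()` returns true, the build matches the configuration that crashed in this thread.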