Restrict Registers Per Thread in CUDA #4574
-
How do I restrict the maximum register usage per thread in AMReX when profiling a GPU program? The compiler still prints information like `ptxas info : Used 240 registers, 592 bytes cmem[0]` during compilation.
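In case it's useful to others landing here, the two standard nvcc-level mechanisms I know of are the file-wide `--maxrregcount` flag and the per-kernel `__launch_bounds__` qualifier (both documented in the CUDA compiler driver docs). A minimal sketch, with the compile line shown as a comment:

```cuda
// File-wide cap: applies to every kernel in the translation unit.
//   nvcc -O3 --maxrregcount=64 -Xptxas -v -c kernels.cu
//
// Per-kernel cap via launch bounds (takes precedence over --maxrregcount):
__global__ void
__launch_bounds__(256, 2)   // <= 256 threads/block, target >= 2 blocks per SM
scale (double* p, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) { p[i] *= 2.0; }
}
```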
-
It seems to work for me.
-
I still think you had a typo.
Could you provide more detail so that we can see exactly what flags nvcc gets? For example, I can see
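For what it's worth, with AMReX's GNU Make system I believe `VERBOSE=TRUE` echoes the full compile command lines, so you can see exactly what nvcc receives (treat the variable name as an assumption and check it against your AMReX version):

```
# Hedged sketch: print full compile commands and look for a register cap.
make USE_CUDA=TRUE VERBOSE=TRUE 2>&1 | grep -i maxrregcount
```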
-
I have tried another case within a workspace without PelePhysics. In this case, I have hard-coded the per-thread register limit in nvcc. However, the issue does not improve at all. It reports as
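Independent of the compile log, the CUDA toolkit's cuobjdump can report the resources each kernel actually got in the final binary; a sketch, assuming a recent toolkit (verify the flag on your version):

```
# Hedged sketch: dump per-kernel register/stack/spill usage from the binary.
cuobjdump -res-usage ./my_executable
```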
-
I thought `maxrregcount` is a hard ceiling for nvcc. But it is not: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#maxrregcount-amount-maxrregcount
I guess for some of the kernels it's impossible to run without bumping up the register counts. Then there is probably nothing you can do.
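If the cap does bite and forces spilling, ptxas can be asked to flag it explicitly; a sketch, with the option name taken from the ptxas docs (verify on your toolkit version):

```
# Hedged sketch: have ptxas warn whenever registers spill to local memory.
nvcc -O3 -Xptxas -v,-warn-spills -c kernels.cu
```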
-
Hi, back here again for some help. I have encountered significant register pressure when using the per-thread model on GPUs, resulting in register spilling and reduced concurrency, when using the performance-portable ParallelFor function (https://amrex-codes.github.io/amrex/docs_html/Basics.html#parallelfor). I wonder if AMReX has any existing approaches for this kind of problem, e.g., a per-block model or even a per-warp model as in Ref. (https://dl.acm.org/doi/10.1145/2555243.2555258). Thanks in advance!
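For what it's worth, one knob that helped me is the max-threads template parameter on `ParallelFor`, which, as far as I know, recent AMReX forwards to `__launch_bounds__`; a smaller block size leaves ptxas a bigger per-thread register budget. A minimal sketch (the kernel body is a placeholder; check the template parameter against your AMReX version):

```cpp
#include <AMReX_MultiFab.H>

// Hedged sketch: assuming ParallelFor<MT> maps MT to __launch_bounds__(MT).
void scale_mf (amrex::MultiFab& mf)
{
    for (amrex::MFIter mfi(mf); mfi.isValid(); ++mfi) {
        const amrex::Box& bx = mfi.validbox();
        auto const& a = mf.array(mfi);
        amrex::ParallelFor<128>(bx,   // 128 threads/block instead of the default
        [=] AMREX_GPU_DEVICE (int i, int j, int k)
        {
            a(i,j,k) *= 2.0;
        });
    }
}
```

Beyond that, splitting one large kernel into a few smaller ParallelFor launches is the usual way to shrink the live state per thread; I'm not aware of a per-block or per-warp register model inside AMReX itself.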