Hi everyone, thank you in advance for taking the time to read this, and please excuse my lack of experience: I am a newcomer to the field of HPC. I have recently been working on some CUDA-parallelized scientific code. The problem is embarrassingly parallel in the sense that I have to do some independent computations on a given number of points, say n. These computations vary depending on the studied constitutive law; therefore, they are encapsulated (along with their CUDA load/store operations) in different functors that I pass to a generic CUDA kernel launcher as follows:
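(The original snippet did not survive extraction; a minimal sketch of such a generic, functor-templated launcher might look like the following, with all names invented for illustration.)

```cuda
// Hypothetical sketch: one kernel instantiation per constitutive-law functor.
// The functor performs load / compute / store for a single point.
template <typename ConstitutiveLaw>
__global__ void compute_kernel(ConstitutiveLaw law,
                               const double* in, double* out, int n);

template <typename ConstitutiveLaw>
void launch(ConstitutiveLaw law, const double* d_in, double* d_out, int n)
{
    constexpr int block_size = 256;
    const int grid_size = (n + block_size - 1) / block_size;  // round up
    compute_kernel<<<grid_size, block_size>>>(law, d_in, d_out, n);
}
```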
Now here is the CUDA kernel itself, where each thread works on exactly one of the n independent points:
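(This snippet was also lost in extraction; a hedged sketch of a one-thread-per-point generic kernel, matching the hypothetical launcher above, could be:)

```cuda
// Hypothetical sketch of the generic kernel: one thread handles one point.
template <typename ConstitutiveLaw>
__global__ void compute_kernel(ConstitutiveLaw law,
                               const double* in, double* out, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;   // guard against the over-provisioned last block
    law(in, out, i);      // functor does its own loads, computation, stores
}
```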
All of my functors do some basic computations, but some of them also run iterative methods such as Newton-Raphson, which requires solving an Ax=b linear system at each iteration. This solve causes performance issues, and instead of optimizing it myself, I am looking for pre-made linear-algebra libraries that provide already-optimized solvers. However, due to the specific structure of the problem/code, I need device-callable solvers that I can call from inside my CUDA kernel. So my two questions here are:
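(For context, here is a hedged illustration of what "device-callable" means in this setting — all names are hypothetical. The per-thread Newton iteration would call a `__device__` dense solve that runs entirely in registers/local memory, with no host round trip, so each point can take a different number of iterations.)

```cuda
// Sketch: in-thread dense solve of a small N x N system Ax = b,
// via Gaussian elimination with partial pivoting. A and b are
// modified in place; returns false if the system is (near-)singular.
template <int N>
__device__ bool solve_dense(double A[N][N], double b[N], double x[N])
{
    for (int k = 0; k < N; ++k) {
        int p = k;                              // partial pivoting
        for (int r = k + 1; r < N; ++r)
            if (fabs(A[r][k]) > fabs(A[p][k])) p = r;
        if (fabs(A[p][k]) < 1e-14) return false;
        if (p != k) {
            for (int c = k; c < N; ++c) {
                double t = A[k][c]; A[k][c] = A[p][c]; A[p][c] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (int r = k + 1; r < N; ++r) {       // eliminate column k
            const double f = A[r][k] / A[k][k];
            for (int c = k; c < N; ++c) A[r][c] -= f * A[k][c];
            b[r] -= f * b[k];
        }
    }
    for (int r = N - 1; r >= 0; --r) {          // back substitution
        double s = b[r];
        for (int c = r + 1; c < N; ++c) s -= A[r][c] * x[c];
        x[r] = s / A[r][r];
    }
    return true;
}
```

Each thread's Newton loop would call `solve_dense` once per iteration and stop at its own convergence criterion, which is exactly the per-point independence that a host-side batched solver makes awkward.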
If the answer to both of these questions is no, I guess I will have to use a host-called batched solver, which would be quite annoying given that the n points do not all require the same number of iterations (i.e., not the same number of batched-solver calls), which would mean a completely different code structure.
Replies: 1 comment
Hi @trsxvz ,
Thanks for reaching out to us.
@pratikvn, correct me if I am wrong.
I think it is possible to call the batch solver from the device, but you need to have the corresponding batch header in your include path.
You can check the kernel we have for the batch solver in https://github.com/ginkgo-project/ginkgo/blob/develop/common/cuda_hip/solver/batch_cg_kernels.hpp, where apply_kernel is the actual kernel we call for the batch CG solver.
However, you need to arrange your data to fit the function's interface.
https://github.com/ginkgo-project/ginkgo/blob/develop/core/solver/batch_dispatch.hpp shows how we map the host type to the type used in the kernel, so you do not need to prepare the class from the host type.