Description
Now that support is in place for rank 1 through rank 4 arrays (fp32 and fp64), it's time to look into supporting GPU acceleration for function evaluation.
To support portability between NVIDIA and AMD GPUs, I'm thinking of using AMD's HIP. Because ROCm support is spotty on Windows and absent on macOS, this build feature will need to be optional. Additionally, because some users may be on systems that have only the CUDA toolkit installed and not ROCm, we'll need to use preprocessor macros to map the GPU memory management procedures to either their CUDA or their HIP equivalents.
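As a rough illustration of the preprocessor mapping described above, here is a minimal sketch in C++. The `gpu*` names, the `USE_HIP` and `USE_CUDA` build flags, and the `roundTrip` helper are all hypothetical and only for illustration; the `hip*` and `cuda*` calls are the standard runtime API functions. A plain host fallback is included so the sketch compiles and runs without either toolkit.

```cpp
// Hypothetical compatibility header: generic gpu* names map to HIP or CUDA.
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

#if defined(USE_HIP)             // hypothetical flag set by the build system
  #include <hip/hip_runtime.h>
  #define gpuSuccess            hipSuccess
  #define gpuMalloc             hipMalloc
  #define gpuFree               hipFree
  #define gpuMemcpy             hipMemcpy
  #define gpuMemcpyHostToDevice hipMemcpyHostToDevice
  #define gpuMemcpyDeviceToHost hipMemcpyDeviceToHost
#elif defined(USE_CUDA)          // hypothetical flag set by the build system
  #include <cuda_runtime.h>
  #define gpuSuccess            cudaSuccess
  #define gpuMalloc             cudaMalloc
  #define gpuFree               cudaFree
  #define gpuMemcpy             cudaMemcpy
  #define gpuMemcpyHostToDevice cudaMemcpyHostToDevice
  #define gpuMemcpyDeviceToHost cudaMemcpyDeviceToHost
#else
  // Host fallback (assumption for this sketch): "device" memory is host memory.
  enum { gpuSuccess = 0 };
  enum gpuMemcpyKind { gpuMemcpyHostToDevice, gpuMemcpyDeviceToHost };
  inline int gpuMalloc(void** p, std::size_t bytes) {
    *p = std::malloc(bytes);
    return *p ? gpuSuccess : 1;
  }
  inline int gpuFree(void* p) { std::free(p); return gpuSuccess; }
  inline int gpuMemcpy(void* dst, const void* src, std::size_t bytes, gpuMemcpyKind) {
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, bytes);
    return gpuSuccess;
  }
#endif

// Copy n doubles to the device and back; returns true if every call succeeded.
inline bool roundTrip(const double* in, double* out, std::size_t n) {
  void* dev = nullptr;
  const std::size_t bytes = n * sizeof(double);
  if (gpuMalloc(&dev, bytes) != gpuSuccess) return false;
  bool ok = gpuMemcpy(dev, in, bytes, gpuMemcpyHostToDevice) == gpuSuccess &&
            gpuMemcpy(out, dev, bytes, gpuMemcpyDeviceToHost) == gpuSuccess;
  ok = (gpuFree(dev) == gpuSuccess) && ok;
  return ok;
}
```

With this layout the calling code only ever uses the `gpu*` names, so the choice between HIP and CUDA is confined to a single header and a single build option.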
To do list
Build system
[] Add option for enabling HIP
[] Add option for enabling CUDA
[] Add CUDA and HIP build options to spack package
[] Build with HIP support using fpm?
[] Build with CUDA support using fpm?
Compute Kernels
The following element-wise functions and operations need to be defined as HIP kernels that operate on device pointers to 32-bit and 64-bit floating-point data:
[] c = a+b
[] c = a-b
[] c = a*b
[] c = a/b
[] c = a^s (s is a scalar)
[] c = abs(a)
[] c = cos(a)
[] c = sin(a)
[] c = tan(a)
[] c = acos(a)
[] c = asin(a)
[] c = atan(a)
[] c = sinh(a)
[] c = cosh(a)
[] c = tanh(a)
[] c = sqrt(a)
[] c = ln(a) (natural logarithm)
[] c = log(a) (log base-10)
[] c = -a (sign flip)
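One possible way to cover both precisions without writing every kernel twice is to template the kernels on the data type and pass each operation as a small functor. The sketch below is only an assumption about how this could look, not a settled design: `AddOp`, `SqrtOp`, `PowOp`, `apply_unary`, `apply_binary`, and `GPU_HOST_DEVICE` are hypothetical names. When compiled with `hipcc` or `nvcc` the launchers expect device pointers and launch a kernel; with a plain C++ compiler they fall back to a host loop so the sketch can be tried without a GPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

#if defined(__HIPCC__)
  #include <hip/hip_runtime.h>
#endif
#if defined(__HIPCC__) || defined(__CUDACC__)
  #define GPU_BUILD 1
  #define GPU_HOST_DEVICE __host__ __device__
#else
  #define GPU_HOST_DEVICE
#endif

// Hypothetical functors: one per list entry, usable with float or double.
struct AddOp {   // c = a + b
  template <typename T> GPU_HOST_DEVICE T operator()(T a, T b) const { return a + b; }
};
struct SqrtOp {  // c = sqrt(a)
  template <typename T> GPU_HOST_DEVICE T operator()(T a) const { return std::sqrt(a); }
};
template <typename T>
struct PowOp {   // c = a^s, with the scalar s stored in the functor
  T s;
  GPU_HOST_DEVICE T operator()(T a) const { return std::pow(a, s); }
};

#if defined(GPU_BUILD)
template <typename T, typename Op>
__global__ void unary_kernel(const T* a, T* c, std::size_t n, Op op) {
  std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n) c[i] = op(a[i]);  // one thread per element, guarded at the tail
}
template <typename T, typename Op>
__global__ void binary_kernel(const T* a, const T* b, T* c, std::size_t n, Op op) {
  std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n) c[i] = op(a[i], b[i]);
}
#endif

// Launchers: a, b, c are device pointers in a GPU build, host pointers otherwise.
template <typename T, typename Op>
void apply_unary(const T* a, T* c, std::size_t n, Op op) {
  if (n == 0) return;
  assert(a != nullptr && c != nullptr);
#if defined(GPU_BUILD)
  const unsigned threads = 256;
  const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
  unary_kernel<<<blocks, threads>>>(a, c, n, op);
#else
  for (std::size_t i = 0; i < n; ++i) c[i] = op(a[i]);
#endif
}

template <typename T, typename Op>
void apply_binary(const T* a, const T* b, T* c, std::size_t n, Op op) {
  if (n == 0) return;
  assert(a != nullptr && b != nullptr && c != nullptr);
#if defined(GPU_BUILD)
  const unsigned threads = 256;
  const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
  binary_kernel<<<blocks, threads>>>(a, b, c, n, op);
#else
  for (std::size_t i = 0; i < n; ++i) c[i] = op(a[i], b[i]);
#endif
}
```

Each remaining entry in the list would then be one more functor, for example `apply_unary(a, c, n, PowOp<double>{2.0})` for squaring, and the fp32 and fp64 variants come from the template parameter rather than from separate kernels.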