Description
When executing CUDA kernels generated with the GPU compiler and NVRTC, we should pass the kernel's parameters (as determined from the GpuAst) into execute. This would allow us to make the logic in cuda_execute_dsl.nim much more robust.
The file has a requiresCopy function which tries to determine whether an input passed into execute needs to be copied or can be passed to the kernel via a host pointer. The glaring issue is that requiresCopy does not know the actual kernel parameters, so what we pass in may not match what the kernel expects.
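To make the idea concrete, here is a minimal C++ sketch of a copy decision driven by the kernel's declared parameter instead of by the host-side type. Everything in it is hypothetical: `ParamKind`, `ParamInfo`, `HostArg` and this `requires_copy` are illustration names, not the API of cuda_execute_dsl.nim, and the sketch assumes the parameter description can be derived from the GpuAst.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical: how the kernel declares one parameter.
enum class ParamKind { ByValue, DevicePointer };

// Hypothetical: one kernel parameter as it might be derived from the kernel's AST.
struct ParamInfo {
    std::string name;
    ParamKind kind;
    std::size_t bytes;  // size of the value; only checked for ByValue
};

// Hypothetical: one host-side argument handed to the launcher.
struct HostArg {
    const void* data;
    std::size_t bytes;
};

// Returns true if the argument must be copied into a device buffer,
// false if a host pointer to the value can be handed to the launch call.
// Throws if a by-value argument does not have the size the kernel expects.
inline bool requires_copy(const ParamInfo& p, const HostArg& a) {
    if (p.kind == ParamKind::DevicePointer) {
        return true;
    }
    if (a.bytes != p.bytes) {
        throw std::invalid_argument(
            "argument '" + p.name + "' has " + std::to_string(a.bytes) +
            " bytes, kernel expects " + std::to_string(p.bytes));
    }
    return false;
}
```

The point of the design is that a mismatch becomes an explicit error at launch time rather than a silent wrong copy.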
For example, we can pass in an array[8, uint32] when the kernel expects a BigInt with 8 uint32 limbs, because the underlying data is identical. The inverse case breaks, though: if the kernel expects a static array and we pass in a BigInt, we generate the wrong code, because static arrays are passed by pointer in C/C++/CUDA, whereas a struct is passed by value.
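The asymmetry can be checked in plain C++, which follows the same parameter rules as CUDA here. This is a small sketch: `BigInt256` is a hypothetical stand-in for a BigInt with 8 uint32 limbs, and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a big integer made of 8 uint32 limbs.
struct BigInt256 {
    std::uint32_t limbs[8];
};

// A parameter declared as an array is adjusted to a pointer,
// so these two function types are the same type.
static_assert(std::is_same<void(std::uint32_t[8]), void(std::uint32_t*)>::value,
              "array parameters are adjusted to pointers");

// A struct wrapping the same array remains a by-value parameter.
static_assert(!std::is_same<void(BigInt256), void(std::uint32_t*)>::value,
              "struct parameters are passed by value");

// The struct and the bare array occupy the same number of bytes.
static_assert(sizeof(BigInt256) == sizeof(std::uint32_t[8]),
              "BigInt256 has the same size as its limb array");

// Bytes occupied by the argument itself when the kernel takes the struct.
constexpr std::size_t by_value_arg_bytes() { return sizeof(BigInt256); }

// Bytes occupied by the argument itself when the kernel takes the array.
constexpr std::size_t by_pointer_arg_bytes() { return sizeof(std::uint32_t*); }

// True if the struct and the array hold identical limb data.
inline bool same_bytes(const BigInt256& b, const std::uint32_t (&a)[8]) {
    return std::memcmp(b.limbs, a, sizeof(a)) == 0;
}

// Runtime check: identical data, even though the parameter passing differs.
inline void check_layout() {
    const std::uint32_t a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const BigInt256 b = {{1, 2, 3, 4, 5, 6, 7, 8}};
    assert(same_bytes(b, a));
}
```

So the two types are interchangeable as data, but not as kernel arguments: one argument is the 32 bytes of limbs, the other is a single pointer.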
At the moment it is easy to run into bizarre runtime issues: the logic tries to do the right thing, but without knowing the kernel's parameters it ends up copying when it shouldn't, or vice versa.
(I'll add a proper example when I start working on this)