[CUDA] pass kernel parameters from GpuAst to `execute` call

When executing CUDA kernels generated with the GPU compiler and NVRTC, `execute` should receive the kernel's parameters as determined from the `GpuAst`. That would make the logic in `cuda_execute_dsl.nim` much more robust.

The [file has a `requiresCopy` function](https://github.com/mratsim/constantine/blob/master/constantine/math_compiler/experimental/cuda_execute_dsl.nim#L29-L64) which tries to determine whether the *input* passed into `execute` needs to be copied or can be passed to the kernel via a host pointer. The glaring issue is that, because we don't know the actual kernel arguments, what we pass in may not match what the kernel expects.
For example, we can pass in an `array[8, uint32]` when the kernel expects a `BigInt` with 8 `uint32` limbs, because the underlying data is identical. In the inverse case (a `BigInt` passed where the kernel expects an array), however, we'd generate the wrong code, because static arrays are passed by pointer in C/C++/CUDA, for example.
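
The array-versus-struct asymmetry can be checked in plain C++. This is a minimal sketch, not code from constantine: `BigInt8`, `takesArray`, `takesStruct` and `FirstParam` are hypothetical names, and `BigInt8` only stands in for a big integer with 8 `uint32` limbs. It shows that the two types have the same size, yet a parameter declared as an array is adjusted to a pointer while the struct is passed by value.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a big integer with 8 uint32 limbs.
struct BigInt8 {
    std::uint32_t limbs[8];
};

// Declarations only: we inspect the signatures, we never call them.
void takesArray(std::uint32_t limbs[8]);  // array parameter, adjusted to uint32_t*
void takesStruct(BigInt8 value);          // struct parameter, passed by value

// Extracts the single parameter type of a one-argument function type.
template <typename Fn> struct FirstParam;
template <typename R, typename A> struct FirstParam<R(A)> { using type = A; };

// Same number of bytes on the host side ...
constexpr bool sameSize = sizeof(BigInt8) == 8 * sizeof(std::uint32_t);

// ... but the callee sees a pointer for the array and a full value for the struct.
constexpr bool arrayParamIsPointer =
    std::is_pointer<FirstParam<decltype(takesArray)>::type>::value;
constexpr bool structParamIsPointer =
    std::is_pointer<FirstParam<decltype(takesStruct)>::type>::value;

static_assert(sameSize, "array and struct hold the same number of bytes");
static_assert(arrayParamIsPointer, "array parameter is really a pointer");
static_assert(!structParamIsPointer, "struct parameter is passed by value");
```

Because the host-side types are interchangeable in memory but not in calling convention, only the kernel's own signature can say which of the two is expected.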

At the moment it is easy to run into bizarre runtime issues: the logic tries to do the right thing, but we end up copying when we shouldn't, or vice versa.
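
One way to picture the proposed direction is to let the kernel's declared parameter, rather than the host argument's type, drive the copy decision. The following is only a rough sketch under stated assumptions: every name in it (`ParamKind`, `Marshal`, `KernelParam`, `HostArg`, `marshalFor`, `sizesAgree`) is hypothetical and does not exist in `cuda_execute_dsl.nim`, and it assumes a by-value kernel parameter can be read from a host pointer at launch while a pointer parameter needs a device buffer.

```cpp
#include <cstddef>

// All names below are hypothetical illustrations, not constantine APIs.

// How the kernel itself declares a parameter (would come from the GpuAst).
enum class ParamKind { Value, Pointer };

// What the host side has to do with the matching argument.
enum class Marshal { HostPointerToValue, DeviceCopy };

struct KernelParam {
    ParamKind kind;
    std::size_t bytes;  // size of the value, or of the pointed-to data
};

struct HostArg {
    std::size_t bytes;  // size of the data the caller handed to execute
};

// The decision depends only on the kernel's declaration, never on the
// host argument's type, so array vs. struct on the host no longer matters.
constexpr Marshal marshalFor(KernelParam p) {
    return p.kind == ParamKind::Pointer ? Marshal::DeviceCopy
                                        : Marshal::HostPointerToValue;
}

// A cheap sanity check that could turn a silent mismatch into an early error.
constexpr bool sizesAgree(KernelParam p, HostArg a) {
    return p.bytes == a.bytes;
}

// Kernel expects a pointer to 8 limbs: copy to the device.
static_assert(marshalFor(KernelParam{ParamKind::Pointer, 32}) == Marshal::DeviceCopy,
              "pointer parameters need device memory");
// Kernel expects a 32-byte value: no device copy.
static_assert(marshalFor(KernelParam{ParamKind::Value, 32}) == Marshal::HostPointerToValue,
              "value parameters are read from the host");
```

With the kernel signature available, a size check such as `sizesAgree` could also reject arguments whose byte size differs from what the kernel declares, instead of failing at runtime.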

(I'll add a proper example when I start working on this)

[CUDA] pass kernel parameters from GpuAst to `execute` call #566

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

[CUDA] pass kernel parameters from GpuAst to execute call #566

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

[CUDA] pass kernel parameters from GpuAst to `execute` call #566