[CUDA] pass kernel parameters from GpuAst to execute call #566

@Vindaar

Description

When executing CUDA kernels generated with the GPU compiler and NVRTC, we currently do not tell execute what the kernel's parameters actually are. We need to change the code so that the kernel parameters (as determined from the GpuAst) are passed into execute. This would allow us to make the logic in cuda_execute_dsl.nim much more robust.

The file has a requiresCopy function which tries to determine whether an input passed into execute needs to be copied or can be passed to the kernel via host pointer. The glaring issue is that, because we don't know the actual kernel arguments, there may be a mismatch between what we pass in and what the kernel expects.
For example, we can pass in an array[8, uint32] when the kernel expects a BigInt with 8 uint32 limbs, because the underlying data is identical. The inverse case (passing a BigInt where the kernel expects a static array) easily runs into issues, though: we'd generate the wrong code, because static arrays are passed by pointer in C/C++/CUDA.
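To make the asymmetry concrete, here is a minimal host-only C++ sketch. The BigInt struct and the two helper functions are hypothetical stand-ins, not the project's actual types or generated code. It shows that a struct wrapping 8 uint32 limbs has the same bytes as a raw array, yet a raw array parameter is adjusted to a pointer while the struct is copied by value. The same C++ parameter rules apply to CUDA kernel signatures, which is why the host side would need a device pointer in one case and a by-value argument in the other.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a BigInt with 8 uint32 limbs.
struct BigInt {
    std::uint32_t limbs[8];
};

// The underlying data is identical in size to a raw array of 8 uint32 values.
static_assert(sizeof(BigInt) == sizeof(std::uint32_t[8]),
              "BigInt and uint32_t[8] occupy the same number of bytes");
static_assert(std::is_trivially_copyable<BigInt>::value,
              "BigInt can be copied byte for byte");

// An array parameter is adjusted to a pointer: the callee sees the caller's memory.
void zero_first_limb_array(std::uint32_t limbs[8]) { limbs[0] = 0; }

// A struct parameter is passed by value: the callee works on a copy.
void zero_first_limb_struct(BigInt b) { b.limbs[0] = 0; }

// The type system confirms the adjustment: the array version takes uint32_t*.
static_assert(std::is_same<decltype(&zero_first_limb_array),
                           void (*)(std::uint32_t*)>::value,
              "array parameter decays to a pointer");

std::uint32_t first_limb_after_array_call() {
    std::uint32_t raw[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    zero_first_limb_array(raw);   // modifies raw in place
    return raw[0];
}

std::uint32_t first_limb_after_struct_call() {
    BigInt b = {{1, 2, 3, 4, 5, 6, 7, 8}};
    zero_first_limb_struct(b);    // modifies only the callee's copy
    return b.limbs[0];
}
```

The two types are interchangeable as data but not as kernel arguments, so a decision made from the host-side type alone can pick the wrong passing convention.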

At the moment it is easy to run into bizarre runtime issues: the logic tries to do the right thing, but we end up copying when we shouldn't, or vice versa.
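As a rough illustration of the kind of check that knowing the kernel parameters would enable, the C++ sketch below decides per argument whether to copy or pass by value based on the kernel's declared parameters rather than on the host-side type. Every name here (ParamKind, KernelParam, HostArg, Transfer, Plan, plan_transfers) is hypothetical and invented for illustration. It does not reflect the real requiresCopy logic or the GpuAst data structures.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: how the generated kernel declares a parameter.
enum class ParamKind { ByValue, ByPointer };

// Hypothetical: one kernel parameter as it might be derived from the AST.
// For ByPointer, byteSize is the size of the pointed-to data.
struct KernelParam {
    std::string name;
    ParamKind kind;
    std::size_t byteSize;
};

// Hypothetical: one argument handed in by the caller.
struct HostArg {
    const void* data;
    std::size_t byteSize;
};

enum class Transfer { PassByValue, CopyToDevice };

struct Plan {
    bool ok;
    std::vector<Transfer> transfers;
    std::string error;
};

// Derive the transfer for each argument from the kernel's signature,
// and reject calls whose arguments do not match it.
Plan plan_transfers(const std::vector<KernelParam>& params,
                    const std::vector<HostArg>& args) {
    if (params.size() != args.size()) {
        return Plan{false, {}, "argument count mismatch"};
    }
    Plan plan{true, {}, ""};
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (params[i].byteSize != args[i].byteSize) {
            return Plan{false, {}, "size mismatch for parameter " + params[i].name};
        }
        plan.transfers.push_back(params[i].kind == ParamKind::ByPointer
                                     ? Transfer::CopyToDevice
                                     : Transfer::PassByValue);
    }
    return plan;
}
```

With something along these lines, a mismatch would surface as an explicit error at the call site instead of as a bizarre runtime failure.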

(I'll add a proper example when I start working on this)
