add support for additional hidden kernel args #717
06246b2 to aac7d0f
Codecov Report
❌ Patch coverage is

Coverage diff, tb/kernel_state_reference vs. #717:

            tb/kernel_state_reference    #717      +/-
Coverage    0.00%                        72.32%    +72.32%
Files       24                           24
Lines       3507                         3621      +114
Hits        0                            2619      +2619
Misses      3507                         1002      -2505
So this is basically CUDA's dynamic shared memory, but with the amount of memory configured through kernel arguments instead of an API call? I don't particularly like how this is fitted in with the KernelState stuff, though. What about generalizing the Metal.jl functionality instead? That should already be much closer to what you need: lowering arbitrary intrinsics to trailing kernel arguments.
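To make the suggestion concrete, here is a toy Python model of what "lowering intrinsics to trailing kernel arguments" could look like. It is only a sketch under stated assumptions: the `Function` record, the `lower_hidden_args` helper, and the `rng_local_memory` / `hidden_rng_mem` names are all hypothetical and are not the GPUCompiler.jl or Metal.jl implementation. The idea it illustrates is that every function that uses the intrinsic, directly or through a callee, gains an extra trailing parameter, all the way up to the kernel entry point.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Function:
    """Hypothetical, minimal stand-in for a function in a compiler IR."""
    name: str
    params: list[str]
    calls: list[str] = field(default_factory=list)       # names of callees
    intrinsics: list[str] = field(default_factory=list)  # intrinsics used directly


def lower_hidden_args(module: dict[str, Function],
                      hidden: dict[str, str]) -> dict[str, list[str]]:
    """Append hidden trailing parameters to every function that needs them.

    `hidden` maps an intrinsic name to the name of the hidden parameter
    that replaces it. Returns the parameters appended to each function.
    """
    # Seed: hidden parameters each function needs for its own intrinsics.
    needs = {
        name: {hidden[i] for i in fn.intrinsics if i in hidden}
        for name, fn in module.items()
    }

    # Propagate callee requirements to callers until nothing changes.
    changed = True
    while changed:
        changed = False
        for name, fn in module.items():
            for callee in fn.calls:
                extra = needs[callee] - needs[name]
                if extra:
                    needs[name] |= extra
                    changed = True

    # Append in a stable order so callers and callees agree on layout.
    order = list(dict.fromkeys(hidden.values()))
    added = {}
    for name, fn in module.items():
        extra = [p for p in order if p in needs[name]]
        fn.params.extend(extra)
        added[name] = extra
    return added


# Assumed example: only `rand_step` touches the intrinsic directly.
module = {
    "kernel": Function("kernel", ["a", "b"], calls=["rand_step", "helper"]),
    "rand_step": Function("rand_step", ["x"], intrinsics=["rng_local_memory"]),
    "helper": Function("helper", ["y"]),
}
added = lower_hidden_args(module, {"rng_local_memory": "hidden_rng_mem"})
```

In this toy module, `kernel` and `rand_step` each gain a trailing `hidden_rng_mem` parameter, while `helper` is left untouched because it never reaches the intrinsic.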
Hmm, yes, that looks very promising! I'll try to use that then.
Allows other backends to pass additional hidden arguments that can be accessed through intrinsics. Required for OpenCL device-side RNG support, where additional shared memory must be passed as arguments to the kernel. Replaces #717
Closing in favor of #718
For device-side RNG in OpenCL, we need to allocate local memory whose size depends on the number of subgroups in a workgroup, and AFAICT the only way to do that is to pass the local memory as an argument to the kernel. This PR adds a way to attach such additional hidden arguments to a kernel and thread them through the functions that require them, building on the existing kernel state passes.
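As a rough illustration of the sizing involved, the following Python sketch computes how many bytes of local memory the host would request for such a hidden argument. It rests on assumptions that are not in the PR: the helper name `local_memory_bytes` is hypothetical, the per-subgroup state size is a made-up parameter, and the subgroup count is taken to be the workgroup size divided by the subgroup size, rounded up. In OpenCL, a `__local` pointer kernel parameter is sized from the host by calling `clSetKernelArg` with the byte count and a null argument value, which is why the amount has to be known at launch time.

```python
def local_memory_bytes(workgroup_size: int,
                       subgroup_size: int,
                       bytes_per_subgroup: int) -> int:
    """Bytes of local memory for one state slot per subgroup (assumed layout)."""
    if workgroup_size <= 0 or subgroup_size <= 0 or bytes_per_subgroup < 0:
        raise ValueError("sizes must be positive and the state size non-negative")
    # Round up: a partially filled subgroup still needs its own slot.
    num_subgroups = (workgroup_size + subgroup_size - 1) // subgroup_size
    return num_subgroups * bytes_per_subgroup


# Assumed numbers: 256 work-items, subgroups of 32, 4 bytes of state each.
full = local_memory_bytes(256, 32, 4)     # 8 subgroups * 4 bytes = 32
partial = local_memory_bytes(100, 32, 4)  # 4 subgroups * 4 bytes = 16
```

Because the result depends on the launch configuration rather than on the kernel source, it cannot be baked in as a fixed-size local array, which is the motivation given above for passing it as an argument.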