
@simeonschaub
Member

For device-side RNG in OpenCL, we need to allocate local memory sized by the number of subgroups in a workgroup, and AFAICT the only way to do that is to pass the local memory as an argument to the kernel. This PR adds support for such additional hidden kernel arguments and threads them through the functions that require them, building on the existing kernel state passes.

@codecov

codecov bot commented Sep 6, 2025

Codecov Report

❌ Patch coverage is 23.43750% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.32%. Comparing base (dce63fc) to head (4b5cd62).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/irgen.jl | 22.22% | 49 Missing ⚠️ |

Additional details and impacted files
@@                      Coverage Diff                       @@
##           tb/kernel_state_reference     #717       +/-   ##
==============================================================
+ Coverage                       0.00%   72.32%   +72.32%     
==============================================================
  Files                             24       24               
  Lines                           3507     3621      +114     
==============================================================
+ Hits                               0     2619     +2619     
+ Misses                          3507     1002     -2505     


@maleadt
Member

maleadt commented Sep 9, 2025

So basically dynamic shared memory from CUDA, but using arguments instead of an API call to configure the amount of memory?

I don't particularly like how this is fitted in with the KernelState stuff though. What about generalizing the Metal.jl functionality instead? That should already be much closer to what you need: lowering arbitrary intrinsics to trailing kernel arguments.

@simeonschaub
Member Author

Hmm, yes, that looks very promising! I'll try to use that then.

simeonschaub added a commit that referenced this pull request Sep 9, 2025
Allows other backends to pass additional hidden arguments that can be accessed through intrinsics. Required for OpenCL device-side RNG support, where additional shared memory must be passed as arguments to the kernel.

Replaces #717
@simeonschaub
Member Author

Closing in favor of #718

simeonschaub added a commit that referenced this pull request Sep 11, 2025
Allows other backends to pass additional hidden arguments that can be accessed through intrinsics. Required for OpenCL device-side RNG support, where additional shared memory must be passed as arguments to the kernel.

Replaces #717