Does KernelAbstractions.jl support automatically choosing the workgroupsize when the kernel's local memory usage depends on the groupsize? For example, CUDA.launch_configuration accepts a shmem callback that maps a candidate number of threads to the shared memory that configuration would use; CUDA.jl relies on this to implement mapreduce. Since the shmem argument to CUDA.launch_configuration is not used in Kernel{CUDADevice}, I guess this isn't implemented yet? Is it related to #19?
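For context, here is a minimal sketch of the CUDA.jl pattern I mean. The kernel body and array names are hypothetical placeholders; the real API pieces are `@cuda launch=false`, `launch_configuration`, and its `shmem` keyword, which accepts either a byte count or a function of the thread count:

```julia
using CUDA

# Hypothetical kernel: allocates one Float32 of dynamic shared memory per thread.
function reduce_kernel(out, x)
    shared = CuDynamicSharedArray(Float32, blockDim().x)
    # ... reduction body elided ...
    return
end

out = CUDA.zeros(Float32, 1)
x = CUDA.rand(Float32, 1024)

# Compile without launching, then query an occupancy-based configuration.
kernel = @cuda launch=false reduce_kernel(out, x)
config = launch_configuration(kernel.fun;
                              shmem = threads -> threads * sizeof(Float32))

# config.threads now reflects a thread count chosen with the
# per-thread shared-memory cost taken into account.
```

The point is that the occupancy calculator can only pick a good thread count here if it knows how shared memory scales with it, which is exactly the coupling my question is about.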