Auto-tuning workgroupsize when localmem consumption depends on it #215

@tkf

Description

Does KernelAbstractions.jl support auto-selecting the workgroupsize when the kernel's local memory consumption depends on the groupsize? For example, CUDA.launch_configuration takes a shmem callback that maps a number of threads to the shared memory used; this is how mapreduce is implemented in CUDA.jl. Since the shmem argument of CUDA.launch_configuration is not used in Kernel{CUDADevice}, I guess it's not implemented yet? Is it related to #19?
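
For reference, here is a minimal sketch of the CUDA.jl API I mean. The `scale_kernel` and its one-`Float32`-per-thread shared-memory sizing are hypothetical, just to make the footprint depend on the block size; `launch_configuration` with a `shmem` callback is the actual CUDA.jl occupancy API:

```julia
using CUDA

# Hypothetical kernel that stages one Float32 per thread in dynamic shared
# memory, so its shared-memory footprint grows with the block size.
function scale_kernel(y, x)
    buf = CuDynamicSharedArray(Float32, blockDim().x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(x)
        buf[threadIdx().x] = x[i]
        y[i] = 2f0 * buf[threadIdx().x]
    end
    return nothing
end

x = CUDA.rand(Float32, 10_000)
y = similar(x)

# Compile without launching so we can query the occupancy API first.
kernel = @cuda launch=false scale_kernel(y, x)

# shmem can be a callback mapping a candidate thread count to the bytes of
# dynamic shared memory the kernel would need at that block size.
config = launch_configuration(kernel.fun; shmem = threads -> threads * sizeof(Float32))

threads = min(length(x), config.threads)
blocks  = cld(length(x), threads)
kernel(y, x; threads, blocks, shmem = threads * sizeof(Float32))
```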
