
Avoid recompiling the kernel when the workload changes #3

@dineiar

Description

Currently, the kernel code generation is bound to a specific workload size (dimensions) here. So, whenever the dimensions change, the code is regenerated and recompiled.
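
To make the failure mode concrete, here is a hypothetical illustration (the function name and the generated source are illustrative, not GSParLib's actual generator): if the generated kernel source embeds the workload size as a literal, any change in size yields different source text and therefore a rebuild.

```cuda
#include <cstddef>
#include <string>

// Illustrative only -- NOT GSParLib's actual generator. If the generated
// kernel source embeds the workload size as a literal, every size change
// yields different source text and forces a recompile.
std::string generateVecSumSource(std::size_t n) {
    return "__kernel void vecSum(__global const float* a, "
           "__global const float* b, __global float* c) {\n"
           "    size_t i = get_global_id(0);\n"
           "    if (i < " + std::to_string(n) + ") c[i] = a[i] + b[i];\n"
           "}\n"; // generateVecSumSource(10000) != generateVecSumSource(50000)
}
```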

@gabriellaraujo1903 reported that this behavior causes substantial overhead:

When the workload size changes, GSParLib recompiles the GPU kernel.

For instance, suppose we execute a vector sum where the vector's size is 10,000; then we run another vector sum where the vector's size is 50,000. In this case, GSParLib will recompile the GPU kernel.

This behaviour degrades performance when a GPU kernel is executed many times and the workload size changes continuously.

This occurs in the MG program from NPB, an iterative program where the GPU kernels are called thousands of times and the workload varies continuously. Recompiling in this case degrades performance considerably; GPU execution time can even be worse than that of the serial code.

On the other hand, CUDA does not require GPU kernel recompilation when the workload size changes. If I recall correctly, only batching would require recompilation; I do not remember the details right now.
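
For reference, a minimal plain-CUDA sketch (not GSParLib code) shows why no recompilation is needed there: the vector length is an ordinary kernel argument, so the same compiled kernel serves any size, and only the launch configuration changes between runs.

```cuda
// The vector length n is a plain kernel argument, so one compiled kernel
// handles every workload size.
__global__ void vecSum(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                   // guard: n need not be a multiple of the block size
        c[i] = a[i] + b[i];
}

void launchVecSum(const float* a, const float* b, float* c, int n) {
    int block = 256;
    int grid = (n + block - 1) / block; // recomputed per launch; no recompilation
    vecSum<<<grid, block>>>(a, b, c, n);
}
```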

This issue aims to avoid recompiling the kernel when the workload changes. We need to investigate whether the generated code can be reused when only the workload size changes.

The workload is passed as an argument in the kernel launch, so maybe we just need to remove the extra compilation step. The code probably still needs to be recompiled when dimensions (x, y, z) are added or removed, so the main aim of this issue is to avoid recompiling when only the workload size changes; one possible caching strategy is sketched below.
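
A hypothetical sketch of one possible fix (the class and function names are illustrative, not GSParLib's actual API): cache compiled kernels keyed by what actually changes the generated code, namely the kernel source and the number of dimensions, and pass the workload sizes at launch time instead of baking them into the source.

```cuda
#include <map>
#include <memory>
#include <string>
#include <utility>

struct CompiledKernel { /* handle to a JIT-compiled GPU kernel */ };

// Illustrative cache: compile once per (source, dimension count) pair and
// reuse the result for every workload size.
class KernelCache {
    std::map<std::pair<std::string, int>,
             std::shared_ptr<CompiledKernel>> cache;

    std::shared_ptr<CompiledKernel> compile(const std::string&, int) {
        // Placeholder for the real NVRTC/OpenCL build step.
        return std::make_shared<CompiledKernel>();
    }

public:
    std::shared_ptr<CompiledKernel> get(const std::string& source, int numDims) {
        auto key = std::make_pair(source, numDims);
        auto it = cache.find(key);
        if (it != cache.end())
            return it->second;                  // reuse: same code, any workload size
        auto kernel = compile(source, numDims); // compile once per (source, dims)
        cache.emplace(key, kernel);
        return kernel;
    }
};
```

With this keying, running the vector sum with 10,000 and then 50,000 elements would hit the cache on the second run, since neither the source nor the dimension count changed; adding or removing a dimension would still trigger a fresh compilation, as intended.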
