Binary kernel for Inductor static kernel launcher. #5388

@etaf

Hi team.

Recently, we found that torch.compile spends too much time in Triton compilation, and one of the main overheads is building kernel launchers.
In one Inductor unit test (UT) containing two Triton kernels, building the launchers takes 5 s out of a total of 13 s.
In addition, while profiling UT execution time, I observed that compiling a single SPIR-V kernel can take anywhere from tens of milliseconds to 5–6 seconds, depending on the case.
Across about 20,000 Inductor test cases, this wastes a significant amount of CI hardware resources.

I also measured the end-to-end (E2E) compilation time for the first compilation of each model, to show the impact on user experience:

| Platform | Model | Launcher compile time (s) | Model compile time (s) | Potential reduction |
| --- | --- | --- | --- | --- |
| LNL, Windows, CPU Ultra 9 3.30GHz | Llama-3.1-8B | 38 | 210 | 18.10% |
| LNL, Windows, CPU Ultra 9 3.30GHz | stable-diffusion-3-medium | 30 | 318 | 9.43% |
| BMG, Windows, CPU i9 2.40GHz | Llama-3.1-8B | 75 | 430 | 17.44% |
| BMG, Windows, CPU i9 2.40GHz | stable-diffusion-3-medium | 65 | 460 | 14.13% |
| PVC, Linux, Intel(R) Xeon(R) Platinum 0.8GHz | Llama-3.1-8B | 79 | 210 | 37.62% |
| PVC, Linux, Intel(R) Xeon(R) Platinum 0.8GHz | stable-diffusion-3-medium | 83 | 200 | 41.50% |
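
For reference, the last column of the measurements above appears to be the launcher compile time divided by the model compile time. The short sketch below recomputes it from the reported numbers; the helper name `launcher_share` and the platform labels are only illustrative.

```python
# (launcher_seconds, model_seconds) pairs copied from the measurements above.
measurements = {
    ("LNL Windows", "Llama-3.1-8B"): (38, 210),
    ("LNL Windows", "stable-diffusion-3-medium"): (30, 318),
    ("BMG Windows", "Llama-3.1-8B"): (75, 430),
    ("BMG Windows", "stable-diffusion-3-medium"): (65, 460),
    ("PVC Linux", "Llama-3.1-8B"): (79, 210),
    ("PVC Linux", "stable-diffusion-3-medium"): (83, 200),
}


def launcher_share(launcher_s: float, model_s: float) -> float:
    """Percentage of the model compile time spent building launchers."""
    return 100.0 * launcher_s / model_s


for (platform, model), (launcher_s, model_s) in measurements.items():
    share = launcher_share(launcher_s, model_s)
    print(f"{platform:12s} {model:26s} {share:6.2f}%")
```

This treats the launcher time as fully removable, which is an upper bound: a static launcher still has some load cost of its own.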

Based on these results, we are enabling static kernel launchers in Inductor, similar to CUDA.
The static launcher is built during the PyTorch build process and can accept various arguments to execute kernels.

However, in the static path we need to load the SPIR-V kernels returned by triton.compile, which causes the same issue as #5153.
Therefore, I’d like to propose adding an option to triton.compile to return a binary kernel instead.

I suggest making the kernel format (SPIR-V or binary) a build option for triton.compile. This gives more flexibility, because the Inductor side can then select the format programmatically.
We plan to set the default format to binary, and fall back to SPIR-V on Windows if binary is not supported there.
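
A minimal sketch of the selection logic Inductor could apply on its side is shown below. Everything here is hypothetical: `KernelFormat`, `choose_kernel_format`, and the `binary_supported` flag are illustrative names, not part of the Triton or Inductor API, and how binary support would actually be detected is left open.

```python
from enum import Enum


class KernelFormat(Enum):
    """Hypothetical kernel formats the compile step could return."""
    SPIRV = "spirv"
    BINARY = "binary"


def choose_kernel_format(binary_supported: bool) -> KernelFormat:
    """Prefer the binary format and fall back to SPIR-V when it is unavailable.

    Assumption: the caller already knows whether binary output is supported
    on the current platform (the issue expects Windows to be the case where
    it may not be).
    """
    if binary_supported:
        return KernelFormat.BINARY
    return KernelFormat.SPIRV


# Example: a platform without binary support falls back to SPIR-V.
print(choose_kernel_format(binary_supported=False).value)
print(choose_kernel_format(binary_supported=True).value)
```

Keeping the decision in one small function would let Inductor pass the chosen format to the compile call as an option, without the static launcher needing to know which platform it runs on.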
