Hi team.
Recently, we found that torch.compile spends too much time in Triton compilation, and one of the main overheads is building kernel launchers.
In one Inductor UT containing two Triton kernels, the launcher building takes 5s out of a total 13s.
In addition, during UT execution time profiling, I observed that compiling a single SPIR-V kernel can take anywhere from tens of milliseconds to 5–6 seconds, depending on the case.
Across about 20,000 Inductor cases, this results in a significant waste of CI hardware resources.
I also measured the end-to-end (E2E) compilation time for the first model compilation, shown below, to illustrate the impact on user experience:
| LNL Windows, CPU Ultra 9 3.30GHz | Launcher compile time (s) | Model compile time (s) | Potential reduction |
|---|---|---|---|
| Llama-3.1-8B | 38 | 210 | 18.10% |
| stable-diffusion-3-medium | 30 | 318 | 9.43% |

| BMG Windows, CPU i9 2.40GHz | Launcher compile time (s) | Model compile time (s) | Potential reduction |
|---|---|---|---|
| Llama-3.1-8B | 75 | 430 | 17.44% |
| stable-diffusion-3-medium | 65 | 460 | 14.13% |

| Linux PVC, Intel(R) Xeon(R) Platinum 0.8GHz | Launcher compile time (s) | Model compile time (s) | Potential reduction |
|---|---|---|---|
| Llama-3.1-8B | 79 | 210 | 37.62% |
| stable-diffusion-3-medium | 83 | 200 | 41.50% |
Based on these results, we are enabling a static kernel launcher in Inductor, similar to what is done for CUDA.
The static launcher is built as part of the PyTorch build process and can accept the various argument types needed to execute kernels.
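To make the intent concrete, here is a minimal Python-level sketch of the idea, assuming hypothetical helper names (the real launcher would be a prebuilt C extension, analogous to Inductor's static CUDA launcher). The point is only that loading and launching the compiled kernel go through one prebuilt entry point, so no per-kernel launcher source has to be generated and compiled at torch.compile time.

```python
# Conceptual sketch only, not the actual implementation. All names here
# (_load_device_binary, _launch, StaticLauncher) are hypothetical placeholders.

def _load_device_binary(kernel_binary: bytes, kernel_name: str):
    """Placeholder for the driver-level module load (illustrative stub)."""
    raise NotImplementedError

def _launch(function, grid, stream, args):
    """Placeholder for the driver-level kernel launch (illustrative stub)."""
    raise NotImplementedError

class StaticLauncher:
    """Hypothetical static launcher: built once with PyTorch, reused for every kernel."""

    def __init__(self, kernel_binary: bytes, kernel_name: str):
        # Load the already-compiled kernel binary directly; no launcher codegen.
        self.function = _load_device_binary(kernel_binary, kernel_name)

    def run(self, grid, stream, *args):
        # Forward the raw arguments straight to the driver launch call.
        _launch(self.function, grid, stream, args)
```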
However, in the static path we need to load the SPIR-V kernels returned by tl.compile, which runs into the same issue as #5153.
Therefore, I'd like to propose adding an option to tl.compile that returns a binary kernel instead.
I suggest exposing the kernel format (SPIR-V or binary) as an option of triton.compile, as this provides greater flexibility given that the Inductor side is programmable.
We plan to make binary the default format and fall back to SPIR-V on Windows if binary is not supported there.
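As a rough illustration of how the Inductor side could consume such an option, here is a hedged sketch. The option name `kernel_format` and the capability check `_binary_format_supported()` are hypothetical placeholders; triton.compile has no such knob today, and the actual spelling would be decided during review.

```python
# Illustrative sketch only. "kernel_format" and _binary_format_supported()
# are hypothetical placeholders, not existing Triton APIs.
import sys

def _binary_format_supported() -> bool:
    """Placeholder for a runtime capability check on the current platform."""
    return True  # assumed supported for the purpose of this sketch

def pick_kernel_format() -> str:
    # Default to a device binary; fall back to SPIR-V on Windows if the
    # binary format turns out not to be supported there.
    if sys.platform == "win32" and not _binary_format_supported():
        return "spirv"
    return "binary"

# Hypothetical use on the Inductor side:
# compiled = triton.compile(src, options={"kernel_format": pick_kernel_format()})
```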