Description
🐛 Describe the bug
When running models, we see messages from Triton like the following:
xpu train AlbertForQuestionAnswering
(I): Detected 9472 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 512 spills
(I): Detected 20032 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 10816 spills
(I): Detected 33600 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 25408 spills
This happens because we do not set grf_mode in the Triton config and the number of register spills exceeds the threshold. Triton therefore automatically recompiles the kernel in large GRF mode.
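To see how much each recompile helps, the messages can be paired up into (spills before, spills after) tuples. This is a minimal sketch that assumes the log format shown above; the helper name `pair_spill_counts` is hypothetical and not part of Triton or PyTorch.

```python
import re

# Patterns assume the exact message wording shown in the log above.
DETECTED = re.compile(r"Detected (\d+) spills")
AFTER = re.compile(r"Kernel has now (\d+) spills")


def pair_spill_counts(log_text):
    """Return (spills_before, spills_after) for each large-GRF recompile."""
    pairs = []
    pending = None  # spill count from the most recent "Detected" line
    for line in log_text.splitlines():
        match = DETECTED.search(line)
        if match:
            pending = int(match.group(1))
            continue
        match = AFTER.search(line)
        if match and pending is not None:
            pairs.append((pending, int(match.group(1))))
            pending = None
    return pairs


# Log excerpt copied from this issue.
LOG = """\
(I): Detected 9472 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 512 spills
(I): Detected 20032 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 10816 spills
(I): Detected 33600 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 25408 spills
"""

for before, after in pair_spill_counts(LOG):
    print(f"{before} -> {after} spills")
```

In this excerpt every kernel still spills after the recompile, so large GRF mode reduces the spill count but does not eliminate it.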
This is the expected behavior. We have two options:
1. Set `grf_mode=auto` on the inductor side, so that when the device is XPU, the `triton.Config` will have this kwarg.
2. Discuss with the Triton team about hiding this from end users. These outputs should be treated as warnings.
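For illustration, the first option would amount to a device-specific branch like the one below. This is only a sketch: `with_grf_mode` is a hypothetical helper, not inductor code, and it assumes the XPU backend reads `grf_mode` from the kwargs dict given to `triton.Config`. The `XBLOCK` entry is just a placeholder meta-parameter.

```python
def with_grf_mode(config_kwargs, device_type, grf_mode="auto"):
    """Return a copy of config_kwargs, adding grf_mode only for XPU.

    Hypothetical helper: the real inductor code path is not shown in this issue.
    """
    kwargs = dict(config_kwargs)  # copy so the caller's dict is not mutated
    if device_type == "xpu":
        # Assumption: the XPU backend picks this key up from the config kwargs.
        kwargs["grf_mode"] = grf_mode
    return kwargs


xpu_kwargs = with_grf_mode({"XBLOCK": 128}, "xpu")
cuda_kwargs = with_grf_mode({"XBLOCK": 128}, "cuda")
print(xpu_kwargs)
print(cuda_kwargs)
```

The resulting dict would then be passed to `triton.Config`. The two results differ by device, which is exactly the kind of XPU-only divergence in the inductor config that this option introduces.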
After an offline discussion, we think option 2 is better: from the inductor side there should be no difference between XPU and CUDA/HIP, and we want to keep the same config for all device types. The "grf_mode" should be set by Triton, for XPU only.
I have also created an issue in Triton's repo, where the discussion can be followed:
intel/intel-xpu-backend-for-triton#2251
Versions
PyTorch: 1a67e2b6801e09ca538555c517e2e9120c7e40bf
Triton Commit: 91b14bf5593cf58a8541f3e6b9125600a867d4ef