[E2E] Warnings on using large GRF mode #919

@Stonepia

🐛 Describe the bug

We have observed that Triton emits messages like the following when running models:

xpu  train AlbertForQuestionAnswering         

(I): Detected 9472 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 512 spills
(I): Detected 20032 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 10816 spills
(I): Detected 33600 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 25408 spills

This happens because grf_mode is not set in the Triton config and the number of register spills exceeds the threshold, so Triton automatically recompiles the kernel in large GRF mode.
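The fallback described above can be sketched roughly as follows. This is a simplified model, not the actual Triton XPU backend code: `compile_with_grf_fallback`, `compile_fn`, and the `SPILL_THRESHOLD` value are hypothetical names and numbers chosen for illustration, and the sketch covers only the case from this issue where grf_mode is unset.

```python
from typing import Callable, List, Tuple

# Assumed value for illustration only; this issue does not state the real threshold.
SPILL_THRESHOLD = 1000


def compile_with_grf_fallback(
    compile_fn: Callable[[bool], int],
    threshold: int = SPILL_THRESHOLD,
) -> Tuple[str, int, List[str]]:
    """Model the automatic recompile when grf_mode is not set.

    compile_fn(large_grf) is a stand-in for the real compiler and returns
    the number of register spills for that build.
    Returns (mode_used, spills, messages).
    """
    messages: List[str] = []
    # First build: default GRF mode.
    spills = compile_fn(False)
    if spills > threshold:
        messages.append(
            f"(I): Detected {spills} spills, recompiling the kernel using large GRF mode"
        )
        # Second build: large GRF mode. Spills may shrink without reaching zero.
        spills = compile_fn(True)
        messages.append(f"(I): Kernel has now {spills} spills")
        return "large", spills, messages
    return "default", spills, messages


# Stand-in compiler reproducing the first pair of log lines above.
mode, spills, messages = compile_with_grf_fallback(
    lambda large_grf: 512 if large_grf else 9472
)
print("\n".join(messages))
```

Note that in the log above the large GRF build still reports spills in all three cases, so the recompile reduces register pressure but does not remove it.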

This is the expected behavior. We have two options:

  1. Set grf_mode=auto on the Inductor side, so that triton.Config carries this kwarg when the device is XPU.
  2. Discuss with the Triton team about hiding this output from end users. These messages should be treated as warnings.

After an offline discussion, we think option 2 is better: from the Inductor side there should be no difference between XPU and CUDA/HIP, and we want to keep the same config for all device types. grf_mode should be set by Triton, for XPU only.
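To make the trade-off concrete, here is a minimal sketch of the kind of device-specific branch option 1 would add on the Inductor side. `make_config_kwargs` is a hypothetical helper, not an existing Inductor or Triton function, and it only builds a plain dict; how such a dict would actually be handed to triton.Config is left out.

```python
from typing import Any, Dict


def make_config_kwargs(
    device_type: str, num_warps: int = 4, num_stages: int = 2
) -> Dict[str, Any]:
    """Hypothetical helper: collect kernel config options for a device."""
    kwargs: Dict[str, Any] = {"num_warps": num_warps, "num_stages": num_stages}
    if device_type == "xpu":
        # This is the XPU-only special case that option 2 avoids.
        kwargs["grf_mode"] = "auto"
    return kwargs


print(make_config_kwargs("xpu"))
print(make_config_kwargs("cuda"))
```

With option 2, a branch like this is unnecessary: Inductor builds the same config for every device and Triton handles grf_mode for XPU internally.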

I also created an issue in Triton's repo, where the discussion will continue:
intel/intel-xpu-backend-for-triton#2251

Versions

PyTorch: 1a67e2b6801e09ca538555c517e2e9120c7e40bf
Triton Commit: 91b14bf5593cf58a8541f3e6b9125600a867d4ef
