
Commit 6f89dbe

[FA] Specify large GRF in autotune (#2410)

By specifying the GRF mode explicitly, the number of autotuning runs can be reduced.

CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11148146767
Signed-off-by: Whitney Tsang <[email protected]>
Parent: 53b1198

1 file changed (+1, -1):

benchmarks/triton_kernels_benchmark/flash_attention_fwd_benchmark.py

@@ -153,7 +153,7 @@ def _attn_fwd(Q, K, V, sm_scale, M, Out, #
 configs = [
-    triton.Config({'BLOCK_M': BM, 'BLOCK_N': BN}, num_stages=s, num_warps=w) \
+    triton.Config({'BLOCK_M': BM, 'BLOCK_N': BN, 'grf_mode': 'large'}, num_stages=s, num_warps=w) \
     for BM in [256] \
     for BN in [32, 64] \
     for s in [3] \
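For illustration, below is a minimal, hypothetical sketch of how a config list like the one in this diff plugs into a Triton autotuned kernel. Only the 'grf_mode': 'large' entry mirrors the change above; the kernel body, its signature, the autotune key 'N_CTX', and the warp counts are placeholders that are not taken from this commit.

import triton
import triton.language as tl

# Sketch only: the config list mirrors the shape of the one in the diff above.
# Per the commit message, fixing 'grf_mode' to 'large' up front reduces the
# number of autotuning runs on the Intel XPU backend.
configs = [
    triton.Config({'BLOCK_M': BM, 'BLOCK_N': BN, 'grf_mode': 'large'},
                  num_stages=s, num_warps=w)
    for BM in [256]
    for BN in [32, 64]
    for s in [3]
    for w in [8, 16]  # placeholder warp counts; the real values are not shown in the diff
]

@triton.autotune(configs=configs, key=['N_CTX'])  # 'N_CTX' is a placeholder autotune key
@triton.jit
def _attn_fwd_sketch(Q, Out, N_CTX,  # simplified signature, not the real _attn_fwd
                     BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # Placeholder body: copies BLOCK_M elements of Q into Out; the real
    # flash-attention forward pass is omitted, and BLOCK_N is unused here.
    pid = tl.program_id(0)
    offs = pid * BLOCK_M + tl.arange(0, BLOCK_M)
    mask = offs < N_CTX
    tl.store(Out + offs, tl.load(Q + offs, mask=mask), mask=mask)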
