
Commit 8bb917f

[FA] Update autotune configs on default path (#2475)
This PR allows autotune configurations on the default path that are equivalent to the one used on the advanced path, where `BLOCK_M` is 128 and `num_warps` can be `8` or `16`. It yields a 4% performance improvement on the geomean of FA out of the box. Signed-off-by: Whitney Tsang <[email protected]>
1 parent b102bf3


benchmarks/triton_kernels_benchmark/flash_attention_fwd_benchmark.py

Lines changed: 3 additions & 3 deletions
@@ -154,10 +154,10 @@ def _attn_fwd(Q, K, V, sm_scale, M, Out, #
 
 configs = [
     triton.Config({'BLOCK_M': BM, 'BLOCK_N': BN, 'grf_mode': 'large'}, num_stages=s, num_warps=w) \
-    for BM in [256] \
+    for BM in [128, 256] \
     for BN in [32, 64] \
-    for s in [3] \
-    for w in [32] \
+    for s in [3, 4] \
+    for w in [8, 16, 32] \
 ]
 
 tuner = triton.autotune(configs, key=['N_CTX', 'BLOCK_DMODEL'])
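
For context, here is a minimal runnable sketch of the expanded search space this diff produces (assuming Triton is installed; `grf_mode` is an Intel-specific config key taken verbatim from the diff):

import triton

# Autotune search space after this change; the comprehension mirrors
# the updated diff above.
configs = [
    triton.Config({'BLOCK_M': BM, 'BLOCK_N': BN, 'grf_mode': 'large'},
                  num_stages=s, num_warps=w)
    for BM in [128, 256]
    for BN in [32, 64]
    for s in [3, 4]
    for w in [8, 16, 32]
]

# 2 (BM) * 2 (BN) * 2 (s) * 3 (w) = 24 candidate configs, up from the
# 1 * 2 * 1 * 1 = 2 configs before this change, now including the
# advanced-path shape: BLOCK_M = 128 with num_warps of 8 or 16.
print(len(configs))  # 24

Because the tuner is keyed on `['N_CTX', 'BLOCK_DMODEL']`, the best of these candidates is cached per (sequence length, head dimension) pair and re-benchmarked only when those values change.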
