Commit 902fd39

Author: Maxime France-Pillois
[FA Performance] Add configurations to FA auto-tuner (#3725)
Enhance the FA auto-tuner to evaluate more configurations (including CUTLASS configurations).
1 parent: 047f074 (this commit: 902fd39)

1 file changed (+1, -1)

benchmarks/triton_kernels_benchmark/flash_attention_benchmark.py

Lines changed: 1 addition & 1 deletion
@@ -157,7 +157,7 @@ def _attn_fwd(Q, K, V, sm_scale, M, Out, #
 triton.Config({'BLOCK_M': BM, 'BLOCK_N': BN, 'grf_mode': 'large', 'one_matrix_per_load_for_bt': True}, num_stages=s, num_warps=w) \
 for BM in [128, 256] \
 for BN in [32, 64] \
-for s in [3, 4] \
+for s in [2, 3, 4] \
 for w in [8, 16, 32] \
 ]
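
To make the effect of this one-line change concrete, the following is a minimal sketch that enumerates the search space defined by the comprehension in the hunk, before and after the change. It uses plain tuples rather than `triton.Config` so it runs without Triton installed. The helper name `enumerate_configs` and the constant names are hypothetical, and the sketch covers only the list shown in this hunk, not any other configurations the auto-tuner may evaluate.

```python
from itertools import product

# Values copied from the hunk above.
BLOCK_M = [128, 256]
BLOCK_N = [32, 64]
NUM_STAGES_BEFORE = [3, 4]
NUM_STAGES_AFTER = [2, 3, 4]
NUM_WARPS = [8, 16, 32]


def enumerate_configs(num_stages):
    """Return (BM, BN, s, w) tuples in the same loop order as the comprehension.

    Hypothetical helper for illustration; the benchmark itself builds
    triton.Config objects instead of tuples.
    """
    return list(product(BLOCK_M, BLOCK_N, num_stages, NUM_WARPS))


before = enumerate_configs(NUM_STAGES_BEFORE)
after = enumerate_configs(NUM_STAGES_AFTER)

# Configurations that exist only after the change.
added = [cfg for cfg in after if cfg not in before]

print(f"before: {len(before)} configs, after: {len(after)} configs")
print(f"added: {len(added)} configs, all with num_stages=2:",
      all(s == 2 for (_, _, s, _) in added))
```

Under these assumptions the list grows from 24 to 36 entries, and every new entry uses `num_stages=2`. Each extra entry is one more candidate the auto-tuner has to compile and time, so tuning is expected to take longer in exchange for the wider search.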
