Commit 0cd5b90

[TRITON_KERNELS] tweak matmul_ogs heuristics (#7664)
1 parent 1d99b61 commit 0cd5b90

1 file changed: +2 -1 lines changed

python/triton_kernels/triton_kernels/matmul_ogs_details/opt_flags.py

Lines changed: 2 additions & 1 deletion
@@ -157,7 +157,8 @@ def make_default_opt_flags_nvidia(
     elif enforce_bitwise_invariance:
         block_m = 128
     else:
-        block_m = max(16, min(triton.next_power_of_2(tokens_per_expt), 128))
+        min_block_m = 64 if torch.cuda.get_device_capability()[0] == 10 else 16
+        block_m = max(min_block_m, min(triton.next_power_of_2(tokens_per_expt), 128))
     # block n
     arch = None
     block_n = opt_flags_nvidia.compute_block_n(n, arch, precision_config)
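
For review purposes, the effect of the changed `else` branch can be sketched in plain Python, without a GPU. This is only an illustration, not code from the repository: `pick_block_m` is a hypothetical helper, `cc_major` stands in for `torch.cuda.get_device_capability()[0]`, and the local `next_power_of_2` is an assumed equivalent of `triton.next_power_of_2` for positive inputs.

```python
def next_power_of_2(n: int) -> int:
    """Smallest power of two >= n.

    Assumed pure-Python stand-in for triton.next_power_of_2 (positive n only).
    """
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def pick_block_m(tokens_per_expt: int, cc_major: int) -> int:
    """Hypothetical helper mirroring the `else` branch after this commit.

    cc_major stands in for torch.cuda.get_device_capability()[0].
    """
    # New in this commit: the lower bound is 64 on compute capability 10.x,
    # and stays at 16 on every other architecture.
    min_block_m = 64 if cc_major == 10 else 16
    # Clamp the rounded-up token count to the range [min_block_m, 128].
    return max(min_block_m, min(next_power_of_2(tokens_per_expt), 128))


if __name__ == "__main__":
    # Compare the chosen block_m for capability 9.x and 10.x.
    for tokens in (1, 8, 20, 40, 100, 1000):
        print(tokens, pick_block_m(tokens, 9), pick_block_m(tokens, 10))
```

Under these assumptions, the two paths differ only when `tokens_per_expt` rounds up to less than 64: with 20 tokens per expert, for example, the sketch picks 32 on capability 9.x and 64 on capability 10.x. From 33 tokens upward both paths agree.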
