Commit fbe1530

jataylo authored and jeffdaily committed
Bug fix and optimisation for persistent reduction kernel tuning (#2596)
Original PR (#2417) had incorrect indentation. Updated PR such that autotune will always add tiny configs, otherwise use the hinted configs only.

Tested locally on test_torchinductor:
Ran 894 tests in 952.242s
FAILED (failures=1, skipped=28)

And completed autotune runs for microbench models:
Microbenchmark for network : resnet152
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 0.09107530117034912
Throughput [img/sec] : 702.7152167226226

(cherry picked from commit db3ba66)
1 parent 345bdb5 commit fbe1530

1 file changed, +14 -14 lines changed

torch/_inductor/runtime/triton_heuristics.py

Lines changed: 14 additions & 14 deletions
@@ -2909,20 +2909,20 @@ def _persistent_reduction_configs(
     elif reduction_hint == ReductionHint.OUTER:
         configs = configs[-1:]
 
-    if reduction_hint == ReductionHint.OUTER_TINY:
-        tiny_configs = [
-            triton_config_reduction(
-                size_hints,
-                2 * (256 // rnumel) if rnumel <= 256 else 1,
-                rnumel,
-            )
-        ]
-        if max_autotune_enabled:
-            for tconfig in tiny_configs:
-                if tconfig not in configs:
-                    configs.append(tconfig)
-        else:
-            configs = tiny_configs
+    tiny_configs = [
+        triton_config_reduction(
+            size_hints,
+            2 * (256 // rnumel) if rnumel <= 256 else 1,
+            rnumel,
+        )
+    ]
+
+    if max_autotune_enabled:
+        for conf in tiny_configs:
+            if conf not in configs:
+                configs.append(conf)
+    elif reduction_hint == ReductionHint.OUTER_TINY:
+        configs = tiny_configs
 
     for c in configs:
         # we don't need Rn_BLOCK for persistent reduction
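
The control flow after this change can be illustrated with a minimal standalone sketch. It does not import torch or Triton: configs are modelled as plain (xblock, rblock) tuples, the ReductionHint enum below is a local stand-in, and the helper names tiny_xblock and select_configs are hypothetical, introduced only for illustration. Only the XBLOCK expression and the if/elif structure are taken from the diff above.

```python
from enum import Enum


class ReductionHint(Enum):
    # Local stand-in for the enum referenced in the diff; values are illustrative.
    INNER = 0
    OUTER = 1
    OUTER_TINY = 2
    DEFAULT = 3


def tiny_xblock(rnumel):
    # Same expression as the second argument to triton_config_reduction in the diff.
    return 2 * (256 // rnumel) if rnumel <= 256 else 1


def select_configs(configs, reduction_hint, rnumel, max_autotune_enabled):
    """Mirror the post-change branch structure using (xblock, rblock) tuples."""
    configs = list(configs)  # do not mutate the caller's list
    tiny_configs = [(tiny_xblock(rnumel), rnumel)]

    if max_autotune_enabled:
        # Autotune: always add the tiny configs, skipping duplicates.
        for conf in tiny_configs:
            if conf not in configs:
                configs.append(conf)
    elif reduction_hint == ReductionHint.OUTER_TINY:
        # No autotune: the OUTER_TINY hint selects the tiny configs only.
        configs = tiny_configs
    return configs


base = [(1, 64), (32, 64)]
print(select_configs(base, ReductionHint.INNER, 64, True))
print(select_configs(base, ReductionHint.OUTER_TINY, 64, False))
print(select_configs(base, ReductionHint.INNER, 64, False))
```

With rnumel = 64 the tiny XBLOCK is 2 * (256 // 64) = 8. Under autotune the tuple (8, 64) is appended to the existing candidates for any hint; without autotune it replaces them only for OUTER_TINY, and other hints leave the list unchanged. Before the change, the tiny config was only built when the hint was OUTER_TINY, so autotune runs with other hints never saw it.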
