Validate exhaustive autotuning for FP8 Inductor templates

jananisriram · facebook-github-bot · commit 2bd053d49c40 · 2025-08-25T15:06:31.000-07:00
Summary:
X-link: pytorch/pytorch#161442

Validate exhaustive autotuning for FP8 Inductor templates: scaled MM templates require `block_k >= 32`. Previously, exhaustive autotuning fell back to a limited set of autotuning configs because the limitations of exhaustively autotuning FP8 shapes had not been tested.

Differential Revision: D80958642
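
For illustration only, the `block_k >= 32` constraint from the summary can be pictured as a filter over an exhaustively generated config space. This is a minimal standalone sketch, not the Inductor implementation: `GemmConfig`, `exhaustive_configs`, `valid_for_scaled_mm`, and the block sizes used are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from itertools import product

# From the commit summary: scaled MM templates require block_k >= 32.
MIN_SCALED_MM_BLOCK_K = 32


@dataclass(frozen=True)
class GemmConfig:
    """Hypothetical stand-in for one autotuning config (tile sizes only)."""
    block_m: int
    block_n: int
    block_k: int


def exhaustive_configs(block_sizes):
    """Enumerate every (block_m, block_n, block_k) combination."""
    return [GemmConfig(m, n, k) for m, n, k in product(block_sizes, repeat=3)]


def valid_for_scaled_mm(configs, min_block_k=MIN_SCALED_MM_BLOCK_K):
    """Keep only configs whose block_k meets the scaled MM minimum."""
    return [c for c in configs if c.block_k >= min_block_k]


# Assumed block sizes, picked only to show the filter dropping block_k = 16.
candidates = exhaustive_configs([16, 32, 64])
fp8_candidates = valid_for_scaled_mm(candidates)
```

With three block sizes per dimension, the search space holds 27 configs, and the filter removes the 9 whose `block_k` is 16.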
diff --git a/tritonbench/operators/fp8_gemm/fp8_gemm.py b/tritonbench/operators/fp8_gemm/fp8_gemm.py
@@ -17,6 +17,8 @@
 
 from .tutorial import matmul as tutorial_matmul
 
+torch._dynamo.config.recompile_limit = 10000  # Set high recompile limit to allow for exhaustive autotuning
+
 logger = logging.getLogger(__name__)
 try:
     from .persistent import (