
Commit 83d742c

jananisriram authored and facebook-github-bot committed
Validate exhaustive autotuning for FP8 Inductor templates (#355)
Summary:
X-link: pytorch/pytorch#161442

Validate exhaustive autotuning for FP8 Inductor templates: scaled MM templates require `block_k >= 32`.

Before, exhaustive autotuning defaulted to a limited set of autotuning configs, because the constraints on exhaustive autotuning for FP8 shapes had not yet been tested.

Reviewed By: coconutruben

Differential Revision: D80958642
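For context, a minimal sketch (not part of this commit) of what exhaustive autotuning looks like from the caller's side. It assumes a recent PyTorch build with FP8 support (compute capability 8.9+) and the Inductor knob torch._inductor.config.max_autotune_gemm_search_space; the shapes and scales are illustrative only.

    import torch

    # Assumption: this Inductor knob widens template autotuning to the
    # exhaustive search space instead of the default curated config subset.
    torch._inductor.config.max_autotune_gemm_search_space = "EXHAUSTIVE"

    def fp8_mm(a, b, scale_a, scale_b):
        # torch._scaled_mm is PyTorch's FP8 matmul; it expects the second
        # operand in column-major layout and float32 scale tensors.
        return torch._scaled_mm(
            a, b, scale_a=scale_a, scale_b=scale_b, out_dtype=torch.bfloat16
        )

    compiled = torch.compile(fp8_mm, mode="max-autotune")

    a = torch.randn(512, 512, device="cuda").to(torch.float8_e4m3fn)
    b = torch.randn(512, 512, device="cuda").to(torch.float8_e4m3fn).t()
    scale = torch.tensor(1.0, device="cuda")  # float32 per-tensor scale
    out = compiled(a, b, scale, scale)

Under max-autotune, Inductor benchmarks its scaled MM template configs for the given shape; the `block_k >= 32` constraint noted above bounds which configs are valid for FP8 inputs.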
1 parent 07088a7 commit 83d742c

File tree

1 file changed, +4 −0 lines changed


tritonbench/operators/fp8_gemm/fp8_gemm.py

Lines changed: 4 additions & 0 deletions
@@ -17,6 +17,10 @@
 
 from .tutorial import matmul as tutorial_matmul
 
+torch._dynamo.config.recompile_limit = (
+    10000  # Set high recompile limit to allow for exhaustive autotuning
+)
+
 logger = logging.getLogger(__name__)
 try:
     from .persistent import (
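The four added lines raise Dynamo's recompile limit because a benchmark sweep compiles each distinct input shape separately; at the default limit (8), Dynamo would stop recompiling and fall back to eager partway through the sweep, silently skipping autotuning for later shapes. A hedged illustration of the failure mode this avoids (hypothetical shapes, not taken from the benchmark):

    import torch

    torch._dynamo.config.recompile_limit = 10000  # default is 8

    @torch.compile(mode="max-autotune", dynamic=False)
    def mm(a, b):
        return a @ b

    # With dynamic=False, each distinct shape forces a fresh compile, and
    # each compile re-runs autotuning; an exhaustive search space multiplies
    # the configs benchmarked per shape.
    for n in (256, 512, 1024, 2048):
        a = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
        b = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
        mm(a, b)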
