Fix Triton GEMM + PostOp (add matrix) kernel benchmark int8 failure BMG (#5595)

dev-tomek · web-flow · commit 7814de92037c · 2025-12-09T09:55:35.000+01:00
Addresses [#5481](#5481). Fixes `RuntimeError: Native API failed. Native API returns: 20 (UR_RESULT_ERROR_DEVICE_LOST)` in the Triton GEMM + PostOp (add matrix) kernel benchmark, int8, on BMG. Reserved memory grew with each configuration run within the benchmark, eventually causing an out-of-memory (OOM) condition under the hood. The fix calls `torch.xpu.empty_cache()` in `get_do_bench` (on by default through the new `clear_cache` parameter) to release the cached memory. The issue was visible on only a single BMG runner because that runner has more system RAM, which lets it pass the runtime checks and run an additional test case. [Passing run](https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/19854824092/job/56890032648) of the Triton GEMM + PostOp (add matrix) kernel benchmark int8 BMG on that runner. A similar error message also appears in FlexAttention (batch_size=16) Causal Mask fwd; however, the same fix does not help there, which indicates a different issue. That investigation continues in #5603.
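For reviewers who want to try the pattern in isolation, here is a minimal sketch of clearing the device cache before a benchmark runner is built. It is not the code in `benchmark_testing.py`: `do_bench_stub`, `release_xpu_cache`, and the injectable `empty_cache` argument are hypothetical names introduced for illustration, and the sketch assumes a PyTorch build that exposes `torch.xpu` (it does nothing when PyTorch or an XPU device is absent).

```python
import functools
from typing import Callable, List


def release_xpu_cache() -> None:
    """Release cached, unused device memory if an XPU is present (illustrative helper)."""
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed: nothing to release
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        xpu.empty_cache()


def do_bench_stub(fn: Callable[[], object], n_warmup: int, n_repeat: int,
                  quantiles: List[float]) -> int:
    """Hypothetical stand-in for the real do_bench: runs fn and returns the call count."""
    total = n_warmup + n_repeat
    for _ in range(total):
        fn()
    return total


def get_do_bench(n_warmup: int, n_repeat: int, quantiles: List[float],
                 clear_cache: bool = True,
                 empty_cache: Callable[[], None] = release_xpu_cache):
    # Clear the cache once per benchmark configuration, before the runner is
    # created, so memory reserved by the previous configuration is not carried over.
    if clear_cache:
        empty_cache()
    return functools.partial(do_bench_stub, n_warmup=n_warmup,
                             n_repeat=n_repeat, quantiles=quantiles)
```

Passing `clear_cache=False` keeps the previous behavior, which may be useful when a caller manages device memory itself.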
diff --git a/benchmarks/triton_kernels_benchmark/benchmark_testing.py b/benchmarks/triton_kernels_benchmark/benchmark_testing.py
@@ -308,7 +308,9 @@ def extract_kernels(funcs):
     raise NotImplementedError(f"BENCHMARKING_METHOD: {BENCHMARKING_METHOD} isn't implemented")
 
 
-def get_do_bench(n_warmup: int, n_repeat: int, quantiles: list):
+def get_do_bench(n_warmup: int, n_repeat: int, quantiles: list, clear_cache: bool = True):
+    if clear_cache:
+        torch.xpu.empty_cache()
     return functools.partial(do_bench, n_warmup=n_warmup, n_repeat=n_repeat, quantiles=quantiles)