Commit 7814de9

Fix Triton GEMM + PostOp (add matrix) kernel benchmark int8 failure BMG (#5595)
Addresses [#5481](#5481). Fixes `RuntimeError: Native API failed. Native API returns: 20 (UR_RESULT_ERROR_DEVICE_LOST)` on Triton GEMM + PostOp (add matrix) kernel benchmark int8 BMG. Reserved memory grew with each configuration run within the benchmark, eventually resulting in an out-of-memory (OOM) condition under the hood. The issue is visible on only a single BMG runner because that runner has more system RAM, which lets it pass the runtime checks and run an additional test case. [Passing run](https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/19854824092/job/56890032648) of Triton GEMM + PostOp (add matrix) kernel benchmark int8 BMG on that runner. A similar error message also appears on FlexAttention (batch_size=16) Causal Mask fwd; however, the same fix does not apply there, which indicates a different issue. That investigation continues in #5603.
1 parent 431cc25 commit 7814de9

1 file changed: +3 -1 lines changed


benchmarks/triton_kernels_benchmark/benchmark_testing.py

Lines changed: 3 additions & 1 deletion
@@ -308,7 +308,9 @@ def extract_kernels(funcs):
     raise NotImplementedError(f"BENCHMARKING_METHOD: {BENCHMARKING_METHOD} isn't implemented")


-def get_do_bench(n_warmup: int, n_repeat: int, quantiles: list):
+def get_do_bench(n_warmup: int, n_repeat: int, quantiles: list, clear_cache: bool = True):
+    if clear_cache:
+        torch.xpu.empty_cache()
     return functools.partial(do_bench, n_warmup=n_warmup, n_repeat=n_repeat, quantiles=quantiles)


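The idea behind the fix, releasing cached device memory before each benchmark configuration so that reservations do not accumulate, can be sketched roughly as below. This is an illustration and not the repository's code: `sweep`, `run_one`, and `_default_empty_cache` are hypothetical names, and the cache-clearing hook is injectable so the sketch also runs on machines without PyTorch or an XPU device.

```python
from typing import Any, Callable, Iterable, List


def _default_empty_cache() -> None:
    """Release cached XPU memory when PyTorch with XPU support is present.

    Hypothetical helper: it does nothing if torch is missing, has no
    `xpu` module, or reports no available XPU device.
    """
    try:
        import torch
    except ImportError:
        return
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        xpu.empty_cache()


def sweep(
    configs: Iterable[Any],
    run_one: Callable[[Any], Any],
    empty_cache: Callable[[], None] = _default_empty_cache,
) -> List[Any]:
    """Run `run_one` for every configuration, clearing the cache before each.

    Clearing before each run returns unused cached blocks to the device,
    so memory cached by an earlier configuration is not carried into
    the next one.
    """
    results = []
    for cfg in configs:
        empty_cache()
        results.append(run_one(cfg))
    return results


if __name__ == "__main__":
    # Stand-in workload: a real benchmark would launch a kernel here.
    print(sweep([128, 256, 512], lambda n: n * n))
```

Passing a custom `empty_cache` callable (for example a counter in a unit test) makes it easy to check that the cache is cleared once per configuration without needing the hardware.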