Fix Triton GEMM + PostOp (add matrix) kernel benchmark int8 failure BMG #5595

dev-tomek · 2025-12-02T10:06:42Z

Addresses #5481.
Fixes RuntimeError: Native API failed. Native API returns: 20 (UR_RESULT_ERROR_DEVICE_LOST) on Triton GEMM + PostOp (add matrix) kernel benchmark int8 BMG.

Reserved memory grew with each configuration run within the benchmark, eventually resulting in an OOM under the hood.
The issue is visible on only a single BMG runner because that runner has more system RAM, which lets it pass the runtime checks and run an additional test case.
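
The idea behind the fix can be sketched as a loop that releases cached device memory after every configuration. This is only an illustration, not the benchmark's actual code: `run_configs`, `run_one`, and `release` are hypothetical names, and the default release hook assumes a PyTorch build with XPU support (it does nothing otherwise).

```python
from typing import Callable, Iterable, List, TypeVar

Config = TypeVar("Config")
Result = TypeVar("Result")


def _default_release() -> None:
    """Return cached XPU allocator blocks to the driver, if an XPU is available."""
    try:
        import torch
    except ImportError:
        return  # No PyTorch installed: nothing to release.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        xpu.empty_cache()


def run_configs(
    configs: Iterable[Config],
    run_one: Callable[[Config], Result],
    release: Callable[[], None] = _default_release,
) -> List[Result]:
    """Run each configuration, releasing cached memory after every run.

    `run_one` is a hypothetical stand-in for benchmarking a single config.
    """
    results: List[Result] = []
    for config in configs:
        try:
            results.append(run_one(config))
        finally:
            # Release even if the run failed, so that reserved memory
            # does not accumulate across configurations.
            release()
    return results
```

Passing `release` as a parameter keeps the loop testable without a device and would make it straightforward to share the same hook across benchmarks.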

With this change, the Triton GEMM + PostOp (add matrix) kernel benchmark int8 BMG passes on that runner.

A similar error message also appears on FlexAttention (batch_size=16) Causal Mask fwd; however, the same fix does not help there, which indicates a different issue. That investigation continues in #5603.

This reverts commit 19e276e.

etiotto · 2025-12-04T14:31:15Z

benchmarks/triton_kernels_benchmark/gemm_postop_addmatrix_benchmark.py

    # Maximum across onednn=600, triton=1000
    # For onednn and triton: Some configs increase performance with warmup as a step function, but some
    # slowly decrease with saturation. Performance is best at 150-200ms range, but we want stable, not just best
+    torch.xpu.empty_cache()


What about the other benchmarks we run? Shouldn't we do the same there?

Should we do it in a common place, maybe get_empty_cache_for_benchmark?

empty cache between each run to avoid OOM

fca27a5

dev-tomek linked an issue Dec 2, 2025 that may be closed by this pull request

[BMG] Triton GEMM + PostOp (add matrix) kernel benchmark int8 failure #5481

Open

dev-tomek added 3 commits December 3, 2025 08:46

add cache flush to flex att bs=16

19e276e

Revert "add cache flush to flex att bs=16"

facac0f

This reverts commit 19e276e.

Merge branch 'main' into tkuczynski/fix_benchmarks_bmg

279573d

dev-tomek marked this pull request as ready for review December 4, 2025 13:53

dev-tomek changed the title ~~empty cache between each run to avoid OOM~~ Fix Triton GEMM + PostOp (add matrix) kernel benchmark int8 failure BMG Dec 4, 2025

dev-tomek requested review from anmyachev and whitneywhtsang December 4, 2025 14:00

anmyachev approved these changes Dec 4, 2025

etiotto reviewed Dec 4, 2025

5 participants
