@Egor-Krivov Egor-Krivov commented Aug 15, 2025

Flex attention requires more warmup steps on B580.

This PR:

  1. Adds a pre-warmup step for flex attention that is called once per run, so only the first shape config pays the extra cost. Experiments show that the first config requires more warmup.
  2. Makes GPU synchronization consistent between warmup and benchmarking.
  3. Adds iterations.

Should resolve #4852

Better warmup handling should follow the investigation in #4911.
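The once-per-run pre-warmup in point 1 could be sketched roughly like this (a hypothetical simplification; the actual `benchmark_suit.do_prewarmup` implementation may differ, and the `key`/`steps`/`synchronize` parameters are illustrative):

```python
_PREWARMED = set()

def do_prewarmup(fn, key="default", steps=10, synchronize=None):
    """Run extra warmup iterations once per benchmark run (per key).

    Only the first shape config pays this cost: subsequent calls
    with the same key are no-ops.
    """
    if key in _PREWARMED:
        return
    for _ in range(steps):
        fn()
    if synchronize is not None:
        synchronize()  # e.g. torch.xpu.synchronize on Intel GPUs
    _PREWARMED.add(key)
```

Keeping the "already warmed" state module-global means the expensive first-config warmup happens once, while later shape configs fall through to the regular `n_warmup` loop.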

@Egor-Krivov Egor-Krivov marked this pull request as ready for review August 18, 2025 11:09
@whitneywhtsang (Contributor) commented:

> Also, based on experiments, warmup has to contain profiling, otherwise the first iterations with profiling will be slower.

That's interesting, do you have any ideas why?

@Egor-Krivov (Author) commented:

> > Also, based on experiments, warmup has to contain profiling, otherwise first iterations with profiling will be slower.
>
> That's interesting, do you have some ideas on why?

I've rechecked just in case, and discovered that profiling during warmup is not actually required. However, with warmup combined with profiling I need 20-80 warmup steps in 98% of cases; without profiling I need about 150-200 warmup steps. Originally I had only tried up to 100 warmup steps.

Maybe some optimizations are tied to the total time spent in this kernel (either to compute the optimizations or to decide whether they are worth applying), so warmup + profiling + cache rewrite simply leaves more time for warmup.
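The "consistent GPU synchronization" change from the PR description can be illustrated with a minimal host-side sketch (a hypothetical simplification, not the actual `benchmark_suit.do_bench`; on Intel GPUs `synchronize` would be `torch.xpu.synchronize`):

```python
import time

def bench(fn, n_warmup, n_repeat, synchronize=lambda: None):
    # Warmup performs the same per-iteration synchronization as the
    # timed loop below, so the first timed iterations run under
    # identical conditions.
    for _ in range(n_warmup):
        fn()
        synchronize()
    times_ms = []
    for _ in range(n_repeat):
        start = time.perf_counter()
        fn()
        synchronize()
        times_ms.append((time.perf_counter() - start) * 1e3)
    return min(times_ms), max(times_ms), sum(times_ms) / len(times_ms)
```

If warmup skipped the synchronization (or used a different one), the first timed iterations would measure a different execution pattern than the warmed-up steady state.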

@Egor-Krivov (Author) commented:

Looks like flex attention results are stable now:
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/17073456205
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/17066951767

I'll investigate proper warmup for other kernels in a separate issue: #4911

@whitneywhtsang Will you take a look?
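The stability claimed above can be judged by the coefficient of variation (`cv`) that `do_bench` reports; a minimal sketch of what that metric captures (hypothetical helper; the actual computation inside `benchmark_suit` may differ):

```python
import statistics

def coefficient_of_variation(times_ms):
    """CV = stddev / mean; lower means more stable measurements."""
    mean = statistics.mean(times_ms)
    return statistics.stdev(times_ms) / mean

# A flat series has low CV; a noisy one (e.g. slow first iterations
# from insufficient warmup) has a much higher CV.
stable = [1.00, 1.02, 0.99, 1.01]
noisy = [5.0, 1.0, 1.0, 1.0]
```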

Review thread on the changed code:

```python
is_bmg = any(name in torch.xpu.get_device_name().lower() for name in ('b570', 'b580'))
if is_bmg:
    benchmark_suit.do_prewarmup(triton_fn)
_, min_ms, max_ms, mean, cv = benchmark_suit.do_bench(triton_fn, n_warmup=200 if is_bmg else 10, n_repeat=10,
                                                      device=DEVICE)
```
@whitneywhtsang (Contributor):

Do we need both prewarmup and increasing n_warmup?

@Egor-Krivov (Author):

I think so. The first warmup across all shapes takes a long time; just setting n_warmup to 200 is not always enough.

Review thread on the changed code:

```python
                                                      device=DEVICE)
# Need more warmups on B580 due to the torch.compile

is_bmg = any(name in torch.xpu.get_device_name().lower() for name in ('b570', 'b580'))
```
@whitneywhtsang (Contributor):

I think we can keep it simple and increase it across platforms; no need to check whether it is BMG.

@Egor-Krivov (Author):

Done

@whitneywhtsang (Contributor) left a review:

Please also change PR description, it is now out of date.

We agree to land this as is to unblock flex attn BMG measurement, and further improve the warmup mechanism in a separate PR.

@Egor-Krivov Egor-Krivov merged commit 3714e9b into main Aug 22, 2025
16 of 18 checks passed
@Egor-Krivov Egor-Krivov deleted the egor/flex_attn_fluc branch August 22, 2025 10:04
Successfully merging this pull request may close these issues.

[FlexAttention] Investigate performance fluctuation on BMG