[CI] Better warmup for flex attention on B580 #4906
@@ -165,8 +165,13 @@ def benchmark(Z, H_q, H_kv, N_CTX_q, N_CTX_kv, D_HEAD_qk, D_HEAD_v, MODE, provid
         triton_fn = lambda: triton_o.backward(triton_do, retain_graph=True)

         benchmark_suit.assert_close(triton_fn, torch_fn, atol=1e-2, rtol=1e-3, err_msg='triton to torch')
-        _, min_ms, max_ms, mean, cv = benchmark_suit.do_bench(triton_fn, n_warmup=10, n_repeat=10, quantiles=quantiles,
-                                                              device=DEVICE)
+        # Need more warmups on B580 due to torch.compile
+        is_bmg = any(name in torch.xpu.get_device_name().lower() for name in ('b570', 'b580'))
+        if is_bmg:
+            benchmark_suit.do_prewarmup(triton_fn)
+        _, min_ms, max_ms, mean, cv = benchmark_suit.do_bench(triton_fn, n_warmup=200 if is_bmg else 10, n_repeat=10,
+                                                              quantiles=quantiles, device=DEVICE)

     elif provider == 'onednn':
         # OneDNN only supports MHA.

Review comment on the `is_bmg` check: I think we can keep it simple and increase across platforms, no need to check if it is bmg.
Reply: Done.

Review comment on the `do_bench` call: Do we need both the prewarmup and the increased n_warmup?
Reply: I think so; the first warmup across all shapes takes a lot of time, and just setting n_warmup to 200 is not always enough.
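For readers following the discussion, here is a minimal sketch of the two-stage warmup being debated: run the benchmarked callable once (or for a fixed wall-clock budget) to absorb torch.compile's first-call compilation, then hand off to the normal warmup/repeat loop with a larger warmup count on Battlemage (B570/B580) parts. This assumes an XPU-enabled PyTorch build; the `do_prewarmup` helper, its `min_seconds` parameter, and the `bench_with_warmup` wrapper are illustrative names rather than the benchmark suite's actual API — only the device check and the `n_warmup=200 if is_bmg else 10` choice come from the diff above.

```python
import time

import torch


def do_prewarmup(fn, min_seconds: float = 5.0) -> None:
    """Call `fn` repeatedly until at least `min_seconds` have elapsed.

    Hypothetical helper: the goal is to absorb one-off costs such as
    torch.compile's first-call compilation so the timed warmup/repeat
    loop below measures steady-state performance.
    """
    start = time.perf_counter()
    while time.perf_counter() - start < min_seconds:
        fn()
        # Wait for queued kernels to finish before re-checking the clock.
        torch.xpu.synchronize()


def bench_with_warmup(fn, do_bench, quantiles, device):
    """Illustrative wrapper mirroring the diff: prewarm and use a larger
    n_warmup only when running on a B570/B580 (Battlemage) GPU."""
    is_bmg = any(name in torch.xpu.get_device_name().lower() for name in ('b570', 'b580'))
    if is_bmg:
        do_prewarmup(fn)
    return do_bench(fn, n_warmup=200 if is_bmg else 10, n_repeat=10,
                    quantiles=quantiles, device=device)
```

The design point raised in the thread is that the two mechanisms are complementary: the prewarmup pays the compilation cost once per shape before timing starts, while the higher `n_warmup` keeps the steady-state measurement stable on hardware where the first timed iterations would otherwise still be noisy.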