
Commit 6a9a0a6

feat: add return all for do_bench (intel#4493)
Since H100s have power throttling that depends on the kernel, it is important to see how the TFLOPs change over time. I have had this patch in my internal codebase and found it useful for seeing the cyclic patterns of different kernels and how long it takes to reach a steady state.

![image](https://github.com/user-attachments/assets/ff77edea-8f61-446a-8afe-023c25933fe9)

Complete the following tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them.

- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [ ] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because `do_bench` does not have unit tests LOL.
- Select one of the following.
  - [ ] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)
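A minimal usage sketch of the mode this commit adds, assuming the patch is applied; the matmul workload, the sizes, the `rep` value, and the FLOP formula below are illustrative assumptions, not part of the commit:

```python
import torch
import triton.testing as testing

# Hypothetical workload: a large fp16 matmul, run long enough to hit throttling.
M = N = K = 4096
a = torch.randn(M, K, device="cuda", dtype=torch.float16)
b = torch.randn(K, N, device="cuda", dtype=torch.float16)

# With this patch, return_mode="all" returns every per-iteration time in ms
# instead of a single reduced statistic.
times_ms = testing.do_bench(lambda: torch.matmul(a, b), rep=1000, return_mode="all")

# Convert each sample to TFLOP/s to visualize throttling over wall-clock time.
flops = 2 * M * N * K
tflops = [flops / (t * 1e-3) / 1e12 for t in times_ms]
```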
1 parent 23bf0e0 commit 6a9a0a6

File tree: 1 file changed (+6 −5 lines)


python/triton/testing.py

Lines changed: 6 additions & 5 deletions
```diff
@@ -23,6 +23,8 @@ def _summarize_statistics(times, quantiles, return_mode):
         if len(ret) == 1:
             ret = ret[0]
         return ret
+    if return_mode == "all":
+        return times.tolist()
     return getattr(torch, return_mode)(times).item()
 
 
@@ -36,11 +38,11 @@ def do_bench_cudagraph(fn, rep=20, grad_to_none=None, quantiles=None, return_mod
     :type rep: int
     :param grad_to_none: Reset the gradient of the provided tensor to None
     :type grad_to_none: torch.tensor, optional
-    :param return_mode: The statistical measure to return. Options are "min", "max", "mean", or "median". Default is "mean".
+    :param return_mode: The statistical measure to return. Options are "min", "max", "mean", "median", or "all". Default is "mean".
     :type return_mode: str
     """
     import torch
-    assert return_mode in ["min", "max", "mean", "median"]
+    assert return_mode in ["min", "max", "mean", "median", "all"]
 
     with torch.cuda.stream(torch.cuda.Stream()):
         # warmup
@@ -107,10 +109,9 @@ def do_bench(fn, warmup=25, rep=100, grad_to_none=None, quantiles=None, fast_flu
     :type quantiles: list[float], optional
     :param fast_flush: Use faster kernel to flush L2 cache between measurements
     :type fast_flush: bool, default is True
-    :param return_mode: The statistical measure to return. Options are "min", "max", "mean", or "median". Default is "mean".
-    :type return_mode: str
+    :param return_mode: The statistical measure to return. Options are "min", "max", "mean", "median", or "all". Default is "mean". :type return_mode: str
     """
-    assert return_mode in ["min", "max", "mean", "median"]
+    assert return_mode in ["min", "max", "mean", "median", "all"]
     import torch
 
     di = torch._dynamo.device_interface.get_interface_for_device(device_type)
```
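To make the new branch in `_summarize_statistics` concrete, here is a standalone sketch of the dispatch logic it implements; the helper name `summarize` and the sample values are illustrative, not taken from the commit:

```python
import torch

times = torch.tensor([1.25, 1.0, 1.5, 1.25])  # per-iteration times in ms

def summarize(times, return_mode):
    # New in this commit: "all" bypasses the torch reduction and hands back
    # the raw per-iteration samples as a Python list.
    if return_mode == "all":
        return times.tolist()
    # Existing behavior: reduce via torch.min / torch.max / torch.mean / torch.median.
    return getattr(torch, return_mode)(times).item()

print(summarize(times, "mean"))  # 1.25
print(summarize(times, "all"))   # [1.25, 1.0, 1.5, 1.25]
```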
