
Conversation

@anmyachev anmyachev commented Oct 16, 2024

Signed-off-by: Anatoly Myachev <[email protected]>
@anmyachev anmyachev marked this pull request as ready for review October 16, 2024 13:21
# benchmark_suit.assert_close(xetla_fn(), torch_fn(), atol=1e-4, rtol=1.0, err_msg='xetla to torch')
_, min_ms, max_ms, mean_ms, cv = benchmark_suit.do_bench(
xetla_fn, n_warmup=10, n_repeat=10, quantiles=quantiles,
kernel_name='gpu::xetla::kernel::gemm_universal_t<dispatch_stream_k')
The kernel name was incorrect. The error went unnoticed because, at the time the name was added, the benchmark was not running in CI.
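Filtering profiler events by kernel-name prefix can fail silently: a wrong name simply matches no events, and nothing flags the mistake until the benchmark actually runs. A minimal sketch of this failure mode, using a hypothetical `filter_kernels` helper and made-up event names (not the repository's actual API):

```python
# Hypothetical sketch: a wrong kernel-name prefix matches nothing,
# which goes unnoticed unless the benchmark actually executes (e.g. in CI).
events = [
    "gpu::xetla::kernel::gemm_universal_t<dispatch_stream_k>",
    "at::native::elementwise_kernel",
]

def filter_kernels(event_names, kernel_name):
    # Keep only events whose name starts with the requested prefix.
    return [e for e in event_names if e.startswith(kernel_name)]

matched = filter_kernels(events, "gpu::xetla::kernel::gemm_universal_t<dispatch_stream_k")
missed = filter_kernels(events, "some_incorrect_kernel_name")
print(len(matched), len(missed))
```

With the correct prefix one event matches; with a wrong one the list is empty, which is why an assertion on the number of profiled events (as in the code below) is valuable.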

assert len(functions) == n_repeat, f"the profiling number not match, {len(functions)}"
# Convert the times to milliseconds.
-        times = torch.tensor([f.self_device_time_total * 1e-3 for f in functions], dtype=torch.float)
+        times = torch.tensor([sum(map(lambda elem: elem.self_device_time_total, f)) * 1e-3 for f in zip(*all_functions)],
The main problem was that the times of the several kernels launched per run were not summed. This affects only the "gemm streamk" benchmark.
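The corrected line sums the device time of every kernel recorded for a single repetition before converting microseconds to milliseconds. A minimal sketch with plain numbers standing in for kineto profiler events (the per-repetition grouping and values are illustrative assumptions):

```python
# Each inner list: self device times (in microseconds) of the kernels
# launched in one profiled repetition. A stream-k GEMM launches several
# kernels per run, so taking a single kernel's time undercounts.
per_repetition_kernel_times = [
    [100.0, 20.0],  # repetition 1: two kernels
    [110.0, 25.0],  # repetition 2: two kernels
]

# Sum within each repetition, then convert microseconds -> milliseconds.
times_ms = [sum(kernels) * 1e-3 for kernels in per_repetition_kernel_times]
print(times_ms)
```

Before the fix, only one kernel's `self_device_time_total` per repetition was used, so multi-kernel runs reported too little device time.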

@anmyachev anmyachev merged commit 700abe3 into main Oct 17, 2024
6 checks passed
@anmyachev anmyachev deleted the amyachev/several-kernels branch October 17, 2024 09:39

Successfully merging this pull request may close these issues.

[Profiling] Enhancements to the do_bench(...) kineto implementation
