[BUG FIX] Improve error reporting and occupancy in benchmarks #251

Merged
LoserCheems merged 2 commits into main from optim_triton_version on Mar 18, 2026

@LoserCheems (Collaborator)

Summary

  • This update enhances error reporting in benchmark tests and improves launch configuration logic for better occupancy in forward and sparse kernels.

Root Cause

  • The previous error handling did not provide sufficient detail, making debugging difficult. Additionally, the occupancy settings for certain configurations were suboptimal.

Changes

  • Added detailed error reporting with traceback in benchmark tests.
  • Updated launch configuration logic to adjust tile sizes based on qheads_per_kvhead for improved occupancy.
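The traceback change listed above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the trimmed-down `BenchmarkResult`, the `fn` callback parameter, and `failing_kernel` are hypothetical, and the exact way `full_error` is assembled is an assumption. Only the names `full_error`, `exc`, and `error_message` come from the diff.

```python
import traceback
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BenchmarkResult:
    # Hypothetical, trimmed-down stand-in for the benchmark result type.
    tflops: Optional[float] = None
    error_message: Optional[str] = None


def run_benchmark(fn: Callable[[], float]) -> BenchmarkResult:
    """Run fn() and record the full traceback if it raises."""
    try:
        return BenchmarkResult(tflops=fn())
    except Exception as exc:
        # str(exc) alone drops the exception type and the failing frame;
        # traceback.format_exc() keeps both. The formatting is an assumption.
        full_error = f"{type(exc).__name__}: {exc}\n{traceback.format_exc()}"
        return BenchmarkResult(error_message=full_error)


def failing_kernel() -> float:
    # Hypothetical failure used only to exercise the error path.
    raise ValueError("unsupported head_dim")


result = run_benchmark(failing_kernel)
print(result.error_message)
```

Keeping the traceback in `error_message` means a failed configuration can be diagnosed from the benchmark report alone, without re-running it under a debugger.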

Reproduction

  • Run benchmark tests with various configurations to observe error messages and occupancy metrics.

Tests

  • Validated changes by running existing benchmark tests and confirming improved error output and occupancy metrics.

Compatibility

  • No backward compatibility issues identified.

Checklist

  • Linked issue provided
  • Adds or updates tests
  • Updates docs if needed
  • No perf regressions

Copilot AI review requested due to automatic review settings March 18, 2026 17:26
@LoserCheems LoserCheems merged commit 26fe0af into main Mar 18, 2026
3 checks passed
Copilot AI (Contributor) left a comment

Pull request overview

This PR improves benchmark failure diagnostics by including full Python tracebacks in error messages, and updates Triton forward-kernel launch configuration heuristics intended to improve occupancy (especially for split-KV / decode-like cases).

Changes:

  • Include traceback details in BenchmarkResult.error_message and print multi-line failure output in benchmark scripts.
  • Adjust split-KV TILE_M selection based on qheads_per_kvhead and retune some SM90 (H100) forward launch configs.
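The split-KV `TILE_M` rule can be sketched in plain Python from the review diff. The helper names `old_split_kv_tile_m` and `new_split_kv_tile_m` are hypothetical, and `next_power_of_2` is a pure-Python stand-in for `triton.next_power_of_2` so that the sketch runs without Triton installed.

```python
def next_power_of_2(n: int) -> int:
    """Pure-Python stand-in for triton.next_power_of_2 (assumes n >= 1)."""
    return 1 << (n - 1).bit_length()


def old_split_kv_tile_m(pack_gqa: bool, qheads_per_kvhead: int) -> int:
    # Behaviour before the PR: the tile could shrink all the way down to 1.
    if pack_gqa and qheads_per_kvhead > 1:
        return next_power_of_2(qheads_per_kvhead)
    return 1


def new_split_kv_tile_m(pack_gqa: bool, qheads_per_kvhead: int) -> int:
    # Behaviour after the PR: the tile never drops below 16.
    if pack_gqa and qheads_per_kvhead > 16:
        return next_power_of_2(qheads_per_kvhead)
    return 16


for qheads in (1, 4, 16, 24):
    print(qheads, old_split_kv_tile_m(True, qheads), new_split_kv_tile_m(True, qheads))
```

With packed GQA and 4 query heads per KV head, for example, the old rule picks a tile of 4 while the new rule picks 16. Both rules agree once `qheads_per_kvhead` exceeds 16.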

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/benchmark_forward.py | Adds traceback capture to benchmark failures and prints full error details. |
| tests/benchmark_decode.py | Same traceback-enhanced error reporting for decode benchmarks. |
| tests/benchmark_backward.py | Same traceback-enhanced error reporting for backward benchmarks. |
| flash_sparse_attn/ops/triton/launch_template.py | Updates forward (dense/sparse/gated) launch config heuristics, including split-KV TILE_M and SM90 configs. |

Comment on lines 32 to +36
     if is_split_kv:
-        if pack_gqa and qheads_per_kvhead > 1:
+        if pack_gqa and qheads_per_kvhead > 16:
             tile_m = triton.next_power_of_2(qheads_per_kvhead)
         else:
-            tile_m = 1
+            tile_m = 16

             tile_m = triton.next_power_of_2(qheads_per_kvhead)
         else:
-            tile_m = 1
+            tile_m = 16

Comment on lines 254 to +258
     if is_split_kv:
-        if pack_gqa and qheads_per_kvhead > 1:
+        if pack_gqa and qheads_per_kvhead > 16:
             tile_m = triton.next_power_of_2(qheads_per_kvhead)
         else:
-            tile_m = 1
+            tile_m = 16

Comment on lines 215 to +229
@@ -224,7 +226,7 @@ def run_benchmark(cfg: BenchmarkConfig) -> BenchmarkResult:
             triton_gated_tflops=None,
             fa_dense_tflops=None,
             cudnn_dense_tflops=None,
-            error_message=str(exc),
+            error_message=full_error,

Comment on lines 212 to +226
@@ -221,7 +223,7 @@ def run_benchmark(cfg: BenchmarkConfig) -> BenchmarkResult:
             triton_gated_tflops=None,
             fa_dense_tflops=None,
             cudnn_dense_tflops=None,
-            error_message=str(exc),
+            error_message=full_error,

Comment on lines 272 to 287
@@ -281,7 +283,7 @@ def run_benchmark(cfg: BenchmarkConfig) -> BenchmarkResult:
             triton_gated_tflops=None,
             fa_dense_tflops=None,
             cudnn_dense_tflops=None,
-            error_message=str(exc),
+            error_message=full_error,
         )