[SGLANG] [Benchmarks] Initial integration of sglang kernels to benchmarks #6789
chengjunlu wants to merge 5 commits into main from
Conversation
Pull request overview
Integrates SGLang Triton kernels into the repo’s benchmark harness and wires them into the “third party benchmarks” GitHub Actions workflow so their performance can be captured and reported alongside existing benchmarks.
Changes:
- Add new benchmark entrypoints for SGLang attention (prefill/decode/extended) under benchmarks/triton_kernels_benchmark/.
- Add a standalone Triton FP8 block GEMM benchmark derived from SGLang’s FP8 kernel.
- Extend .github/workflows/third-party-benchmarks.yml to install SGLang and run the new benchmarks, producing CSV reports.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| benchmarks/triton_kernels_benchmark/prefill_attention_benchmark.py | Adds a prefill attention benchmark driver for SGLang’s context_attention_fwd. |
| benchmarks/triton_kernels_benchmark/decode_attention_benchmark.py | Adds a decode attention benchmark driver for SGLang’s decode_attention_fwd. |
| benchmarks/triton_kernels_benchmark/extended_attention_benchmark.py | Adds an “extended/append” attention benchmark driver for SGLang’s extend_attention_fwd. |
| benchmarks/triton_kernels_benchmark/block_fp8_gemm_benchmark.py | Adds a Triton FP8 block GEMM benchmark (and a native Torch reference) for correctness/perf. |
| .github/workflows/third-party-benchmarks.yml | Installs dependencies (PTI), installs SGLang, and runs the new benchmarks + report generation. |
        As: The per-token-group quantization scale for `A`.
        Bs: The per-block quantization scale for `B`.
        block_size: The block size for per-block quantization. It should be 2-dim, e.g., [128, 128].
        output_dytpe: The dtype of the returned tensor.
Docstring: output_dytpe is misspelled and doesn’t match the actual argument name output_dtype, which can confuse users of this helper.
-        output_dytpe: The dtype of the returned tensor.
+        output_dtype: The dtype of the returned tensor.
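For context, the helper being documented dequantizes block-quantized operands before a plain matmul. A minimal torch sketch of such a reference (the name `native_w8a8_block_matmul` is hypothetical, float32 stand-ins are used for the fp8 tensors, and the scale layouts are assumptions: one scale per (token, K-block) in `As`, one per (N-block, K-block) in `Bs`):

```python
import torch

def native_w8a8_block_matmul(A, As, B, Bs, block_size, output_dtype=torch.float32):
    # A: (M, K), As: (M, ceil(K / block_k)) per-token-group scales.
    # B: (N, K), Bs: (ceil(N / block_n), ceil(K / block_k)) per-block scales.
    block_n, block_k = block_size
    M, K = A.shape
    N, _ = B.shape
    # Expand each scale over its block, then dequantize elementwise.
    A_deq = A.to(torch.float32) * As.repeat_interleave(block_k, dim=1)[:, :K]
    scale_b = Bs.repeat_interleave(block_n, dim=0)[:N].repeat_interleave(block_k, dim=1)[:, :K]
    B_deq = B.to(torch.float32) * scale_b
    return (A_deq @ B_deq.T).to(output_dtype)
```

Under these assumptions the result has shape (M, N); the actual SGLang reference may differ in scale layout and rounding.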
    C = A.new_empty(C_shape, dtype=output_dtype)

    # Default config
    # Block-wise quant: BLOCK_SIZE_K must be divisable by block_size[1]
Comment typo: “divisable” should be “divisible”.
-    # Block-wise quant: BLOCK_SIZE_K must be divisable by block_size[1]
+    # Block-wise quant: BLOCK_SIZE_K must be divisible by block_size[1]
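The comment's constraint could also be enforced rather than just stated. A hypothetical guard (names illustrative), so a bad tile/block combination fails loudly instead of producing wrong scales:

```python
# Hypothetical guard: the kernel's K-dimension tile must cover an integer
# number of quantization blocks, so each tile reads whole scale columns.
def check_block_config(BLOCK_SIZE_K, block_size):
    block_n, block_k = block_size
    assert BLOCK_SIZE_K % block_k == 0, (
        f"BLOCK_SIZE_K={BLOCK_SIZE_K} must be divisible by block_size[1]={block_k}")

check_block_config(BLOCK_SIZE_K=128, block_size=[128, 128])  # OK
```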
      run: |
        git clone https://github.com/sgl-project/sglang.git
        cd sglang
        git apply ../benchmarks/third_party/sglang/sglang-fix.patch
This step applies ../benchmarks/third_party/sglang/sglang-fix.patch, but that file doesn’t exist in benchmarks/third_party/sglang/ (only sglang-bench-fix.patch / sglang-test-fix.patch are present). CI will fail at git apply; update the filename(s) or add the missing patch.
-        git apply ../benchmarks/third_party/sglang/sglang-fix.patch
+        git apply ../benchmarks/third_party/sglang/sglang-bench-fix.patch
        source ../../../scripts/capture-hw-details.sh
        python ../../triton_kernels_benchmark/build_report.py $REPORTS/sglang-extended-attn-performance.csv $REPORTS/sglang-append-attn-triton-report.csv --benchmark sglang-extended-attn --compiler triton --param_cols "B,Q_LEN,PREFIX_LEN,KV_LEN,H_Q,H_KV,D" --tflops_col Triton-TFlops --hbm_col "Triton-GB/s" --tag $TAG
Same incorrect build_report.py path as above; additionally, --param_cols "B,Q_LEN,PREFIX_LEN,KV_LEN,H_Q,H_KV,D" doesn’t match the columns produced by extended_attention_benchmark.py (it uses EXTEND_LEN/PREFIX_LEN and has no Q_LEN/KV_LEN). build_report.py will raise due to missing columns—update --param_cols to match the benchmark’s CSV headers.
        source ../../../scripts/capture-hw-details.sh
        python ../../triton_kernels_benchmark/build_report.py $REPORTS/sglang-fp8-gemm-performance.csv $REPORTS/sglang-fp8-gemm-triton-report.csv --benchmark sglang-block-fp8-gemm --compiler triton --param_cols "M,N,K" --tflops_col Triton-TFlops --hbm_col "Triton-GB/s" --tag $TAG
Same incorrect build_report.py path as above (this will fail from benchmarks/triton_kernels_benchmark).
        source ../../../scripts/capture-hw-details.sh
        python ../../triton_kernels_benchmark/build_report.py $REPORTS/sglang-decode-attn-performance.csv $REPORTS/sglang-decode-attn-triton-report.csv --benchmark sglang-decode-attn --compiler triton --param_cols "B,SEQ_LENS,H_Q,H_KV,D" --tflops_col Triton-TFlops --hbm_col "Triton-GB/s" --tag $TAG
Same issue here: from benchmarks/triton_kernels_benchmark, ../../triton_kernels_benchmark/build_report.py resolves to a non-existent path and will break report generation.
import triton_kernels_benchmark as benchmark_suit
Minor naming consistency: this file uses benchmark_suit as the alias for triton_kernels_benchmark, while the rest of the benchmarks typically use benchmark_suite. Consider renaming to match the established alias.
-import triton_kernels_benchmark as benchmark_suit
+import triton_kernels_benchmark as benchmark_suite
+benchmark_suit = benchmark_suite
        raise NotImplementedError(f"Unsupported provider {provider}")

    tflops = lambda ms: 2 * M * N * K * (1e-12) / (ms * 1e-3)
    gbps = lambda ms: (M * K + K * N) + 2.0 * (M * N) * (1e-9) / (ms * 1e-3)
The GB/s calculation is missing parentheses/time normalization: (M*K + K*N) is being added as a raw element count instead of being converted to GB and divided by runtime. This will produce incorrect bandwidth numbers in the report.
-    gbps = lambda ms: (M * K + K * N) + 2.0 * (M * N) * (1e-9) / (ms * 1e-3)
+    gbps = lambda ms: ((M * K + K * N) + 2.0 * (M * N)) * (1e-9) / (ms * 1e-3)
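A quick sanity check of the corrected formulas, pulled out as plain functions. The byte model is an assumption inferred from the code: 1-byte fp8 operands for A (M×K) and B (K×N), and a 2-byte (bf16/fp16) output, which is what the `2.0 * M * N` term implies.

```python
def tflops(ms, M, N, K):
    # 2*M*N*K FLOPs for a GEMM, converted to TFLOP and divided by seconds.
    return 2 * M * N * K * 1e-12 / (ms * 1e-3)

def gbps(ms, M, N, K):
    # Bytes moved: 1-byte A and B reads plus a 2-byte C write (assumed).
    return ((M * K + K * N) + 2.0 * (M * N)) * 1e-9 / (ms * 1e-3)

# Example: a 4096x4096x4096 GEMM that takes 1 ms.
print(round(tflops(1.0, 4096, 4096, 4096), 1))  # 137.4 TFLOPS
print(round(gbps(1.0, 4096, 4096, 4096), 1))    # 67.1 GB/s
```

With the original (unparenthesized) expression the `M * K + K * N` element count would be added to the GB/s value raw, inflating the reported bandwidth by tens of millions.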
        source ../../../scripts/capture-hw-details.sh
        python ../../triton_kernels_benchmark/build_report.py $REPORTS/sglang-prefill-attn-performance.csv $REPORTS/sglang-prefill-attn-triton-report.csv --benchmark sglang-prefill-attn --compiler triton --param_cols "B,SEQ_LENS,H_Q,H_KV,D,CAUSAL" --tflops_col Triton-TFlops --hbm_col "Triton-GB/s" --tag $TAG
build_report.py is in benchmarks/triton_kernels_benchmark/. From this step’s working directory (benchmarks/triton_kernels_benchmark), the path ../../triton_kernels_benchmark/build_report.py points to a non-existent top-level triton_kernels_benchmark/ directory and will fail. Use python build_report.py ... (or the correct relative path) instead.
import torch
from sglang.srt.layers.attention.triton_ops.extend_attention import (
    extend_attention_fwd, )
import triton_kernels_benchmark as benchmark_suit
Minor naming consistency: other benchmarks typically use benchmark_suite as the alias for triton_kernels_benchmark, but this file uses benchmark_suit (missing “e”). Renaming would align with the rest of the repo.
-import triton_kernels_benchmark as benchmark_suit
+import triton_kernels_benchmark as benchmark_suite
+benchmark_suit = benchmark_suite
Commits:
- Port prefill attn and decode attn from sglang
- Add validation
- temp add extend attention
- disable debug ir dump
- Update three stage attention benchmark
- Add sglang kernel benchmark to action
- use 1e-3 atol
- remove sglang benchmark from triton-benchmarks
- Fix setup bdist_wheel
- Add sglang to thirdparty test
- Address review comments
- Remove sglang from tests
- Fix CI
- Integrate sglang prefill/decode/extend kernel to benchmarks
- Adjust params term
- Adjust tflops computation
- fix bugs rtol atol
- Move fp8 gemm to sglang benchmark
- Fix CI XPU not found
    # o will have the same shape as q
    o = torch.zeros(B, H_Q, D, dtype=dtype, device=device)

    b_seq_len = torch.full((B, ), N_CTX, device=device)
b_seq_len should be dtype=torch.int32 explicitly — SGLang's decode_attention_fwd expects int32 and the later cumsum result gets silently cast into the int32 kv_indptr.
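A minimal sketch of the suggested fix (shapes illustrative, CPU tensors for brevity). Creating `b_seq_len` as int32 keeps the downstream index tensors consistent; note that assigning a cumsum result into a preallocated int32 tensor casts silently, which is exactly the behavior the comment warns about:

```python
import torch

B, N_CTX = 4, 1024

# Explicit int32, as decode_attention_fwd expects; without the dtype
# argument, torch.full with an int fill value defaults to int64.
b_seq_len = torch.full((B,), N_CTX, dtype=torch.int32)

# kv_indptr built from cumulative sequence lengths; the assignment into
# an int32 tensor casts the cumsum result without warning.
kv_indptr = torch.zeros(B + 1, dtype=torch.int32)
kv_indptr[1:] = torch.cumsum(b_seq_len, dim=0)
```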
    quantiles = [0.5, 0.0, 1.0]
    if provider == 'triton' and MODE == 'fwd':
        triton_fn = lambda: context_attention_fwd(q, k, v, o, b_start_loc, b_seq_len, max_seq_len, is_causal=CAUSAL)
        _, min_ms, max_ms, mean_ms, cv = benchmark_suit.do_bench(triton_fn, n_warmup=10, n_repeat=10,
No numerical validation against a torch reference before timing — unlike block_fp8_gemm_benchmark.py, the three attention benchmarks (prefill/decode/extended) can silently report plausible numbers on a broken kernel.
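A hedged sketch of such a check for the prefill case, using a naive per-sequence torch SDPA reference. This is illustrative only: it assumes H_Q == H_KV, an unpaged (total_tokens, H, D) packed layout, and does not replicate SGLang's exact kernel semantics.

```python
import torch

def naive_prefill_reference(q, k, v, b_start_loc, b_seq_len, causal):
    """Naive varlen prefill attention reference (illustrative sketch)."""
    out = torch.empty_like(q)
    for start, length in zip(b_start_loc.tolist(), b_seq_len.tolist()):
        # Slice one sequence and move heads to the batch dimension: (H, L, D).
        qi = q[start:start + length].transpose(0, 1)
        ki = k[start:start + length].transpose(0, 1)
        vi = v[start:start + length].transpose(0, 1)
        oi = torch.nn.functional.scaled_dot_product_attention(qi, ki, vi, is_causal=causal)
        out[start:start + length] = oi.transpose(0, 1)
    return out
```

One could then run `triton_fn()` once before timing and compare with `torch.testing.assert_close(o, naive_prefill_reference(q, k, v, b_start_loc, b_seq_len, CAUSAL), atol=1e-3, rtol=1e-3)`, mirroring the 1e-3 atol already used elsewhere in this PR.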
This PR continues the work in #3796.
Initial enabling of the SGLang benchmarks: it adds the SGLang prefill/decode/extended attention kernels and the FP8 block-quantized GEMM to the third-party benchmark suite.