
Commit 2f62643

benchmark: Adding FP8 benchmark on attention and matmul testing (#1390)
## 📌 Description

This PR extends the benchmarking script `flashinfer_benchmark.py` by adding:

* MLA backend testing.
* FP8 attention benchmarking for prefill, decode, and MLA.
* `bmm_fp8` and `mm_fp4` testing.
* A few additional backends where applicable (e.g. trtllm-gen for paged prefill).
* A fix for broken cuDNN prefill & decode benchmarking caused by the migration from cubins to full cuDNN integration.
* General minor benchmark code refactoring to reduce code redundancy (see the registry sketch after the checklist below).

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
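The redundancy reduction centers on a shared `benchmark_apis` registry imported from `routines/flashinfer_benchmark_utils.py`, which the diff to `flashinfer_benchmark.py` below uses in place of hard-coded routine lists. The following is a minimal sketch of what that registry could contain: the attention and GEMM names are taken from the lists removed in the diff, while the `bmm_fp8` and `mm_fp4` entries are assumptions based on the PR description, and the actual container types may differ.

```python
# routines/flashinfer_benchmark_utils.py (sketch only, not the actual file contents)
# Maps each benchmark family to the routine names it can dispatch.
benchmark_apis = {
    "attention": (
        "BatchDecodeWithPagedKVCacheWrapper",
        "BatchPrefillWithPagedKVCacheWrapper",
        "BatchPrefillWithRaggedKVCacheWrapper",
    ),
    "gemm": (
        "gemm_fp8_nt_groupwise",
        "group_gemm_fp8_nt_groupwise",
        "bmm_fp8",  # assumed: new FP8 batched-matmul routine from this PR
        "mm_fp4",   # assumed: new FP4 matmul routine from this PR
    ),
}
```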
1 parent: 4c4276a · commit: 2f62643

File tree: 9 files changed (+1594, −744 lines)

benchmarks/README.md — 47 additions & 38 deletions (large diff not rendered)

benchmarks/flashinfer_benchmark.py — 10 additions & 26 deletions
```diff
@@ -2,7 +2,11 @@
 import sys
 
 from routines.attention import parse_attention_args, run_attention_test
-from routines.flashinfer_benchmark_utils import full_output_columns, output_column_dict
+from routines.flashinfer_benchmark_utils import (
+    benchmark_apis,
+    full_output_columns,
+    output_column_dict,
+)
 from routines.gemm import parse_gemm_args, run_gemm_test
 
 
@@ -15,16 +19,9 @@ def run_test(args):
     """
 
     ## Depending on routine type, route to corresponding test routine
-    if args.routine in [
-        "BatchDecodeWithPagedKVCacheWrapper",
-        "BatchPrefillWithPagedKVCacheWrapper",
-        "BatchPrefillWithRaggedKVCacheWrapper",
-    ]:
+    if args.routine in benchmark_apis["attention"]:
         res = run_attention_test(args)
-    elif args.routine in [
-        "gemm_fp8_nt_groupwise",
-        "group_gemm_fp8_nt_groupwise",
-    ]:
+    elif args.routine in benchmark_apis["gemm"]:
         res = run_gemm_test(args)
     else:
         raise ValueError(f"Unsupported routine: {args.routine}")
@@ -63,13 +60,7 @@ def parse_args(line=sys.argv[1:]):
         "-R",
         type=str,
         required=True,
-        choices=[
-            "BatchDecodeWithPagedKVCacheWrapper",
-            "BatchPrefillWithPagedKVCacheWrapper",
-            "BatchPrefillWithRaggedKVCacheWrapper",
-            "gemm_fp8_nt_groupwise",
-            "group_gemm_fp8_nt_groupwise",
-        ],
+        choices=list(benchmark_apis["attention"]) + list(benchmark_apis["gemm"]),
     )
     args, _ = parser.parse_known_args(line[:])
 
@@ -122,16 +113,9 @@ def parse_args(line=sys.argv[1:]):
     )
 
     ## Check routine and pass on to routine-specific argument parser
-    if args.routine in [
-        "BatchDecodeWithPagedKVCacheWrapper",
-        "BatchPrefillWithPagedKVCacheWrapper",
-        "BatchPrefillWithRaggedKVCacheWrapper",
-    ]:
+    if args.routine in benchmark_apis["attention"]:
         args = parse_attention_args(line, parser)
-    elif args.routine in [
-        "gemm_fp8_nt_groupwise",
-        "group_gemm_fp8_nt_groupwise",
-    ]:
+    elif args.routine in benchmark_apis["gemm"]:
         args = parse_gemm_args(line, parser)
     else:
         raise ValueError(f"Unsupported routine: {args.routine}")
```
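With the registry in place, dispatch in `run_test` and `parse_args` reads purely from `benchmark_apis`, so adding a routine only requires registering its name under the right family. The snippet below is a sketch of that flow under the registry assumed above; routine-specific flags (shapes, dtypes, backends) are handled by `parse_attention_args` / `parse_gemm_args` and are omitted here.

```python
# Sketch of the family-based dispatch after the refactor; assumes the
# benchmark_apis registry sketched above and omits routine-specific flags.
from routines.flashinfer_benchmark_utils import benchmark_apis

def routine_family(routine: str) -> str:
    """Return the test family a routine name belongs to."""
    if routine in benchmark_apis["attention"]:
        return "attention"  # handled by run_attention_test(args)
    if routine in benchmark_apis["gemm"]:
        return "gemm"       # handled by run_gemm_test(args)
    raise ValueError(f"Unsupported routine: {routine}")

print(routine_family("BatchPrefillWithPagedKVCacheWrapper"))  # -> "attention"
```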
