Commit 85a9a8d
benchmark: trtllm-gen mha with sink, add benchmark args (#1415)
<!-- .github/pull_request_template.md -->
## 📌 Description
Benchmark results on B200:
`python3 benchmarks/bench_trtllm_fmha.py`
> batch_size=4, seq_len=1024, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.008246400207281113ms
> memory bandwidth: 973.91 GB/s
> batch_size=4, seq_len=4096, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.011433599889278412ms
> memory bandwidth: 2801.50 GB/s
> batch_size=4, seq_len=8192, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.015436799824237823ms
> memory bandwidth: 4147.96 GB/s
> batch_size=4, seq_len=16384, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.03059200048446655ms
> memory bandwidth: 4185.12 GB/s
> batch_size=128, seq_len=1024, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.05228480100631714ms
> memory bandwidth: 4915.39 GB/s
> batch_size=128, seq_len=4096, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.17578879594802857ms
> memory bandwidth: 5830.86 GB/s
> batch_size=128, seq_len=8192, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 1.6801599502563476ms
> memory bandwidth: 1219.53 GB/s
> batch_size=128, seq_len=16384, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 5.439588928222657ms
> memory bandwidth: 753.18 GB/s
> batch_size=256, seq_len=1024, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.09621120095252991ms
> memory bandwidth: 5342.41 GB/s
> batch_size=256, seq_len=4096, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 1.7040287971496584ms
> memory bandwidth: 1203.03 GB/s
> batch_size=256, seq_len=8192, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 4.829526329040528ms
> memory bandwidth: 848.53 GB/s
> batch_size=256, seq_len=16384, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 12.651522827148437ms
> memory bandwidth: 647.67 GB/s
`python3 benchmarks/bench_attention_sink_triton_sgl_decode.py`
> batch_size=4, seq_len=1024, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.027424000203609467ms
> memory bandwidth: 292.85 GB/s
> batch_size=4, seq_len=4096, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.06857600063085556ms
> memory bandwidth: 467.09 GB/s
> batch_size=4, seq_len=8192, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.12588800489902496ms
> memory bandwidth: 508.64 GB/s
> batch_size=4, seq_len=16384, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.2343679964542389ms
> memory bandwidth: 546.28 GB/s
> batch_size=128, seq_len=1024, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.09935999661684036ms
> memory bandwidth: 2586.55 GB/s
> batch_size=128, seq_len=4096, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.2919679880142212ms
> memory bandwidth: 3510.66 GB/s
> batch_size=128, seq_len=8192, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.5479679703712463ms
> memory bandwidth: 3739.27 GB/s
> batch_size=128, seq_len=16384, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 1.060703992843628ms
> memory bandwidth: 3862.53 GB/s
> batch_size=256, seq_len=1024, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.17817600071430206ms
> memory bandwidth: 2884.79 GB/s
> batch_size=256, seq_len=4096, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.566208004951477ms
> memory bandwidth: 3620.58 GB/s
> batch_size=256, seq_len=8192, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 1.0823359489440918ms
> memory bandwidth: 3786.26 GB/s
> batch_size=256, seq_len=16384, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 2.114527940750122ms
> memory bandwidth: 3875.10 GB/s
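To make the two runs above easier to compare, here is a minimal sketch that computes the per-configuration speedup of `bench_trtllm_fmha.py` over `bench_attention_sink_triton_sgl_decode.py` (both without `--sink`). The times are copied from the logs above and rounded to six decimals; the names `TIMES_MS` and `speedups` are illustrative and not part of either benchmark script.

```python
# Execution times in ms, rounded from the logs above (no --sink).
# Keys are (batch_size, seq_len); values are (trtllm_ms, triton_ms).
TIMES_MS = {
    (4, 1024): (0.008246, 0.027424),
    (4, 4096): (0.011434, 0.068576),
    (4, 8192): (0.015437, 0.125888),
    (4, 16384): (0.030592, 0.234368),
    (128, 1024): (0.052285, 0.099360),
    (128, 4096): (0.175789, 0.291968),
    (128, 8192): (1.680160, 0.547968),
    (128, 16384): (5.439589, 1.060704),
    (256, 1024): (0.096211, 0.178176),
    (256, 4096): (1.704029, 0.566208),
    (256, 8192): (4.829526, 1.082336),
    (256, 16384): (12.651523, 2.114528),
}


def speedups(times):
    """Return {config: triton_ms / trtllm_ms}; a value > 1 means trtllm is faster."""
    return {cfg: triton / trtllm for cfg, (trtllm, triton) in times.items()}


ratios = speedups(TIMES_MS)
# Configurations where the trtllm benchmark is slower than the Triton one.
slower = sorted(cfg for cfg, r in ratios.items() if r < 1.0)

for (bs, sl), r in sorted(ratios.items()):
    print(f"batch_size={bs:>3} seq_len={sl:>5} speedup={r:.2f}x")
print("trtllm slower than Triton for:", slower)
```

On these numbers, the trtllm run is faster for every configuration with `batch_size * seq_len` up to 524,288 and slower for every configuration from 1,048,576 upward. The logs alone do not say why.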
`python3 benchmarks/bench_trtllm_fmha.py --sink`
> batch_size=4, seq_len=1024, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.00806720033288002ms
> memory bandwidth: 995.54 GB/s
> batch_size=4, seq_len=4096, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.011334399878978729ms
> memory bandwidth: 2826.02 GB/s
> batch_size=4, seq_len=8192, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.01525759994983673ms
> memory bandwidth: 4196.68 GB/s
> batch_size=4, seq_len=16384, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.03030720055103302ms
> memory bandwidth: 4224.45 GB/s
> batch_size=128, seq_len=1024, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.05234559774398804ms
> memory bandwidth: 4909.68 GB/s
> batch_size=128, seq_len=4096, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.1760032057762146ms
> memory bandwidth: 5823.76 GB/s
> batch_size=128, seq_len=8192, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 1.7758047103881835ms
> memory bandwidth: 1153.84 GB/s
> batch_size=128, seq_len=16384, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 5.47153606414795ms
> memory bandwidth: 748.78 GB/s
> batch_size=256, seq_len=1024, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.09606080055236817ms
> memory bandwidth: 5350.78 GB/s
> batch_size=256, seq_len=4096, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 1.732806396484375ms
> memory bandwidth: 1183.05 GB/s
> batch_size=256, seq_len=8192, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 4.8847806930542ms
> memory bandwidth: 838.93 GB/s
> batch_size=256, seq_len=16384, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 13.09429473876953ms
> memory bandwidth: 625.77 GB/s
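As a rough check on what `--sink` costs in `bench_trtllm_fmha.py`, the sketch below compares the execution times of the run above with the earlier run without `--sink`. Times are copied from the logs and rounded to six decimals; `TRTLLM_MS` and `overhead_pct` are illustrative names. The logs report one time per configuration with no variance, so differences of a few percent may be run-to-run noise.

```python
# (batch_size, seq_len) -> (ms without --sink, ms with --sink),
# rounded from the bench_trtllm_fmha.py logs.
TRTLLM_MS = {
    (4, 1024): (0.008246, 0.008067),
    (4, 4096): (0.011434, 0.011334),
    (4, 8192): (0.015437, 0.015258),
    (4, 16384): (0.030592, 0.030307),
    (128, 1024): (0.052285, 0.052346),
    (128, 4096): (0.175789, 0.176003),
    (128, 8192): (1.680160, 1.775805),
    (128, 16384): (5.439589, 5.471536),
    (256, 1024): (0.096211, 0.096061),
    (256, 4096): (1.704029, 1.732806),
    (256, 8192): (4.829526, 4.884781),
    (256, 16384): (12.651523, 13.094295),
}


def overhead_pct(base_ms, sink_ms):
    """Relative change in execution time when --sink is enabled, in percent."""
    return (sink_ms - base_ms) / base_ms * 100.0


overheads = {cfg: overhead_pct(*t) for cfg, t in TRTLLM_MS.items()}
# Configuration with the largest absolute change.
worst = max(overheads, key=lambda cfg: abs(overheads[cfg]))

for (bs, sl), pct in sorted(overheads.items()):
    print(f"batch_size={bs:>3} seq_len={sl:>5} sink overhead={pct:+.2f}%")
print("largest change:", worst, f"{overheads[worst]:+.2f}%")
```

With these inputs, every configuration stays within 6% of its time without `--sink`, and the largest change is at batch_size=128, seq_len=8192.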
`python3 benchmarks/bench_attention_sink_triton_sgl_decode.py --sink`
> batch_size=4, seq_len=1024, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.02755199931561947ms
> memory bandwidth: 291.49 GB/s
> batch_size=4, seq_len=4096, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.06857600063085556ms
> memory bandwidth: 467.09 GB/s
> batch_size=4, seq_len=8192, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.12588800489902496ms
> memory bandwidth: 508.64 GB/s
> batch_size=4, seq_len=16384, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.2343679964542389ms
> memory bandwidth: 546.28 GB/s
> batch_size=128, seq_len=1024, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.09935999661684036ms
> memory bandwidth: 2586.55 GB/s
> batch_size=128, seq_len=4096, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.2919679880142212ms
> memory bandwidth: 3510.66 GB/s
> batch_size=128, seq_len=8192, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.5479999780654907ms
> memory bandwidth: 3739.05 GB/s
> batch_size=128, seq_len=16384, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 1.060703992843628ms
> memory bandwidth: 3862.53 GB/s
> batch_size=256, seq_len=1024, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.17817600071430206ms
> memory bandwidth: 2884.79 GB/s
> batch_size=256, seq_len=4096, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 0.566208004951477ms
> memory bandwidth: 3620.58 GB/s
> batch_size=256, seq_len=8192, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 1.0823359489440918ms
> memory bandwidth: 3786.26 GB/s
> batch_size=256, seq_len=16384, num_qo_heads=64, num_kv_heads=8,
head_dim=64, page_size=16
> execution time: 2.114527940750122ms
> memory bandwidth: 3875.10 GB/s
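The bandwidth figures from the two scripts are only comparable if both count the same bytes per configuration. A small sketch for checking that multiplies each reported bandwidth by its execution time and compares the implied byte counts across scripts. It assumes 1 GB = 1e9 bytes, which the logs do not state; `bytes_moved` and `SAMPLES` are illustrative names, and the three configurations are a sample from the runs without `--sink`.

```python
def bytes_moved(bandwidth_gb_s, time_ms):
    """Bytes implied by a reported bandwidth and time (assumes 1 GB = 1e9 bytes)."""
    return bandwidth_gb_s * 1e9 * time_ms * 1e-3


# (batch_size, seq_len) -> ((trtllm GB/s, trtllm ms), (triton GB/s, triton ms)),
# copied from the logs without --sink.
SAMPLES = {
    (4, 1024): ((973.91, 0.008246400207281113), (292.85, 0.027424000203609467)),
    (128, 1024): ((4915.39, 0.05228480100631714), (2586.55, 0.09935999661684036)),
    (256, 16384): ((647.67, 12.651522827148437), (3875.10, 2.114527940750122)),
}

implied = {
    cfg: (bytes_moved(*trtllm), bytes_moved(*triton))
    for cfg, (trtllm, triton) in SAMPLES.items()
}

for (bs, sl), (a, b) in sorted(implied.items()):
    rel = abs(a - b) / a
    print(f"batch_size={bs:>3} seq_len={sl:>5} "
          f"trtllm={a / 1e6:.2f} MB triton={b / 1e6:.2f} MB rel_diff={rel:.1e}")
```

For these samples the two implied byte counts agree to well under 0.1%, which suggests both scripts use the same traffic accounting and that differences in reported bandwidth come from differences in execution time.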
## 🔍 Related Issues
<!-- Link any related issues here -->
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
## Reviewer Notes
<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

1 parent 5451029, commit 85a9a8d
3 files changed under `benchmarks` (+1565, -9 lines)