
2:4 sparse for int8/fp8/bf16/fp16 gemm #10081

Open
zhink wants to merge 8 commits into PaddlePaddle:develop from zhink:sparse



zhink (Contributor) commented Mar 11, 2025

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Run the hooks on previously modified files individually
pre-commit run --files XXXX.py
  • Add test cases to the tests folder. If there are Codecov issues, please add test cases first.

PR types

New features

PR changes

Others

Description

Adds support for 2:4 sparse matrix multiplication in int8/fp8/bf16/fp16 GEMM to speed up inference.
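2:4 (semi-structured) sparsity means that every group of 4 consecutive weights along the reduction dimension K keeps at most 2 nonzeros. A weight can then be stored as half of its values plus a small in-group position for each value, and the GEMM only multiplies the matching half of the activations. The NumPy sketch below illustrates that data layout and a reference product; it is not the CUDA kernel from this PR. It assumes magnitude pruning (keep the 2 largest |w| per group) and a weight stored as (n, k) so that y = x @ W.T; `prune_2_4`, `compress_2_4`, and `sparse_matmul_ref` are hypothetical helper names.

```python
import numpy as np


def prune_2_4(w: np.ndarray) -> np.ndarray:
    """Keep the 2 largest-|w| entries in every group of 4 along the last axis (K)."""
    rows, k = w.shape
    assert k % 4 == 0, "K must be a multiple of 4"
    groups = w.reshape(rows, k // 4, 4)
    # Positions of the 2 largest magnitudes per group (ties go to the lower index).
    keep = np.argsort(-np.abs(groups), axis=-1, kind="stable")[..., :2]
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=-1)
    return np.where(mask, groups, 0).reshape(rows, k)


def compress_2_4(w_pruned: np.ndarray):
    """Split a 2:4-pruned (n, k) matrix into (n, k/2) values and (n, k/2) in-group positions."""
    n, k = w_pruned.shape
    groups = w_pruned.reshape(n, k // 4, 4)
    pos = np.sort(np.argsort(-np.abs(groups), axis=-1, kind="stable")[..., :2], axis=-1)
    values = np.take_along_axis(groups, pos, axis=-1)
    # Positions are 0..3 (2 bits each); real kernels pack them as metadata,
    # here they are kept as uint8 for readability.
    return values.reshape(n, k // 2), pos.reshape(n, k // 2).astype(np.uint8)


def sparse_matmul_ref(x: np.ndarray, values: np.ndarray, pos: np.ndarray) -> np.ndarray:
    """Reference y = x @ W.T computed only from the compressed (values, pos) form of W."""
    _, half_k = values.shape
    # Absolute K index of each stored value: start of its group of 4 + in-group position.
    group_start = (np.arange(half_k) // 2) * 4
    cols = group_start[None, :] + pos.astype(np.int64)  # (n, k/2)
    # Each output touches only k/2 activations instead of k.
    gathered = x[:, cols]  # (m, n, k/2)
    return np.einsum("mnj,nj->mn", gathered, values)


rng = np.random.default_rng(0)
w = prune_2_4(rng.standard_normal((6, 8)))  # W is (n, k) = (6, 8)
vals, pos = compress_2_4(w)
x = rng.standard_normal((3, 8))             # X is (m, k) = (3, 8)
print(np.allclose(sparse_matmul_ref(x, vals, pos), x @ w.T))  # True
```

Production sparse-tensor-core kernels additionally reorder and pack the position metadata into a hardware-specific layout; those details are deliberately left out of this sketch.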

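Latency of the original dense GEMM vs. the 2:4 sparse GEMM for each (m, k, n) shape (time unit not stated). The last column is the latency reduction in percent, computed as (dense - sparse) / dense × 100.

m k n dense sparse latency_reduction_%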
1 5504 2048 0.104126 0.074995 27.97656
1 2048 6144 0.106624 0.074726 29.91601
4 2048 2048 0.111398 0.075205 32.49022
4 5504 2048 0.111788 0.075164 32.76253
4 2048 6144 0.107662 0.074936 30.39757
16 2048 2048 0.11123 0.075119 32.46537
16 5504 2048 0.113784 0.074776 34.28191
16 2048 6144 0.107559 0.07458 30.66155
128 2048 2048 0.113025 0.075498 33.20234
128 5504 2048 0.113472 0.075649 33.3327
128 2048 6144 0.113517 0.075264 33.69787
1024 2048 2048 0.124668 0.075916 39.10575
1024 5504 2048 0.269366 0.102629 61.89982
1024 2048 6144 0.300202 0.10943 63.54798
4096 2048 2048 0.403619 0.153868 61.87788
4096 5504 2048 0.881655 0.35095 60.19411
4096 2048 6144 1.085515 0.424976 60.85027
8129 2048 2048 0.771866 0.298331 61.34934
8129 5504 2048 1.713396 0.693208 59.54189
8129 2048 6144 2.113265 0.842572 60.12938
1 12800 4096 0.164984 0.074343 54.93907
1 4096 6144 0.117266 0.075706 35.44142
4 4096 4096 0.1087 0.075713 30.3467
4 12800 4096 0.165818 0.074571 55.02846
4 4096 6144 0.115331 0.076101 34.01511
16 4096 4096 0.108385 0.075812 30.05339
16 12800 4096 0.167157 0.075063 55.09452
16 4096 6144 0.115465 0.075706 34.43383
128 4096 4096 0.11487 0.073612 35.91671
128 12800 4096 0.233844 0.085821 63.30008
128 4096 6144 0.148936 0.074333 50.09029
1024 4096 4096 0.395715 0.151453 61.72678
1024 12800 4096 1.086698 0.428071 60.60813
1024 4096 6144 0.516159 0.193346 62.54136
4096 4096 4096 1.320404 0.526541 60.12276
4096 12800 4096 3.66718 1.491941 59.31638
4096 4096 6144 1.875346 0.752808 59.85764
8129 4096 4096 2.571834 1.012316 60.63837
8129 12800 4096 7.201237 2.872038 60.11744
8129 4096 6144 3.670556 1.4961 59.2405
1 13824 5120 0.25453 0.077166 69.68294
1 5120 15360 0.234504 0.077425 66.98343
4 5120 5120 0.114174 0.077253 32.33754
4 13824 5120 0.253641 0.077389 69.48875
4 5120 15360 0.236691 0.077768 67.14382
16 5120 5120 0.114293 0.077579 32.12295
16 13824 5120 0.257506 0.077238 70.00546
16 5120 15360 0.239721 0.077489 67.67546
128 5120 5120 0.166995 0.075183 54.97854
128 13824 5120 0.389022 0.185796 52.24027
128 5120 15360 0.351767 0.140072 60.18037
1024 5120 5120 0.587405 0.22927 60.96896
1024 13824 5120 1.29131 0.575864 55.40463
1024 5120 15360 1.551727 0.59496 61.65821
4096 5120 5120 2.022677 0.783642 61.25718
4096 13824 5120 5.01936 1.957807 60.99489
4096 5120 15360 5.560151 2.264665 59.26973
8129 5120 5120 3.779269 1.51365 59.94861
8129 13824 5120 9.386501 3.79334 59.58728
8129 5120 15360 10.94321 4.521023 58.68651
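The last column is a latency reduction rather than a speedup ratio; the two are related by speedup = 1 / (1 - reduction / 100). A small sketch that reproduces both numbers from rows of the table above (the function names are illustrative, not part of the PR):

```python
def latency_reduction_pct(dense: float, sparse: float) -> float:
    """Share of the dense latency removed by the sparse kernel, in percent."""
    return (dense - sparse) / dense * 100.0


def speedup(dense: float, sparse: float) -> float:
    """How many times faster the sparse kernel is: dense / sparse."""
    return dense / sparse


# Two rows copied from the table: (m, k, n, dense, sparse)
rows = [
    (1, 5504, 2048, 0.104126, 0.074995),
    (8129, 5120, 15360, 10.94321, 4.521023),
]
for m, k, n, dense, sparse in rows:
    print(f"m={m} k={k} n={n}: {latency_reduction_pct(dense, sparse):.2f}% less latency, "
          f"{speedup(dense, sparse):.2f}x speedup")
# m=1 k=5504 n=2048: 27.98% less latency, 1.39x speedup
# m=8129 k=5120 n=15360: 58.69% less latency, 2.42x speedup
```

By the same relation, a latency reduction of about 60% corresponds to about a 2.5x speedup.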



paddle-bot (bot) commented Mar 11, 2025

Thanks for your contribution!

codecov (bot) commented Mar 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.80%. Comparing base (44eff1f) to head (2a7c3a8).
⚠️ Report is 93 commits behind head on develop.

⚠️ Current head 2a7c3a8 differs from pull request most recent head 34fcbe4

Please upload reports for the commit 34fcbe4 to get more accurate results.

❌ Your project check has failed because the head coverage (46.80%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #10081      +/-   ##
===========================================
+ Coverage    46.75%   46.80%   +0.04%     
===========================================
  Files          802      802              
  Lines       133882   133728     -154     
===========================================
- Hits         62603    62594       -9     
+ Misses       71279    71134     -145     


ZHUI self-requested a review on March 20, 2025 03:19
zhink changed the title from "2:4 sparse for int8/fp8/bf16/fp16 dtype" to "2:4 sparse for int8/fp8/bf16/fp16 gemm" on Apr 23, 2025
zhink force-pushed the sparse branch 2 times, most recently from e17eb7f to 2edff4a on July 30, 2025 11:04
zhink marked this pull request as draft on August 22, 2025 05:31
zhink marked this pull request as ready for review on August 22, 2025 05:31
github-actions (bot) commented

This Pull Request is stale because it has been open for 60 days with no activity.

github-actions bot added and then removed the stale label on Oct 22, 2025
github-actions (bot) commented

This Pull Request is stale because it has been open for 60 days with no activity.

github-actions bot added the stale label on Dec 22, 2025