2:4 sparse for int8/fp8/bf16/fp16 gemm by zhink · Pull Request #10081 · PaddlePaddle/PaddleNLP

zhink · 2025-03-11T08:30:50Z

Before submitting

Lint code. If there are lint issues, please format the code first.

# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --file XXXX.py

Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

New features

PR changes

Others

Description

Support 2:4 sparse matrix multiplication for int8/fp8/bf16/fp16 GEMM, which speeds up inference.
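For readers unfamiliar with the pattern, here is a minimal CPU-side sketch of what "2:4 sparse" means: in every group of four consecutive weights, at most two are nonzero, so only two values plus their in-group positions need to be stored. This is an illustration only, not the kernel added by this PR. The helper names (`top2_positions`, `prune_2_4`, `sparse_dot`) are hypothetical, and choosing the two largest-magnitude values per group is an assumed pruning rule.

```python
def top2_positions(group):
    """Return the in-group indices (0..3) of the two largest-magnitude values."""
    # sorted() is stable, so ties keep the earlier index.
    order = sorted(range(4), key=lambda i: -abs(group[i]))
    return sorted(order[:2])


def prune_2_4(row):
    """Prune a row to the 2:4 pattern.

    Returns (dense, values, positions):
      dense     - the row with two entries zeroed in every group of four
      values    - the kept entries, two per group
      positions - the in-group index of each kept entry
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    dense, values, positions = [], [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        keep = top2_positions(group)
        values.extend(group[i] for i in keep)
        positions.extend(keep)
        dense.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return dense, values, positions


def sparse_dot(values, positions, x):
    """Dot product using only the kept values (half the multiplications)."""
    total = 0.0
    for j, (v, p) in enumerate(zip(values, positions)):
        group = j // 2  # two kept values per group of four
        total += v * x[4 * group + p]
    return total


row = [0.1, -0.9, 0.4, 0.05, 2.0, -0.3, 0.0, 1.5]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dense, values, positions = prune_2_4(row)
dense_result = sum(w * xi for w, xi in zip(dense, x))
sparse_result = sparse_dot(values, positions, x)
print(dense, values, positions, dense_result, sparse_result)
```

The compressed form does half the multiply-adds of the dense row, which is the source of the potential speedup; the latency actually achieved depends on the hardware and kernel, as the measurements in this description show.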

m	k	n	dense (original) latency	sparse latency	latency reduction (%)
1	5504	2048	0.104126	0.074995	27.97656
1	2048	6144	0.106624	0.074726	29.91601
4	2048	2048	0.111398	0.075205	32.49022
4	5504	2048	0.111788	0.075164	32.76253
4	2048	6144	0.107662	0.074936	30.39757
16	2048	2048	0.11123	0.075119	32.46537
16	5504	2048	0.113784	0.074776	34.28191
16	2048	6144	0.107559	0.07458	30.66155
128	2048	2048	0.113025	0.075498	33.20234
128	5504	2048	0.113472	0.075649	33.3327
128	2048	6144	0.113517	0.075264	33.69787
1024	2048	2048	0.124668	0.075916	39.10575
1024	5504	2048	0.269366	0.102629	61.89982
1024	2048	6144	0.300202	0.10943	63.54798
4096	2048	2048	0.403619	0.153868	61.87788
4096	5504	2048	0.881655	0.35095	60.19411
4096	2048	6144	1.085515	0.424976	60.85027
8129	2048	2048	0.771866	0.298331	61.34934
8129	5504	2048	1.713396	0.693208	59.54189
8129	2048	6144	2.113265	0.842572	60.12938
1	12800	4096	0.164984	0.074343	54.93907
1	4096	6144	0.117266	0.075706	35.44142
4	4096	4096	0.1087	0.075713	30.3467
4	12800	4096	0.165818	0.074571	55.02846
4	4096	6144	0.115331	0.076101	34.01511
16	4096	4096	0.108385	0.075812	30.05339
16	12800	4096	0.167157	0.075063	55.09452
16	4096	6144	0.115465	0.075706	34.43383
128	4096	4096	0.11487	0.073612	35.91671
128	12800	4096	0.233844	0.085821	63.30008
128	4096	6144	0.148936	0.074333	50.09029
1024	4096	4096	0.395715	0.151453	61.72678
1024	12800	4096	1.086698	0.428071	60.60813
1024	4096	6144	0.516159	0.193346	62.54136
4096	4096	4096	1.320404	0.526541	60.12276
4096	12800	4096	3.66718	1.491941	59.31638
4096	4096	6144	1.875346	0.752808	59.85764
8129	4096	4096	2.571834	1.012316	60.63837
8129	12800	4096	7.201237	2.872038	60.11744
8129	4096	6144	3.670556	1.4961	59.2405
1	13824	5120	0.25453	0.077166	69.68294
1	5120	15360	0.234504	0.077425	66.98343
4	5120	5120	0.114174	0.077253	32.33754
4	13824	5120	0.253641	0.077389	69.48875
4	5120	15360	0.236691	0.077768	67.14382
16	5120	5120	0.114293	0.077579	32.12295
16	13824	5120	0.257506	0.077238	70.00546
16	5120	15360	0.239721	0.077489	67.67546
128	5120	5120	0.166995	0.075183	54.97854
128	13824	5120	0.389022	0.185796	52.24027
128	5120	15360	0.351767	0.140072	60.18037
1024	5120	5120	0.587405	0.22927	60.96896
1024	13824	5120	1.29131	0.575864	55.40463
1024	5120	15360	1.551727	0.59496	61.65821
4096	5120	5120	2.022677	0.783642	61.25718
4096	13824	5120	5.01936	1.957807	60.99489
4096	5120	15360	5.560151	2.264665	59.26973
8129	5120	5120	3.779269	1.51365	59.94861
8129	13824	5120	9.386501	3.79334	59.58728
8129	5120	15360	10.94321	4.521023	58.68651
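The last column of the table appears to be the relative latency reduction, (dense - sparse) / dense × 100, rather than a throughput ratio. The sketch below recomputes it for three rows copied from the table and also prints the equivalent dense/sparse speedup ratio. The function names are hypothetical, and the formula is inferred from the numbers, not stated in the PR.

```python
def latency_reduction_pct(dense, sparse):
    """Percentage of dense latency saved by the sparse kernel."""
    return (dense - sparse) / dense * 100.0


def speedup_ratio(dense, sparse):
    """How many times faster the sparse kernel is than the dense one."""
    return dense / sparse


# (m, k, n, dense latency, sparse latency), copied from the table above.
rows = [
    (1, 5504, 2048, 0.104126, 0.074995),
    (1024, 5504, 2048, 0.269366, 0.102629),
    (8129, 5120, 15360, 10.94321, 4.521023),
]

for m, k, n, dense, sparse in rows:
    print(
        f"m={m} k={k} n={n}: "
        f"reduction={latency_reduction_pct(dense, sparse):.2f}% "
        f"speedup={speedup_ratio(dense, sparse):.2f}x"
    )
```

Read this way, a reduction of about 60% at large m corresponds to a speedup of roughly 2.5x, while the roughly 30% reduction at small m and small k corresponds to about 1.4x.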

paddle-bot · 2025-03-11T08:30:55Z

Thanks for your contribution!

codecov · 2025-03-11T09:09:31Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.80%. Comparing base (44eff1f) to head (2a7c3a8).
⚠️ Report is 93 commits behind head on develop.

⚠️ Current head 2a7c3a8 differs from pull request most recent head 34fcbe4

Please upload reports for the commit 34fcbe4 to get more accurate results.

❌ Your project check has failed because the head coverage (46.80%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop   #10081      +/-   ##
===========================================
+ Coverage    46.75%   46.80%   +0.04%     
===========================================
  Files          802      802              
  Lines       133882   133728     -154     
===========================================
- Hits         62603    62594       -9     
+ Misses       71279    71134     -145


github-actions · 2025-10-22T00:25:05Z

This Pull Request is stale because it has been open for 60 days with no activity.

github-actions · 2025-12-22T00:28:26Z

This Pull Request is stale because it has been open for 60 days with no activity.

2:4 sparse for int8/fp8/bf16/fp16 dtype

931c392

zhink force-pushed the sparse branch from 2386ef6 to 931c392 Compare March 12, 2025 01:44

Merge branch 'develop' into sparse

78bfbc2

ZHUI self-requested a review March 20, 2025 03:19

zhink changed the title from "2:4 sparse for int8/fp8/bf16/fp16 dtype" to "2:4 sparse for int8/fp8/bf16/fp16 gemm" Apr 23, 2025

zhink added 2 commits May 9, 2025 15:29

Merge commit 'f07134333282792acd39c54b3c79a2cc12e7cdf8' into sparse

cb7c403

Merge commit '759ae99609fbcefe065a4f74b686d5a442b4a0bf' into sparse

b3f4a33

zhink force-pushed the sparse branch from 317e4bd to 6ffba9e Compare May 11, 2025 14:17

fix

c3d9883

zhink force-pushed the sparse branch from 6ffba9e to c3d9883 Compare May 12, 2025 03:34

fix

2a7c3a8

zhink force-pushed the sparse branch from 24b17d8 to 2a7c3a8 Compare July 9, 2025 04:18

Merge branch 'PaddlePaddle:develop' into sparse

b157d5f

zhink force-pushed the sparse branch 2 times, most recently from e17eb7f to 2edff4a Compare July 30, 2025 11:04

run ci

34fcbe4

zhink force-pushed the sparse branch from 2edff4a to 34fcbe4 Compare July 31, 2025 03:19

zhink marked this pull request as draft August 22, 2025 05:31

zhink marked this pull request as ready for review August 22, 2025 05:31

github-actions bot added stale and removed stale labels Oct 22, 2025

github-actions bot added the stale label Dec 22, 2025

zhink wants to merge 8 commits into PaddlePaddle:develop from zhink:sparse

1 participant
