Max-Cheng commented Aug 1, 2025

What problem does this PR solve?

Use ARM NEON SIMD intrinsics to replace a simple scalar for-loop.
Issue Number: close #54373
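A minimal sketch of the kind of change described. It assumes the loop being replaced is a HyperLogLog register merge over `uint8_t` registers (the benchmark names suggest this), where merging two sketches takes the element-wise maximum. The function names `hll_merge_scalar` and `hll_merge_neon` are hypothetical and are not the actual Doris code. The NEON path uses the standard `<arm_neon.h>` intrinsics `vld1q_u8`, `vmaxq_u8` and `vst1q_u8` to process 16 registers per step, and falls back to the scalar loop for the tail or on non-ARM builds.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Scalar baseline (hypothetical): each HLL register keeps the maximum observed
// rank, so merging `src` into `dst` is an element-wise max.
inline void hll_merge_scalar(uint8_t* dst, const uint8_t* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = std::max(dst[i], src[i]);
    }
}

// NEON variant (hypothetical): 16 unsigned bytes per iteration.
inline void hll_merge_neon(uint8_t* dst, const uint8_t* src, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 16 <= n; i += 16) {
        uint8x16_t a = vld1q_u8(dst + i);   // load 16 registers from dst
        uint8x16_t b = vld1q_u8(src + i);   // load 16 registers from src
        vst1q_u8(dst + i, vmaxq_u8(a, b));  // store per-byte unsigned max
    }
#endif
    // Remaining tail (or the whole range when NEON is unavailable).
    hll_merge_scalar(dst + i, src + i, n - i);
}
```

Because `vmaxq_u8` computes an unsigned per-byte maximum, the NEON path yields exactly the same registers as the scalar loop; only the number of loop iterations changes.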

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. Uses ARM NEON intrinsics to replace a simple for-loop.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Local benchmark result (Apple Silicon M3 Pro):

------------------------------------------------------------------------------
Benchmark                    Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------
BM_HLL_Merge_Scalar      0.212 us        0.212 us      3317205 bytes_per_second=143.999Gi/s items_per_second=4.71857M/s
BM_HLL_Merge_NEON        0.144 us        0.144 us      4854066 bytes_per_second=212.054Gi/s items_per_second=6.94859M/s
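The counters in each row are mutually consistent, which makes the result easy to sanity-check. The sketch below reproduces the arithmetic from the reported numbers; the struct and function names are hypothetical. The time ratio 0.212 / 0.144 gives a speedup of roughly 1.47x (the bytes_per_second ratio 212.054 / 143.999 agrees), and dividing bytes_per_second by items_per_second gives about 32 KiB per iteration for both variants. The PR does not say what those 32 KiB consist of, so treat that figure only as a consistency check.

```cpp
#include <cstdio>

// Counters copied from the local benchmark output above (Apple Silicon M3 Pro).
struct BenchResult {
    double time_us;            // Time column, microseconds per iteration
    double bytes_per_sec_gib;  // bytes_per_second counter, GiB/s
    double items_per_sec_m;    // items_per_second counter, millions of items/s
};

constexpr BenchResult kScalar{0.212, 143.999, 4.71857};
constexpr BenchResult kNeon{0.144, 212.054, 6.94859};

// Speedup of `opt` over `base`: ratio of per-iteration times.
constexpr double speedup(const BenchResult& base, const BenchResult& opt) {
    return base.time_us / opt.time_us;
}

// Bytes counted per item: byte throughput divided by item throughput.
constexpr double bytes_per_item(const BenchResult& r) {
    return r.bytes_per_sec_gib * 1024.0 * 1024.0 * 1024.0 /
           (r.items_per_sec_m * 1e6);
}

// Print the derived figures.
void print_summary() {
    std::printf("speedup: %.2fx\n", speedup(kScalar, kNeon));
    std::printf("bytes/item: scalar %.0f, NEON %.0f\n",
                bytes_per_item(kScalar), bytes_per_item(kNeon));
}
```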

Thearas (Contributor) commented Aug 1, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Max-Cheng marked this pull request as ready for review on August 7, 2025 05:58
Max-Cheng (Author)

@HappenLee PTAL

koarz (Contributor) commented Aug 8, 2025

run buildall

doris-robot

TPC-H: Total hot run time: 35088 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 44a2f2f70e9c3f79e8651942ca98a0306dab22fd, data reload: false

------ Round 1 ----------------------------------
q1	17635	5716	5443	5443
q2	1953	304	200	200
q3	10247	1405	739	739
q4	10228	1077	543	543
q5	8508	2730	2663	2663
q6	206	178	136	136
q7	1016	832	652	652
q8	9338	1516	1198	1198
q9	6763	5264	5269	5264
q10	6978	2442	2026	2026
q11	489	313	271	271
q12	366	396	234	234
q13	17779	3637	3077	3077
q14	265	253	223	223
q15	579	485	496	485
q16	444	451	390	390
q17	597	880	355	355
q18	7595	7022	6986	6986
q19	2188	1096	593	593
q20	352	333	216	216
q21	3699	3360	2414	2414
q22	1089	1053	980	980
Total cold run time: 108314 ms
Total hot run time: 35088 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5949	5718	5731	5718
q2	237	326	222	222
q3	2199	2692	2253	2253
q4	1453	1996	1514	1514
q5	4677	4550	4530	4530
q6	268	200	144	144
q7	2221	2032	1808	1808
q8	2874	2739	2809	2739
q9	7444	7142	7402	7142
q10	3604	3534	3013	3013
q11	606	512	519	512
q12	743	853	648	648
q13	3657	3878	3286	3286
q14	302	305	299	299
q15	551	487	483	483
q16	508	499	467	467
q17	1240	1674	1533	1533
q18	8407	7765	8097	7765
q19	12539	967	953	953
q20	2847	2003	1765	1765
q21	14405	4514	4456	4456
q22	1072	1016	962	962
Total cold run time: 77803 ms
Total hot run time: 52212 ms
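The robot output does not label its columns, but the totals pin down the layout: summing the first column reproduces Total cold run time (108314 ms for Round 1), and the fourth column is the faster of the two hot runs in columns 2 and 3, whose sum reproduces Total hot run time (35088 ms). A sketch of that check for Round 1, with hypothetical names and the figures copied from the table:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// One TPC-H row as inferred from the totals: a cold run followed by two hot runs, in ms.
struct QueryRun {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

// Round 1 figures for q1..q22.
constexpr std::array<QueryRun, 22> kRound1{{
    {17635, 5716, 5443}, {1953, 304, 200},    {10247, 1405, 739},
    {10228, 1077, 543},  {8508, 2730, 2663},  {206, 178, 136},
    {1016, 832, 652},    {9338, 1516, 1198},  {6763, 5264, 5269},
    {6978, 2442, 2026},  {489, 313, 271},     {366, 396, 234},
    {17779, 3637, 3077}, {265, 253, 223},     {579, 485, 496},
    {444, 451, 390},     {597, 880, 355},     {7595, 7022, 6986},
    {2188, 1096, 593},   {352, 333, 216},     {3699, 3360, 2414},
    {1089, 1053, 980},
}};

// Total cold run time: sum of the first-run times.
template <std::size_t N>
int64_t total_cold_ms(const std::array<QueryRun, N>& rows) {
    int64_t total = 0;
    for (const QueryRun& r : rows) total += r.cold_ms;
    return total;
}

// Total hot run time: sum of the best (minimum) hot run per query.
template <std::size_t N>
int64_t total_hot_ms(const std::array<QueryRun, N>& rows) {
    int64_t total = 0;
    for (const QueryRun& r : rows) total += std::min(r.hot1_ms, r.hot2_ms);
    return total;
}
```

The same relationship holds for Round 2 (77803 ms cold, 52212 ms hot).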

doris-robot

TPC-DS: Total hot run time: 170221 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 44a2f2f70e9c3f79e8651942ca98a0306dab22fd, data reload: false

============================================
query1	1005	387	428	387
query2	6527	1854	1697	1697
query3	6734	223	219	219
query4	27032	23493	22773	22773
query5	4350	634	500	500
query6	323	233	215	215
query7	4631	522	298	298
query8	272	251	249	249
query9	8619	2988	2969	2969
query10	496	333	295	295
query11	15897	15032	14832	14832
query12	178	135	133	133
query13	1664	559	412	412
query14	8655	5873	5891	5873
query15	220	197	169	169
query16	7147	676	475	475
query17	983	763	650	650
query18	1997	445	333	333
query19	234	226	195	195
query20	152	143	158	143
query21	221	128	109	109
query22	3953	3988	3951	3951
query23	34562	34107	34270	34107
query24	5287	2384	2431	2384
query25	496	509	445	445
query26	716	296	162	162
query27	2257	511	351	351
query28	3103	2336	2332	2332
query29	634	650	492	492
query30	287	234	195	195
query31	852	792	712	712
query32	90	78	76	76
query33	507	427	369	369
query34	806	843	517	517
query35	825	832	759	759
query36	975	1021	947	947
query37	136	114	88	88
query38	4035	3922	3904	3904
query39	1461	1368	1354	1354
query40	246	146	133	133
query41	63	57	55	55
query42	137	123	135	123
query43	521	496	492	492
query44	1403	899	880	880
query45	202	194	179	179
query46	944	1062	679	679
query47	1767	1836	1737	1737
query48	422	418	316	316
query49	680	507	431	431
query50	682	696	417	417
query51	4130	4207	4175	4175
query52	131	130	121	121
query53	267	292	208	208
query54	655	639	564	564
query55	95	94	89	89
query56	357	371	360	360
query57	1182	1194	1131	1131
query58	341	336	344	336
query59	2566	2673	2552	2552
query60	406	403	387	387
query61	127	126	129	126
query62	773	751	671	671
query63	258	212	213	212
query64	2329	1151	792	792
query65	4239	4157	4140	4140
query66	1093	458	338	338
query67
query68	17108	931	1186	931
query69	1041	282	288	282
query70	1391	1092	1096	1092
query71	710	321	325	321
query72	9183	2342	2404	2342
query73	3483	656	359	359
query74	9058	8997	8801	8801
query75	7753	3160	2715	2715
query76	8869	1219	798	798
query77	1146	489	337	337
query78	9488	10662	9404	9404
query79	14886	680	577	577
query80	1647	557	507	507
query81	560	273	222	222
query82	500	160	119	119
query83	384	297	275	275
query84	312	106	83	83
query85	941	376	353	353
query86	362	306	290	290
query87	4257	4218	4122	4122
query88	5570	2223	2198	2198
query89	501	365	318	318
query90	2586	222	241	222
query91	145	140	117	117
query92	93	71	68	68
query93	6375	965	655	655
query94	1076	398	273	273
query95	419	340	328	328
query96	503	586	282	282
query97	2746	2751	2604	2604
query98	278	238	219	219
query99	1499	1414	1292	1292
Total cold run time: 299205 ms
Total hot run time: 170221 ms

doris-robot

ClickBench: Total hot run time: 33.34 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 44a2f2f70e9c3f79e8651942ca98a0306dab22fd, data reload: false

query1	0.04	0.03	0.04
query2	0.08	0.04	0.05
query3	0.24	0.07	0.08
query4	1.61	0.11	0.11
query5	0.43	0.43	0.43
query6	1.18	0.68	0.66
query7	0.03	0.02	0.02
query8	0.04	0.03	0.04
query9	0.56	0.47	0.47
query10	0.54	0.52	0.52
query11	0.16	0.10	0.11
query12	0.15	0.11	0.11
query13	0.64	0.64	0.66
query14	0.95	1.10	1.20
query15	0.91	0.90	0.89
query16	0.39	0.41	0.39
query17	1.06	1.05	1.06
query18	0.22	0.21	0.21
query19	2.00	1.94	1.83
query20	0.01	0.01	0.01
query21	15.36	0.86	0.56
query22	0.76	1.27	0.65
query23	14.89	1.17	0.62
query24	7.63	1.03	0.92
query25	0.49	0.15	0.16
query26	0.64	0.15	0.14
query27	0.06	0.05	0.05
query28	9.27	0.84	0.45
query29	12.61	3.90	3.36
query30	3.03	3.00	2.94
query31	2.81	0.58	0.39
query32	3.23	0.59	0.50
query33	3.03	3.23	3.19
query34	16.09	5.36	4.93
query35	4.91	4.93	4.95
query36	0.71	0.52	0.50
query37	0.10	0.07	0.07
query38	0.06	0.05	0.04
query39	0.03	0.03	0.03
query40	0.18	0.13	0.13
query41	0.08	0.03	0.02
query42	0.04	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 107.29 s
Total hot run time: 33.34 s

doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 58.89% (16586/28162)
Line Coverage 47.79% (150289/314483)
Region Coverage 36.67% (112586/307038)
Branch Coverage 39.64% (50028/126221)

hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.40% (22497/27638)
Line Coverage 74.07% (232929/314492)
Region Coverage 61.47% (193812/315289)
Branch Coverage 65.57% (83784/127769)

HappenLee (Contributor) left a comment

LGTM

github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Aug 19, 2025
github-actions (Contributor)

PR approved by at least one committer and no changes requested.

github-actions (Contributor)

PR approved by anyone and no changes requested.

Max-Cheng closed this on Jan 7, 2026
zclllyybb (Contributor)

Hi @Max-Cheng, why was this PR closed? Is it ready to merge?


Labels

approved (Indicates a PR has been approved by one committer.), reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhancement] Rewrite the AVX instruction by ARM NEON

7 participants