[WIP, Please benchmark] Use Co-Z in pippenger #1782

peterdettman · 2025-12-09T11:56:07Z

This modifies the pippenger bucket-summing loop. It is very early code, posted only to get benchmarks of the basic concept.

In short, we ensure running_sum always has the same z coordinate as the accumulator ("r"), so that adding running_sum to r can be done in 5M + 2S (with a free update of running_sum to the new Co-Z coordinates); this is often called a ZADDU (or DBLU when it degenerates to a doubling). On the downside, each time a bucket is added to running_sum, r must also be updated to keep the two Co-Z, at a cost of 3M + 1S. So a typical iteration of the summing loop costs 20M + 7S (12M + 4S for the bucket addition, 3M + 1S for the update of r, 5M + 2S for the ZADDU) instead of 24M + 8S for two general Jacobian additions.
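To make the operation counts concrete, here is a minimal Python sketch of a Co-Z addition over the secp256k1 field, with a small wrapper that tallies multiplications and squarings. It is not the code in this PR: the names `FieldOps`, `zaddu`, `rescale`, `find_point`, `affine_add` and `to_affine` are hypothetical, the formulas are the textbook Co-Z ones, special cases (equal x, infinity) are ignored, and the 3M + 1S breakdown in `rescale` is only one plausible way to arrive at that cost.

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime; curve is y^2 = x^3 + 7

class FieldOps:
    """Field arithmetic that tallies multiplications (M) and squarings (S)."""
    def __init__(self):
        self.m = 0
        self.s = 0
    def mul(self, a, b):
        self.m += 1
        return a * b % P
    def sqr(self, a):
        self.s += 1
        return a * a % P

def zaddu(f, x1, y1, x2, y2, z):
    """Co-Z addition of (x1, y1, z) and (x2, y2, z), assuming x1 != x2.
    Returns the sum, the first input rescaled to the sum's z, and that z."""
    dx = (x1 - x2) % P
    dy = (y1 - y2) % P
    c = f.sqr(dx)
    w1 = f.mul(x1, c)                  # x1 * dx^2: new x of the first input
    w2 = f.mul(x2, c)
    d = f.sqr(dy)
    a1 = f.mul(y1, (w1 - w2) % P)      # y1 * dx^3: new y of the first input
    x3 = (d - w1 - w2) % P
    y3 = (f.mul(dy, (w1 - x3) % P) - a1) % P
    z3 = f.mul(z, dx)
    return (x3, y3), (w1, a1), z3

def rescale(f, x, y, lam):
    """Move (x, y, z) to (x*lam^2, y*lam^3, z*lam); z*lam is assumed shared."""
    l2 = f.sqr(lam)
    l3 = f.mul(l2, lam)
    return f.mul(x, l2), f.mul(y, l3)

def find_point(x):
    """First curve point with x-coordinate >= x (illustrative helper)."""
    while True:
        rhs = (x**3 + 7) % P
        y = pow(rhs, (P + 1) // 4, P)  # square-root candidate, P % 4 == 3
        if y * y % P == rhs:
            return x, y
        x += 1

def affine_add(p, q):
    """Reference affine addition for distinct x-coordinates."""
    lam = (q[1] - p[1]) * pow((q[0] - p[0]) % P, -1, P) % P
    x = (lam * lam - p[0] - q[0]) % P
    return x, (lam * (p[0] - x) - p[1]) % P

def to_affine(x, y, z):
    zi = pow(z, -1, P)
    return x * zi * zi % P, y * zi**3 % P

# Two arbitrary points, lifted to Jacobian coordinates with a shared z.
a = find_point(1)
b = find_point(a[0] + 1)
z = 12345
ja = (a[0] * z * z % P, a[1] * z**3 % P)
jb = (b[0] * z * z % P, b[1] * z**3 % P)

f = FieldOps()
(sx, sy), (ux, uy), z3 = zaddu(f, *ja, *jb, z)
zaddu_cost = (f.m, f.s)
print("ZADDU cost (M, S):", zaddu_cost)

g = FieldOps()
lam = 6789
rx, ry = rescale(g, *jb, lam)
rescale_cost = (g.m, g.s)
print("rescale cost (M, S):", rescale_cost)
```

The point of the tally is that the first input comes out of `zaddu` already expressed at the sum's z coordinate, which is the "free update" referred to above.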

I measure an overall improvement of a few percent for pippenger_wnaf, depending on the number of points, although I would appreciate others sanity-checking that. Unfortunately it involves rather a lot of new code, not helped by the fact that there are special cases everywhere.

john-moffett · 2025-12-09T18:52:32Z

Apple M2 Max, default build. Master vs d95e48b

secp256k1 configure summary
===========================
Build artifacts:
  library type ........................ Shared
Optional modules:
  ECDH ................................ ON
  ECDSA pubkey recovery ............... OFF
  extrakeys ........................... ON
  schnorrsig .......................... ON
  musig ............................... ON
  ElligatorSwift ...................... ON
Parameters:
  ecmult window size .................. 15
  ecmult gen table size ............... 86 KiB
Optional features:
  assembly ............................ OFF
  external callbacks .................. OFF
Optional binaries:
  benchmark ........................... ON
  noverify_tests ...................... ON
  tests ............................... ON
  exhaustive tests .................... ON
  ctime_tests ......................... OFF
  examples ............................ OFF

Cross compiling ....................... FALSE
API visibility attributes ............. ON
Valgrind .............................. OFF
Preprocessor defined macros ........... ECMULT_WINDOW_SIZE=15 COMB_BLOCKS=43 COMB_TEETH=6
C compiler ............................ AppleClang 17.0.0.17000013, /usr/bin/cc
CFLAGS ................................ 
Compile options ....................... -Wall -pedantic -Wcast-align -Wconditional-uninitialized -Wextra -Wnested-externs -Wno-long-long -Wno-overlength-strings -Wno-unused-function -Wreserved-identifier -Wshadow -Wstrict-prototypes -Wundef
Build type:
 - CMAKE_BUILD_TYPE ................... RelWithDebInfo
 - CFLAGS ............................. -O2 -g 
 - LDFLAGS for executables ............ 
 - LDFLAGS for shared libraries .......

Benchmark	Before Avg (us)	After Avg (us)	Δ Avg (us)	Δ Avg %
ecmult_multi_79p_g	7.81	7.85	+0.04	+0.5%
ecmult_multi_95p_g	7.47	7.32	-0.15	-2.0%
ecmult_multi_111p_g	7.24	7.03	-0.21	-2.9%
ecmult_multi_127p_g	7.15	6.87	-0.28	-3.9%
ecmult_multi_159p_g	6.85	6.49	-0.36	-5.3%
ecmult_multi_191p_g	6.57	6.25	-0.32	-4.9%
ecmult_multi_223p_g	6.28	6.08	-0.20	-3.2%
ecmult_multi_255p_g	6.23	5.99	-0.24	-3.9%
ecmult_multi_319p_g	5.86	5.71	-0.15	-2.6%
ecmult_multi_383p_g	5.71	5.59	-0.12	-2.1%
ecmult_multi_447p_g	5.48	5.41	-0.07	-1.3%
ecmult_multi_511p_g	5.35	5.28	-0.07	-1.3%
ecmult_multi_639p_g	5.33	5.20	-0.13	-2.4%
ecmult_multi_767p_g	5.04	5.06	+0.02	+0.4%
ecmult_multi_895p_g	4.96	4.88	-0.08	-1.6%
ecmult_multi_1023p_g	4.88	4.89	+0.01	+0.2%
ecmult_multi_1279p_g	4.76	4.67	-0.09	-1.9%
ecmult_multi_1535p_g	4.59	4.51	-0.08	-1.7%
ecmult_multi_1791p_g	4.46	4.36	-0.10	-2.2%
ecmult_multi_2047p_g	4.36	4.29	-0.07	-1.6%
ecmult_multi_2559p_g	4.24	4.18	-0.06	-1.4%
ecmult_multi_3071p_g	4.14	4.06	-0.08	-1.9%
ecmult_multi_3583p_g	4.19	4.04	-0.15	-3.6%
ecmult_multi_4095p_g	4.04	3.95	-0.09	-2.2%
ecmult_multi_5119p_g	3.93	3.86	-0.07	-1.8%
ecmult_multi_6143p_g	3.82	3.78	-0.04	-1.0%
ecmult_multi_7167p_g	3.79	3.76	-0.03	-0.8%
ecmult_multi_8191p_g	3.75	3.65	-0.10	-2.7%
ecmult_multi_10239p_g	3.63	3.57	-0.06	-1.7%
ecmult_multi_12287p_g	3.57	3.51	-0.06	-1.7%
ecmult_multi_14335p_g	3.50	3.48	-0.02	-0.6%
ecmult_multi_16383p_g	3.53	3.38	-0.15	-4.2%
ecmult_multi_20479p_g	3.42	3.31	-0.11	-3.2%
ecmult_multi_24575p_g	3.29	3.30	+0.01	+0.3%
ecmult_multi_28671p_g	3.22	3.14	-0.08	-2.5%
ecmult_multi_32767p_g	3.19	3.10	-0.09	-2.8%
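For anyone who wants a single summary figure from the table above, a short Python sketch like the following recomputes the relative change per row and averages it. The rows are copied from the table; the variable names are illustrative, and the unweighted mean over all point counts is just one reasonable choice of summary.

```python
# Columns: number of points, before avg (us), after avg (us), copied from the table.
DATA = """
79 7.81 7.85
95 7.47 7.32
111 7.24 7.03
127 7.15 6.87
159 6.85 6.49
191 6.57 6.25
223 6.28 6.08
255 6.23 5.99
319 5.86 5.71
383 5.71 5.59
447 5.48 5.41
511 5.35 5.28
639 5.33 5.20
767 5.04 5.06
895 4.96 4.88
1023 4.88 4.89
1279 4.76 4.67
1535 4.59 4.51
1791 4.46 4.36
2047 4.36 4.29
2559 4.24 4.18
3071 4.14 4.06
3583 4.19 4.04
4095 4.04 3.95
5119 3.93 3.86
6143 3.82 3.78
7167 3.79 3.76
8191 3.75 3.65
10239 3.63 3.57
12287 3.57 3.51
14335 3.50 3.48
16383 3.53 3.38
20479 3.42 3.31
24575 3.29 3.30
28671 3.22 3.14
32767 3.19 3.10
"""

rows = []
for line in DATA.strip().splitlines():
    n, before, after = line.split()
    rows.append((int(n), float(before), float(after)))

# Relative change in percent; negative means the PR branch is faster.
changes = [(after - before) / before * 100 for _, before, after in rows]
mean_change = sum(changes) / len(changes)
faster = sum(1 for c in changes if c < 0)
best_n = rows[changes.index(min(changes))][0]

print(f"rows: {len(rows)}, faster after: {faster}")
print(f"mean change: {mean_change:.2f}%")
print(f"largest improvement at {best_n} points")
```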

siv2r · 2025-12-15T18:48:07Z

I benchmarked this pull request on a MacBook Pro (M4 Pro, ARM64, 12 cores: 8P + 4E) running macOS 15.6.1, plugged in with no background apps. Based on my results, this pull request performs better than master, with an average speedup of ~2%.

Anyone who wants to reproduce the benchmarks can use this Python script. It benchmarks both this pull request and the master branch, and outputs an .xlsx file comparing their performance.

Prototype using CoZ arithmetic in pippenger_wnaf

d95e48b

peterdettman marked this pull request as draft December 9, 2025 11:56

peterdettman mentioned this pull request Dec 9, 2025: [WIP, Please benchmark] Use homogeneous coordinates in pippenger #1767 (Closed)
