Skip to content

Conversation

tompng
Copy link
Member

@tompng tompng commented Aug 20, 2025

Multiplication gets maximum 800,000 times faster.

Raises Multiply size too large (ArgumentError) if size is larger than the limitation:
x * y requires [x.n_significant_digits, y.n_significant_digits].min <= 603979776

x = BigDecimal('9'*(9<<26)); x * x
# 270 days(estimated) → 29 seconds

BigDecimal('9'*(9<<27)).div(BigDecimal('7'*(9<<26)), 9<<26)
# 84 days(estimated) → 184 seconds

Basic policy of this pull request

  • Make calculation fast in a wide range of precisions with relatively low amount of code and low maintenance cost
  • Only use NTT and achieve O(nlogn) with a limited (but enough) maximum precision
  • Less conditional branch, No Karatsuba / Toom-3,4,5 / uint128-dependency
  • Don't make the code complicated for just a small constant factor speedup

NTT(Numeric Theory Translation) multiplication

Calculates multiplication/convolution using NTT with three primes.

# Calculate convolution in mod prime1, prim2 and prime3
conv1 = ntt_inverse(ntt(a, prime1).zip(ntt(b, prime1)).map{ _1 * _2 }, prime1)
conv2 = ...
conv3 = ...
# Restore actual convolution from conv1, conv2, conv3
conv = restore_convolution_from_modulo(conv1, conv2, conv3)

Consider calculating convolution of two arrays. Each array is of size N with array[i] in 0..999999999.
Maximum value of convolution[i] is 999999999**2 * N. This value is larger than 64bit and smaller than 96bit, so we need three 32-bit primes: 29<<27|1, 26<<27|1, 24<<27|1.
These are three largest 32-bit primes that satisfies P > 999999999, and P-1 need to be a multiple of large powers of two.
Constraints from this primes, maximum N is 1<<27.

Combination of primes/sizes

BASE Primes N estimated speed (smaller is better) memo
10**9 32bit x 3 1<<27 1 This pull request, No digits repacking
10**3 32bit x 1 1<<12 1 N is too small
10**6 32bit x 2 1<<24 1 N is small
10**6 64bit x 1 1<<24 2 N is small, needs uint128_t
10**14 64bit x 2 1<<34 12/7 Needs uint128_t
10**3 64bit x 1 1<<39 4 Needs uint128_t

Multiplication of various size bigdecimal

Considering xx_xx_xx * yy
Calculate by convolution(ntt(xx_xx_xx), ntt(00_00_yy)) is possible, but repeating convolution(ntt(xx), ntt(yy)) is faster.

# aaaa_bbbb_cccc * yyyy
ntt_y = ntt(yyyy) # Calculate once and reuse
convolution(ntt(aaaa), ntt_y)
convolution(ntt(bbbb), ntt_y)
convolution(ntt(cccc), ntt_y)

If n_significant_digits is both larger than 1<<<26 == 603979776, multiplication fails with Error(too large).

Newton-Raphson division

X / Y can be calculated by X * Yinv
and Yinv can be calculated only by add/sub/mult using Newton's method.

x = 0.1
10.times { x = x * (2 - 7 * x) }
x #=> 0.14285714285714285 (== 1/7.0)

Division of various size bigdecimal

Considering 1111_1111_1111_1111.div(7777_7777_7777)
Required precision is 4. Calculating inverse of 7777_7777_7777 in 4 digit is enough.
1111_1111_1111_1111 * 1.285e-12 == 1428

Considering 1111_1111_1111_1111.div(7777)
We can calculate this by repeating xxxx_xxxx.divmod(7777) 3 times.
Calculating inverse of 7777 in 4 digit is enough.

Generic case: Split X into several blocks
xxx_xxxxx_xxxxx_xxxxx / yyyy
Can be calculated by repeating xxxx_xxxxx.divmod(yyyy) 3 times.
xxx_xxxx_xxxx / yyyyy
Can be calculated by repeating xxxxx_xxxx.divmod(yyyyy) 2 times.
xxxxx_xxxxxxx / yyyyy
Can be calculated by repeating xxxxxx_xxxxxxx.divmod(yyyyy) 1 time.

Tests

All bigdecimal test passes with NTT_MULTIPLICATION_THRESHOLD==1 and NEWTON_RAPHSON_DIVISION_THRESHOLD==1 (All mult/div uses NTT and Newton-Raphson)

These large multiplication test passes. (Too slow to add to CI)

x = BigDecimal('9'*(9<<26))
(x*x).to_s.match?(/^0\.9+80+1e\d+$/) #=> true

ss = 20.times.map{(9<<10).times.map{rand(0..9)}.join}
x = BigDecimal((3<<16).times.map{ss.sample}.join)
y = BigDecimal((1<<16).times.map{ss.sample}.join)
prime = 33333331
(x%prime)*(y%prime)%prime == (x*y)%prime #=> true

@tompng tompng force-pushed the ntt_mult_div branch 9 times, most recently from fabc2d2 to 236c1ff Compare August 27, 2025 16:57
@tompng tompng force-pushed the ntt_mult_div branch 3 times, most recently from 64e3d7c to 6b2506b Compare September 3, 2025 13:46
@tompng tompng marked this pull request as ready for review September 3, 2025 14:24
@tompng tompng force-pushed the ntt_mult_div branch 2 times, most recently from 46fd632 to b2e5dac Compare September 9, 2025 12:16
@tompng tompng force-pushed the ntt_mult_div branch 2 times, most recently from 1a0a3a0 to 0f02ddc Compare September 13, 2025 13:33
tompng added 2 commits October 7, 2025 21:45
Performs ntt with three primes (29<<27|1, 26<<27|1, 24<<27|1)
Improve performance of huge divisions
@tompng tompng force-pushed the ntt_mult_div branch 2 times, most recently from 7d52b9d to 2513f01 Compare October 7, 2025 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant