
Optimize trial division and half-limb probable prime test in n_is_prime #2616

Merged
fredrik-johansson merged 1 commit into flintlib:main from fredrik-johansson:isprime8
Mar 22, 2026

@fredrik-johansson
Collaborator

Yet another update to n_is_prime, giving up to 1.7x speedup in the 21-32 bit range and up to 1.1x speedup in the 33-64 bit range.

  1. To compute 2^e mod n for n < 2^32, we now use fast 64-bit division with remainder (Fast single-word division and remainder, #2597) instead of Shoup mulmods. This does not seem to be significantly faster by itself, but it allows the multiplications by 2 to be done without modular reduction when n < 2^31. Combined with making the multiplications by 2 branch-free, this accounts for most of the speedup.

  2. Previously we did branchy trial division by 18 odd primes for n < 2^32 and by 34 odd primes for n > 2^32, relying on the compiler to turn !(n%19) into something efficient. Current GCC generates suboptimal code for divisibility testing by constants, so we reimplement the trial division using the Granlund-Montgomery divisibility test. In addition, for n < 2^32, we keep the divisions by 3 and 5 branchy but do the rest with branch-free 32-bit code, increasing the number of trial primes to 34. GCC turns this block of 32 trial divisions into a handful of AVX2 instructions, which on average execute faster than a loop with early abort. (In the worst case for this method, i.e. when every input n is divisible by 7 but not by 2, 3 or 5, the new code is only 15% slower than the old code.) With AVX512 the same approach could make sense in the 64-bit case as well.
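For readers who want to see the trick in item 1 concretely, here is a rough sketch (not the code from this PR). With n < 2^31, a reduced residue x < n can be doubled to 2x < 2^32 and fed straight into the next squaring, because (2x)^2 still fits in 64 bits. The name `pow2_mod_lazy` is made up for illustration, and the plain `%` operator stands in for the fast division routine from #2597.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not FLINT's implementation:
   returns 2^e mod n, assuming 1 < n < 2^31. */
static uint32_t pow2_mod_lazy(uint32_t e, uint32_t n)
{
    uint64_t x = 1;   /* invariant at the top of each iteration: x < 2^32 */
    int i;

    assert(n > 1 && n < (UINT32_C(1) << 31));

    /* Left-to-right binary exponentiation over all 32 bits of e. */
    for (i = 31; i >= 0; i--)
    {
        /* x < 2^32, so x * x < 2^64 cannot overflow; the result is < n. */
        x = (x * x) % n;

        /* Branch-free doubling: shift by the current exponent bit (0 or 1).
           Since x < n < 2^31, we get 2x < 2^32, so no reduction is needed
           before the next squaring. */
        x <<= (e >> i) & 1;
    }

    /* One final reduction, because the last doubling was left unreduced. */
    return (uint32_t) (x % n);
}
```

A base-2 Fermat check is then `pow2_mod_lazy(n - 1, n) == 1`. For n between 2^31 and 2^32 the doubling would need a reduction (or a wider intermediate), which is consistent with the smaller speedup at 32 bits in the table below.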
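The branch-free trial division in item 2 can be illustrated along the following lines. This is a minimal sketch under stated assumptions, not the PR's code: for an odd divisor d with inverse d' modulo 2^32, a 32-bit n is divisible by d exactly when n * d' mod 2^32 is at most floor((2^32 - 1) / d). The names `trial_tab_t`, `trial_tab_init`, `has_trial_factor` and `inv_mod_2_32` are hypothetical, and the prime list (the 32 odd primes from 7 to 149) is an assumption about which primes are used.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TRIAL 32

/* Assumed prime list: the 32 odd primes following 3 and 5. */
static const uint32_t trial_primes[NUM_TRIAL] = {
      7,  11,  13,  17,  19,  23,  29,  31,
     37,  41,  43,  47,  53,  59,  61,  67,
     71,  73,  79,  83,  89,  97, 101, 103,
    107, 109, 113, 127, 131, 137, 139, 149
};

typedef struct
{
    uint32_t inv[NUM_TRIAL];  /* p^(-1) mod 2^32 */
    uint32_t lim[NUM_TRIAL];  /* floor((2^32 - 1) / p) */
} trial_tab_t;

/* Inverse of an odd d modulo 2^32 by Newton iteration. */
static uint32_t inv_mod_2_32(uint32_t d)
{
    uint32_t x = d;  /* d * d == 1 (mod 8), so x is correct to 3 bits */
    int i;

    assert(d & 1);

    /* Each step doubles the number of correct bits: 3, 6, 12, 24, 48. */
    for (i = 0; i < 4; i++)
        x *= 2 - d * x;

    return x;
}

/* In production code these tables would be compile-time constants. */
static void trial_tab_init(trial_tab_t * t)
{
    int i;
    for (i = 0; i < NUM_TRIAL; i++)
    {
        t->inv[i] = inv_mod_2_32(trial_primes[i]);
        t->lim[i] = UINT32_MAX / trial_primes[i];
    }
}

/* Returns 1 if some listed prime divides n (including n itself being
   one of the listed primes), 0 otherwise. There is no early exit: every
   iteration is one multiply, one compare and one OR, which is the shape
   a compiler can vectorize. */
static int has_trial_factor(const trial_tab_t * t, uint32_t n)
{
    uint32_t hit = 0;
    int i;

    for (i = 0; i < NUM_TRIAL; i++)
        hit |= (uint32_t) (n * t->inv[i] <= t->lim[i]);

    return hit != 0;
}
```

Because the test also fires when n equals one of the trial primes, a caller would have to handle small n separately before relying on it as a compositeness filter.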

Detailed speedups by bit size and input type (speedup = old time / new time):

                  random                        primes               composites without small factors
bits |    old       new    speedup |     old       new    speedup |     old       new    speedup 
16   | 2.80e-09  2.83e-09   0.989  |  2.59e-09  2.56e-09   1.012  |  2.57e-09  2.57e-09   1.000
17   | 2.80e-09  2.84e-09   0.986  |  2.56e-09  2.56e-09   1.000  |  2.57e-09  2.57e-09   1.000
18   | 2.81e-09  2.83e-09   0.993  |  2.79e-09  2.56e-09   1.090  |  2.56e-09  2.57e-09   0.996
19   | 2.82e-09  2.83e-09   0.996  |  2.58e-09  2.56e-09   1.008  |  2.57e-09  2.57e-09   1.000
20   | 2.82e-09  2.84e-09   0.993  |  2.79e-09  2.80e-09   0.996  |  2.57e-09  2.78e-09   0.924
21   | 8.98e-09  5.95e-09   1.509  |  5.96e-08  4.48e-08   1.330  |  5.53e-08  3.82e-08   1.448
22   | 1.07e-08  6.45e-09   1.659  |  6.38e-08  4.82e-08   1.324  |  5.76e-08  4.08e-08   1.412
23   | 1.03e-08  6.45e-09   1.597  |  6.71e-08  5.10e-08   1.316  |  6.13e-08  4.45e-08   1.378
24   | 1.19e-08  7.24e-09   1.644  |  7.07e-08  5.48e-08   1.290  |  6.51e-08  4.68e-08   1.391
25   | 1.26e-08  7.45e-09   1.691  |  7.42e-08  5.66e-08   1.311  |  6.83e-08  4.95e-08   1.380
26   | 1.29e-08  7.56e-09   1.706  |  7.74e-08  6.01e-08   1.288  |  7.21e-08  5.27e-08   1.368
27   | 1.38e-08  8.12e-09   1.700  |  8.05e-08  6.38e-08   1.262  |  7.50e-08  5.64e-08   1.330
28   | 1.38e-08  8.27e-09   1.669  |  8.42e-08  6.62e-08   1.272  |  7.84e-08  5.87e-08   1.336
29   | 1.46e-08  8.69e-09   1.680  |  8.74e-08  6.88e-08   1.270  |  8.17e-08  6.15e-08   1.328
30   | 1.41e-08  8.52e-09   1.655  |  9.08e-08  7.26e-08   1.251  |  8.50e-08  6.48e-08   1.312
31   | 1.51e-08  9.09e-09   1.661  |  9.40e-08  7.61e-08   1.235  |  8.83e-08  6.85e-08   1.289
32   | 1.58e-08  1.06e-08   1.491  |  9.74e-08  8.88e-08   1.097  |  9.14e-08  8.08e-08   1.131
33   | 2.14e-08  2.02e-08   1.059  |  2.21e-07  2.09e-07   1.057  |  1.13e-07  1.02e-07   1.108
34   | 2.15e-08  2.04e-08   1.054  |  2.26e-07  2.14e-07   1.056  |  1.16e-07  1.05e-07   1.105
35   | 2.16e-08  2.07e-08   1.043  |  2.33e-07  2.20e-07   1.059  |  1.19e-07  1.07e-07   1.112
36   | 2.11e-08  2.00e-08   1.055  |  2.38e-07  2.27e-07   1.048  |  1.22e-07  1.10e-07   1.109
37   | 2.17e-08  2.06e-08   1.053  |  2.44e-07  2.32e-07   1.052  |  1.25e-07  1.14e-07   1.096
38   | 2.42e-08  2.32e-08   1.043  |  2.49e-07  2.39e-07   1.042  |  1.27e-07  1.17e-07   1.085
39   | 2.30e-08  2.20e-08   1.045  |  2.55e-07  2.45e-07   1.041  |  1.30e-07  1.19e-07   1.092
40   | 2.36e-08  2.26e-08   1.044  |  2.61e-07  2.50e-07   1.044  |  1.33e-07  1.22e-07   1.090
41   | 2.27e-08  2.18e-08   1.041  |  2.66e-07  2.57e-07   1.035  |  1.36e-07  1.25e-07   1.088
42   | 2.29e-08  2.21e-08   1.036  |  2.73e-07  2.62e-07   1.042  |  1.39e-07  1.29e-07   1.078
43   | 2.34e-08  2.25e-08   1.040  |  2.78e-07  2.67e-07   1.041  |  1.42e-07  1.31e-07   1.084
44   | 2.46e-08  2.37e-08   1.038  |  2.90e-07  2.79e-07   1.039  |  1.45e-07  1.35e-07   1.074
45   | 2.37e-08  2.30e-08   1.030  |  2.89e-07  2.80e-07   1.032  |  1.47e-07  1.37e-07   1.073
46   | 2.64e-08  2.56e-08   1.031  |  2.95e-07  2.87e-07   1.028  |  1.49e-07  1.40e-07   1.064
47   | 2.53e-08  2.46e-08   1.028  |  3.00e-07  2.91e-07   1.031  |  1.52e-07  1.42e-07   1.070
48   | 2.58e-08  2.50e-08   1.032  |  3.06e-07  2.96e-07   1.034  |  1.56e-07  1.45e-07   1.076
49   | 2.70e-08  2.62e-08   1.031  |  3.11e-07  3.04e-07   1.023  |  1.58e-07  1.51e-07   1.046
50   | 2.58e-08  2.49e-08   1.036  |  3.17e-07  3.09e-07   1.026  |  1.60e-07  1.51e-07   1.060
51   | 2.73e-08  2.64e-08   1.034  |  3.23e-07  3.13e-07   1.032  |  1.63e-07  1.54e-07   1.058
52   | 2.72e-08  2.64e-08   1.030  |  3.28e-07  3.19e-07   1.028  |  1.66e-07  1.58e-07   1.051
53   | 2.83e-08  2.71e-08   1.044  |  3.36e-07  3.25e-07   1.034  |  1.70e-07  1.59e-07   1.069
54   | 2.73e-08  2.62e-08   1.042  |  3.43e-07  3.29e-07   1.043  |  1.74e-07  1.61e-07   1.081
55   | 2.85e-08  2.73e-08   1.044  |  3.48e-07  3.36e-07   1.036  |  1.78e-07  1.65e-07   1.079
56   | 2.80e-08  2.69e-08   1.041  |  3.53e-07  3.41e-07   1.035  |  1.79e-07  1.70e-07   1.053
57   | 2.89e-08  2.78e-08   1.040  |  3.59e-07  3.46e-07   1.038  |  1.81e-07  1.70e-07   1.065
58   | 2.92e-08  2.79e-08   1.047  |  3.65e-07  3.51e-07   1.040  |  1.84e-07  1.72e-07   1.070
59   | 2.92e-08  2.85e-08   1.025  |  3.73e-07  3.59e-07   1.039  |  1.87e-07  1.77e-07   1.056
60   | 3.03e-08  2.92e-08   1.038  |  3.77e-07  3.65e-07   1.033  |  1.90e-07  1.78e-07   1.067
61   | 3.10e-08  2.99e-08   1.037  |  3.82e-07  3.69e-07   1.035  |  1.94e-07  1.80e-07   1.078
62   | 2.99e-08  2.85e-08   1.049  |  3.91e-07  3.76e-07   1.040  |  1.99e-07  1.83e-07   1.087
63   | 3.39e-08  3.19e-08   1.063  |  4.35e-07  4.19e-07   1.038  |  2.23e-07  2.07e-07   1.077
64   | 3.56e-08  3.38e-08   1.053  |  4.43e-07  4.27e-07   1.037  |  2.28e-07  2.09e-07   1.091

@fredrik-johansson fredrik-johansson merged commit 6a7cca0 into flintlib:main Mar 22, 2026
12 checks passed
@fredrik-johansson fredrik-johansson deleted the isprime8 branch March 22, 2026 06:34