Optimize trial division and half-limb probable prime test in `n_is_prime` by fredrik-johansson · Pull Request #2616 · flintlib/flint

fredrik-johansson · 2026-03-21T17:11:07Z

Yet another update to n_is_prime, giving up to 1.7x speedup in the 21-32 bit range and up to 1.1x speedup in the 33-64 bit range.

To compute 2^e mod n for n < 2^32, we use fast 64-bit division with remainder (Fast single-word division and remainder #2597) instead of Shoup mulmods. This doesn't seem to be significantly faster by itself, but it allows doing the multiplications by 2 without modular reduction when n < 2^31. Combined with making the multiplications by 2 branch-free, this gives most of the speedup.
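
As a rough illustration of why n < 2^31 allows the doublings to skip modular reduction, here is a minimal sketch of left-to-right binary exponentiation with base 2. It is not the code from this PR: the name `pow2_mod_u31` is hypothetical, the plain `%` operator stands in for the fast division with remainder from #2597, and the loop always scans all 64 exponent bits for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not FLINT's API): returns 2^e mod n for 0 < n < 2^31. */
static uint32_t pow2_mod_u31(uint64_t e, uint32_t n)
{
    assert(n != 0 && n < (UINT32_C(1) << 31));

    uint64_t x = 1;  /* invariant at the top of the loop: x < 2^32 */

    for (int i = 63; i >= 0; i--)
    {
        /* x < 2^32, so x * x < 2^64 cannot overflow; afterwards x < n < 2^31. */
        x = (x * x) % n;
        /* Branch-free multiplication by 2 when bit i of e is set.
           No reduction is needed: x < 2n <= 2^32 still fits the invariant. */
        x <<= (e >> i) & 1;
    }

    /* A final reduction brings the unreduced doubling back into [0, n). */
    return (uint32_t) (x % n);
}
```

For example, `pow2_mod_u31(340, 341)` returns 1, reflecting that 341 = 11 * 31 is a base-2 Fermat pseudoprime, so a test built on this value alone is only a probable prime test.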
Previously we did branchy trial division by 18 odd primes for n < 2^32 and 34 odd primes for n > 2^32, relying on the compiler to turn !(n%19) into something efficient. Current GCC generates suboptimal code for divisibility testing by constants, so we reimplement the trial division using the Granlund-Montgomery method. In addition, for n < 2^32, we keep the divisions by 3 and 5 branchy, but do the rest with branch-free 32-bit code, increasing the number of trial primes from 18 to 34. GCC turns this block of 32 branch-free trial divisions into a handful of AVX2 instructions which on average execute faster than a loop with early abort. (In the worst case for this method, i.e. when every input n is divisible by 7 but not by 2, 3 or 5, the new code is only 15% slower than the old code.) With AVX512 the same approach could make sense in the 64-bit case as well.
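
The Granlund-Montgomery divisibility test can be sketched as follows: for odd d, a 32-bit n is divisible by d exactly when n * d^(-1) mod 2^32 is at most floor((2^32 - 1) / d). This is an illustration, not the PR's code. The function names are hypothetical, the prime table (the 32 odd primes from 7 to 149) is an assumption since the description does not list the primes, and real code would hard-code the inverses and bounds as constants rather than compute them per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inverse of odd d modulo 2^32 by Newton iteration (hypothetical helper). */
static uint32_t inv_mod_2_32(uint32_t d)
{
    assert(d & 1);
    uint32_t x = d;            /* d * d = 1 mod 8, so x is correct to 3 bits */
    for (int i = 0; i < 4; i++)
        x *= 2 - d * x;        /* precision doubles: 6, 12, 24, 48 bits */
    return x;
}

/* Nonzero iff odd d divides n: multiples of d map to 0..floor((2^32-1)/d). */
static int divisible_by_odd(uint32_t n, uint32_t d)
{
    return (uint32_t) (n * inv_mod_2_32(d)) <= UINT32_MAX / d;
}

/* Assumed prime list: the 32 odd primes from 7 to 149. */
static const uint32_t trial_primes[32] = {
      7,  11,  13,  17,  19,  23,  29,  31,  37,  41,  43,
     47,  53,  59,  61,  67,  71,  73,  79,  83,  89,  97,
    101, 103, 107, 109, 113, 127, 131, 137, 139, 149
};

/* Branch-free scan: OR all results together instead of exiting early.
   Note that this also reports n equal to one of the table primes. */
static int has_trial_factor(uint32_t n)
{
    uint32_t hit = 0;
    for (size_t i = 0; i < sizeof(trial_primes) / sizeof(trial_primes[0]); i++)
        hit |= (uint32_t) divisible_by_odd(n, trial_primes[i]);
    return hit != 0;
}
```

Because the loop body has no data-dependent branch and a fixed trip count, a compiler is free to vectorize it, which is the property the description relies on for the AVX2 code.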

Detailed timings and speedups (old/new) by bit size and input type:

                  random                        primes               composites without small factors
bits |    old       new    speedup |     old       new    speedup |     old       new    speedup 
16   | 2.80e-09  2.83e-09   0.989  |  2.59e-09  2.56e-09   1.012  |  2.57e-09  2.57e-09   1.000
17   | 2.80e-09  2.84e-09   0.986  |  2.56e-09  2.56e-09   1.000  |  2.57e-09  2.57e-09   1.000
18   | 2.81e-09  2.83e-09   0.993  |  2.79e-09  2.56e-09   1.090  |  2.56e-09  2.57e-09   0.996
19   | 2.82e-09  2.83e-09   0.996  |  2.58e-09  2.56e-09   1.008  |  2.57e-09  2.57e-09   1.000
20   | 2.82e-09  2.84e-09   0.993  |  2.79e-09  2.80e-09   0.996  |  2.57e-09  2.78e-09   0.924
21   | 8.98e-09  5.95e-09   1.509  |  5.96e-08  4.48e-08   1.330  |  5.53e-08  3.82e-08   1.448
22   | 1.07e-08  6.45e-09   1.659  |  6.38e-08  4.82e-08   1.324  |  5.76e-08  4.08e-08   1.412
23   | 1.03e-08  6.45e-09   1.597  |  6.71e-08  5.10e-08   1.316  |  6.13e-08  4.45e-08   1.378
24   | 1.19e-08  7.24e-09   1.644  |  7.07e-08  5.48e-08   1.290  |  6.51e-08  4.68e-08   1.391
25   | 1.26e-08  7.45e-09   1.691  |  7.42e-08  5.66e-08   1.311  |  6.83e-08  4.95e-08   1.380
26   | 1.29e-08  7.56e-09   1.706  |  7.74e-08  6.01e-08   1.288  |  7.21e-08  5.27e-08   1.368
27   | 1.38e-08  8.12e-09   1.700  |  8.05e-08  6.38e-08   1.262  |  7.50e-08  5.64e-08   1.330
28   | 1.38e-08  8.27e-09   1.669  |  8.42e-08  6.62e-08   1.272  |  7.84e-08  5.87e-08   1.336
29   | 1.46e-08  8.69e-09   1.680  |  8.74e-08  6.88e-08   1.270  |  8.17e-08  6.15e-08   1.328
30   | 1.41e-08  8.52e-09   1.655  |  9.08e-08  7.26e-08   1.251  |  8.50e-08  6.48e-08   1.312
31   | 1.51e-08  9.09e-09   1.661  |  9.40e-08  7.61e-08   1.235  |  8.83e-08  6.85e-08   1.289
32   | 1.58e-08  1.06e-08   1.491  |  9.74e-08  8.88e-08   1.097  |  9.14e-08  8.08e-08   1.131
33   | 2.14e-08  2.02e-08   1.059  |  2.21e-07  2.09e-07   1.057  |  1.13e-07  1.02e-07   1.108
34   | 2.15e-08  2.04e-08   1.054  |  2.26e-07  2.14e-07   1.056  |  1.16e-07  1.05e-07   1.105
35   | 2.16e-08  2.07e-08   1.043  |  2.33e-07  2.20e-07   1.059  |  1.19e-07  1.07e-07   1.112
36   | 2.11e-08  2.00e-08   1.055  |  2.38e-07  2.27e-07   1.048  |  1.22e-07  1.10e-07   1.109
37   | 2.17e-08  2.06e-08   1.053  |  2.44e-07  2.32e-07   1.052  |  1.25e-07  1.14e-07   1.096
38   | 2.42e-08  2.32e-08   1.043  |  2.49e-07  2.39e-07   1.042  |  1.27e-07  1.17e-07   1.085
39   | 2.30e-08  2.20e-08   1.045  |  2.55e-07  2.45e-07   1.041  |  1.30e-07  1.19e-07   1.092
40   | 2.36e-08  2.26e-08   1.044  |  2.61e-07  2.50e-07   1.044  |  1.33e-07  1.22e-07   1.090
41   | 2.27e-08  2.18e-08   1.041  |  2.66e-07  2.57e-07   1.035  |  1.36e-07  1.25e-07   1.088
42   | 2.29e-08  2.21e-08   1.036  |  2.73e-07  2.62e-07   1.042  |  1.39e-07  1.29e-07   1.078
43   | 2.34e-08  2.25e-08   1.040  |  2.78e-07  2.67e-07   1.041  |  1.42e-07  1.31e-07   1.084
44   | 2.46e-08  2.37e-08   1.038  |  2.90e-07  2.79e-07   1.039  |  1.45e-07  1.35e-07   1.074
45   | 2.37e-08  2.30e-08   1.030  |  2.89e-07  2.80e-07   1.032  |  1.47e-07  1.37e-07   1.073
46   | 2.64e-08  2.56e-08   1.031  |  2.95e-07  2.87e-07   1.028  |  1.49e-07  1.40e-07   1.064
47   | 2.53e-08  2.46e-08   1.028  |  3.00e-07  2.91e-07   1.031  |  1.52e-07  1.42e-07   1.070
48   | 2.58e-08  2.50e-08   1.032  |  3.06e-07  2.96e-07   1.034  |  1.56e-07  1.45e-07   1.076
49   | 2.70e-08  2.62e-08   1.031  |  3.11e-07  3.04e-07   1.023  |  1.58e-07  1.51e-07   1.046
50   | 2.58e-08  2.49e-08   1.036  |  3.17e-07  3.09e-07   1.026  |  1.60e-07  1.51e-07   1.060
51   | 2.73e-08  2.64e-08   1.034  |  3.23e-07  3.13e-07   1.032  |  1.63e-07  1.54e-07   1.058
52   | 2.72e-08  2.64e-08   1.030  |  3.28e-07  3.19e-07   1.028  |  1.66e-07  1.58e-07   1.051
53   | 2.83e-08  2.71e-08   1.044  |  3.36e-07  3.25e-07   1.034  |  1.70e-07  1.59e-07   1.069
54   | 2.73e-08  2.62e-08   1.042  |  3.43e-07  3.29e-07   1.043  |  1.74e-07  1.61e-07   1.081
55   | 2.85e-08  2.73e-08   1.044  |  3.48e-07  3.36e-07   1.036  |  1.78e-07  1.65e-07   1.079
56   | 2.80e-08  2.69e-08   1.041  |  3.53e-07  3.41e-07   1.035  |  1.79e-07  1.70e-07   1.053
57   | 2.89e-08  2.78e-08   1.040  |  3.59e-07  3.46e-07   1.038  |  1.81e-07  1.70e-07   1.065
58   | 2.92e-08  2.79e-08   1.047  |  3.65e-07  3.51e-07   1.040  |  1.84e-07  1.72e-07   1.070
59   | 2.92e-08  2.85e-08   1.025  |  3.73e-07  3.59e-07   1.039  |  1.87e-07  1.77e-07   1.056
60   | 3.03e-08  2.92e-08   1.038  |  3.77e-07  3.65e-07   1.033  |  1.90e-07  1.78e-07   1.067
61   | 3.10e-08  2.99e-08   1.037  |  3.82e-07  3.69e-07   1.035  |  1.94e-07  1.80e-07   1.078
62   | 2.99e-08  2.85e-08   1.049  |  3.91e-07  3.76e-07   1.040  |  1.99e-07  1.83e-07   1.087
63   | 3.39e-08  3.19e-08   1.063  |  4.35e-07  4.19e-07   1.038  |  2.23e-07  2.07e-07   1.077
64   | 3.56e-08  3.38e-08   1.053  |  4.43e-07  4.27e-07   1.037  |  2.28e-07  2.09e-07   1.091

Optimize trial division and half-limb probable prime test in n_is_prime

9a8f553

fredrik-johansson merged commit 6a7cca0 into flintlib:main Mar 22, 2026
12 checks passed

fredrik-johansson deleted the isprime8 branch March 22, 2026 06:34
