Commit a1dfea9

improve comments and macros_for_performance.md
1 parent 3e660c0 commit a1dfea9

7 files changed: +32 -32 lines changed
README.md

Lines changed: 1 addition & 1 deletion
@@ -95,4 +95,4 @@ The particular ECM variant used by this library originated from Ben Buhrow's mic
 The Pollard-Rho-Brent algorithm uses an easy extra step that seems to be unmentioned in the literature. The step is a "pre-loop" that advances as quickly as possible through a portion of the initial pseduo-random sequence before beginning the otherwise normal Pollard-Rho Brent algorithm. The rationale for this is that every Pollard-Rho pseudo-random sequence begins with a non-periodic segment, and trying to extract factors from that segment is mostly wasted work since the algorithm logic relies on a periodic sequence. Using a "pre-loop" that does nothing except iterate for a set number of times through the sequence thus improves performance on average, since it quickly gets past some of that unwanted non-periodic segment. In particular, the "pre-loop" avoids calling the greatest common divisor, which would rarely find a factor during the non-periodic segment. This optimization would likely help any form/variant of the basic Pollard-Rho algorithm.

 ## Performance Notes
-If you're interested in experimenting, predefining certain macros when compiling can improve performance - see [macros_for_performance.md](macros_for_performance.md).
+If you're interested in experimenting, defining certain macros when compiling can improve performance - see [macros_for_performance.md](macros_for_performance.md).
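
Reviewer note on the "pre-loop" paragraph in the hunk above: the idea can be sketched roughly as follows. This is a minimal illustration and not the library's implementation. The function names, the starting value 2, and the parameters `c` and `pre_loop_length` are all assumptions made for the example. It uses plain modular arithmetic (no Montgomery form), is restricted to 1 < n < 2^32 so that intermediate products fit in 64 bits, and calls the GCD on every step rather than batching products as a tuned Brent variant would.

```cpp
#include <cstdint>

// Illustrative sketch only: every name below is hypothetical.
// Precondition: 1 < n < 2^32, so x*x + c cannot overflow 64 bits.

// Plain Euclidean gcd; note gcd_u64(0, n) == n.
static std::uint64_t gcd_u64(std::uint64_t a, std::uint64_t b) {
    while (b != 0) {
        std::uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// One step of the pseudo-random sequence x -> x*x + c (mod n).
static std::uint64_t next_value(std::uint64_t x, std::uint64_t c,
                                std::uint64_t n) {
    return (x * x + c) % n;
}

// Returns a divisor g > 1 of n. A result of g == n means this attempt
// failed, and the caller would retry with a different c.
std::uint64_t rho_brent_with_pre_loop(std::uint64_t n, std::uint64_t c,
                                      std::uint64_t pre_loop_length) {
    std::uint64_t y = 2 % n;

    // Pre-loop: advance through the start of the sequence with no gcd
    // calls at all, to skip part of the non-periodic segment cheaply.
    for (std::uint64_t i = 0; i < pre_loop_length; ++i) {
        y = next_value(y, c, n);
    }

    // Simplified Brent-style cycle search: x is held fixed while y
    // advances up to r steps; r doubles after each unsuccessful round.
    std::uint64_t g = 1;
    for (std::uint64_t r = 1; g == 1; r *= 2) {
        std::uint64_t x = y;
        for (std::uint64_t k = 0; k < r && g == 1; ++k) {
            y = next_value(y, c, n);
            std::uint64_t diff = (x > y) ? (x - y) : (y - x);
            g = gcd_u64(diff, n);
        }
    }
    return g;
}
```

In this sketch, `pre_loop_length` plays the role that the README text describes: a fixed number of iterations spent only on advancing the sequence.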

include/hurchalla/factoring/detail/experimental/README_pollard_rho.md

Lines changed: 2 additions & 2 deletions
@@ -1,13 +1,13 @@

 The PollardRho\*.h header files in this folder are experimental functors for Pollard-Rho trials.
-To use an experimental functor, predefine the macro HURCHALLA_POLLARD_RHO_TRIAL_FUNCTOR_NAME and give it the name of the experimental functor. For example, if you want PollardRhoTrial and you are compiling with clang, you could invoke clang as follows:
+To use an experimental functor, define the macro HURCHALLA_POLLARD_RHO_TRIAL_FUNCTOR_NAME and give it the name of the experimental functor. For example, if you want PollardRhoTrial and you are compiling with clang, you could invoke clang as follows:
 clang++ -DHURCHALLA_POLLARD_RHO_TRIAL_FUNCTOR_NAME=PollardRhoTrial ...more options and files...

 valid names to give HURCHALLA_POLLARD_RHO_TRIAL_FUNCTOR_NAME are:
 PollardRhoTrial
 PollardRhoBrentTrial
 PollardRhoBrentTrialParallel
-PollardRhoBrentSwitchingTrial (currently this is the default, so it's not necessary to predefine the macro to this functor)
+PollardRhoBrentSwitchingTrial (currently this is the default, so it's not necessary to define the macro to this functor)

 The PollardRhoBrentSwitchingTrial functor or the PollardRhoBrentTrialParallel functor will likely perform best on your system, but you can try others.

include/hurchalla/factoring/detail/experimental/get_single_factor.h

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@

 #ifndef NDEBUG
 // you can remove this if you are *sure* you really want to run with asserts enabled.. It is SLOW.
-# error "Performance will be severely harmed if you don't predefine the standard macro NDEBUG."
+# error "Performance will be severely harmed if you don't define the standard macro NDEBUG."
 #endif

include/hurchalla/factoring/detail/impl_factorize.h

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ namespace hurchalla { namespace detail {

 // Do *NOT* change the values for the macro in the code immediately below. If
 // you wish to use a different value for the macro (which is fine), please
-// predefine the macro when compiling.
+// define the macro when compiling.
 #ifndef HURCHALLA_FACTORING_ECM_THRESHOLD_BITS
 # define HURCHALLA_FACTORING_ECM_THRESHOLD_BITS 34
 #endif
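
Reviewer note: the `#ifndef` guard in this hunk is what lets a value supplied at compile time take precedence over the built-in default of 34. Below is a minimal standalone sketch of the same pattern. The helper function `ecm_threshold_bits_in_effect` is hypothetical and exists only to make the selected value observable. With clang or gcc, a different value could be supplied as `-DHURCHALLA_FACTORING_ECM_THRESHOLD_BITS=40`, where 40 is an arbitrary number chosen for illustration and not a tuning recommendation.

```cpp
// Same guard pattern as the hunk above: the default applies only when the
// macro was not already defined, e.g. on the compiler command line.
#ifndef HURCHALLA_FACTORING_ECM_THRESHOLD_BITS
#  define HURCHALLA_FACTORING_ECM_THRESHOLD_BITS 34
#endif

// Hypothetical helper (not part of the library) that reports the value the
// preprocessor ended up selecting.
constexpr int ecm_threshold_bits_in_effect() {
    return HURCHALLA_FACTORING_ECM_THRESHOLD_BITS;
}
```

Compiled without any `-D` option, the helper returns the default. Editing the `# define` line directly would also change the value, but the source comment asks users not to do that.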

include/hurchalla/factoring/detail/impl_greatest_common_divisor.h

Lines changed: 2 additions & 2 deletions
@@ -25,9 +25,9 @@ struct impl_greatest_common_divisor {
 // The binary GCD algorithm is usually considerably faster than the Euclidean
 // GCD algorithm. However for native types T, some new CPUs have very fast
 // dividers that potentially could make the Euclidean GCD implementation faster
-// than the Binary GCD implementation. You can predefine the macro
+// than the Binary GCD implementation. You can define the macro
 // HURCHALLA_PREFER_EUCLIDEAN_GCD in such a case. You will also need to make
-// sure that you have predefined HURCHALLA_TARGET_CPU_HAS_FAST_DIVIDE.
+// sure that you have defined HURCHALLA_TARGET_CPU_HAS_FAST_DIVIDE.

 #if defined(HURCHALLA_PREFER_EUCLIDEAN_GCD) && \
 defined(HURCHALLA_TARGET_CPU_HAS_FAST_DIVIDE)
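
Reviewer note: the two-macro condition in this hunk can be illustrated with a small standalone sketch. The functions `euclidean_gcd`, `binary_gcd`, and `example_gcd` are hypothetical names written for this note and are not the library's implementations. Only the macro names and the `#if` condition come from the diff.

```cpp
#include <cstdint>

// Euclidean GCD: one division (modulo) per iteration.
std::uint64_t euclidean_gcd(std::uint64_t a, std::uint64_t b) {
    while (b != 0) {
        std::uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Binary (Stein) GCD: uses only shifts, comparisons and subtraction.
std::uint64_t binary_gcd(std::uint64_t a, std::uint64_t b) {
    if (a == 0) return b;
    if (b == 0) return a;
    int shift = 0;
    // Factor out the powers of two common to both inputs.
    while (((a | b) & 1u) == 0) {
        a >>= 1;
        b >>= 1;
        ++shift;
    }
    // Make a odd; it stays odd for the rest of the loop.
    while ((a & 1u) == 0) a >>= 1;
    while (b != 0) {
        while ((b & 1u) == 0) b >>= 1;
        if (a > b) {
            std::uint64_t t = a;
            a = b;
            b = t;
        }
        b -= a;  // both odd, so the difference is even (or zero)
    }
    return a << shift;
}

// Selection mirrors the condition in the hunk: the Euclidean version is
// chosen only when BOTH macros are defined.
std::uint64_t example_gcd(std::uint64_t a, std::uint64_t b) {
#if defined(HURCHALLA_PREFER_EUCLIDEAN_GCD) && \
    defined(HURCHALLA_TARGET_CPU_HAS_FAST_DIVIDE)
    return euclidean_gcd(a, b);
#else
    return binary_gcd(a, b);
#endif
}
```

Defining only `HURCHALLA_PREFER_EUCLIDEAN_GCD` leaves the binary version in use, which matches the comment's statement that both macros are needed.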

include/hurchalla/factoring/detail/trial_divide_mayer.h

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ namespace hurchalla { namespace detail {


 #if defined(HURCHALLA_USE_TRIAL_DIVIDE_VIA_INVERSE)
-# error "HURCHALLA_USE_TRIAL_DIVIDE_VIA_INVERSE must not be predefined"
+# error "HURCHALLA_USE_TRIAL_DIVIDE_VIA_INVERSE must not be defined"
 #endif
 #if defined(HURCHALLA_TARGET_ISA_HAS_NO_DIVIDE) || \
 !defined(HURCHALLA_TARGET_CPU_HAS_FAST_DIVIDE)

macros_for_performance.md

Lines changed: 24 additions & 24 deletions
@@ -1,9 +1,9 @@

-Optional macros to predefine to tune performance
-------------------------------------------------
-There are a number of macros you can optionally predefine to tune the
+Optional macros you can define to tune performance
+--------------------------------------------------
+There are a number of macros you can optionally define to tune the
 performance on your system for the factoring and primality testing functions.
-You would predefine one or more of these macros when compiling *your* sources,
+You would define one or more of these macros when compiling *your* sources,
 given that this is a header-only library.

 For example, if you are compiling using clang or gcc from the command line, you would
@@ -28,15 +28,15 @@ HURCHALLA_TRIAL_DIVISION_SIZE - this macro specifies the number of small primes
 division stage of factoring. The default is 139.

 HURCHALLA_TRIAL_DIVISION_TEMPLATE - this is the name of the template that
-performs the trial division during factoring; if you predefine it, you must set it to either
+performs the trial division during factoring; if you define it, you must set it to either
 PrimeTrialDivisionWarren or PrimeTrialDivisionMayer. PrimeTrialDivisionWarren
 is the default. If you have a CPU with very fast division instructions
 (some CPUs from 2019 or later), you might be able to improve performance by
-predefining this macro to PrimeTrialDivisionMayer while also predefining the
+defining this macro to PrimeTrialDivisionMayer while also defining the
 macro HURCHALLA_TARGET_CPU_HAS_FAST_DIVIDE. If you need to minimize resource
-usage, then predefine this macro to PrimeTrialDivisionMayer, since the Mayer
+usage, then define this macro to PrimeTrialDivisionMayer, since the Mayer
 template uses about a fifth of the memory of PrimeTrialDivisionWarren. If you
-do predefine this macro, you will very likely also want to predefine
+do define this macro, you will very likely also want to define
 HURCHALLA_TRIAL_DIVISION_SIZE to a new value that works best with this macro choice.

 HURCHALLA_POLLARD_RHO_TRIAL_FUNCTOR_NAME - the name of the algorithm/functor
@@ -52,47 +52,47 @@ Macros for is_prime():
 HURCHALLA_ISPRIME_TRIALDIV_SIZE - the number of small primes (starting
 at 2,3,5,7, etc) that will be trialed as potential factors via trial division
 before testing primality with Miller-Rabin. 21 is the default. You can
-predefine this macro to a different number that is better tuned for your system.
+define this macro to a different number that is better tuned for your system.
 \
 \
 Macros for IsPrimeIntensive:

 HURCHALLA_ISPRIME_INTENSIVE_TRIALDIV_SIZE - the number of small primes
 (starting at 2,3,5,7, etc) that will be trialed as potential factors via trial
 division before testing primality with Miller-Rabin. 75 is the default.
-You can predefine this macro to a different number that is better tuned for your
+You can define this macro to a different number that is better tuned for your
 system. Note that for IsPrimeIntensive, when testing primality of uint32_t and
 smaller types, this macro has no effect because IsPrimeIntensive tests primality
 of those types solely using the Sieve of Eratosthenes.

 HURCHALLA_ISPRIME_INTENSIVE_TRIALDIV_TYPE - the name of the template
-that performs the trial division; if you predefine it, you must set it to either
+that performs the trial division; if you define it, you must set it to either
 PrimeTrialDivisionWarren or PrimeTrialDivisionMayer. PrimeTrialDivisionWarren
 is the default. If you have a CPU with very fast division instructions (some
 CPUs from 2019 or later), you might be able to improve performance by
-predefining this macro to PrimeTrialDivisionMayer while also predefining the
+defining this macro to PrimeTrialDivisionMayer while also defining the
 macro HURCHALLA_TARGET_CPU_HAS_FAST_DIVIDE. If you need to minimize resource
-usage, then predefine this macro to PrimeTrialDivisionMayer, since the Mayer
+usage, then define this macro to PrimeTrialDivisionMayer, since the Mayer
 template uses about a fifth of the memory of PrimeTrialDivisionWarren. If you
-do predefine this macro, you will very likely also want to predefine
+do define this macro, you will very likely also want to define
 HURCHALLA_ISPRIME_INTENSIVE_TRIALDIV_SIZE to a new value that works well with
 this macro choice.
 \
 \
 Miscellaneous macros:

-HURCHALLA_TARGET_CPU_HAS_FAST_DIVIDE - predefine this macro if your CPU has very
+HURCHALLA_TARGET_CPU_HAS_FAST_DIVIDE - define this macro if your CPU has very
 fast division instructions (usually this is present only in CPUs from around
 2019 or later).

-HURCHALLA_TARGET_ISA_HAS_NO_DIVIDE - you should usually predefine this macro if
+HURCHALLA_TARGET_ISA_HAS_NO_DIVIDE - you should usually define this macro if
 your microprocessor lacks a division instruction.

-HURCHALLA_FACTORIZE_NEVER_USE_MONTGOMERY_MATH - if predefined, this macro
+HURCHALLA_FACTORIZE_NEVER_USE_MONTGOMERY_MATH - if defined, this macro
 causes the factorization functions to use standard division for modular
 arithmetic, instead of montgomery arithmetic. By default this macro is not
 defined. If you have a CPU with very fast division instructions, you might be
-able to improve performance by predefining this macro while also predefining the
+able to improve performance by defining this macro while also defining the
 macro HURCHALLA_TARGET_CPU_HAS_FAST_DIVIDE.

 HURCHALLA_PRB_GCD_THRESHOLD - the maximum number of iterations of
@@ -112,19 +112,19 @@ the nonperiodic segment while doing minimal work. This increases performance,
 so long as the 'starting_length' is not too much greater than the actual
 nonperiodic segment's length. Note: if you expect to have only large factors,
 you will generally be able to improve the performance of Pollard-Rho-Brent
-factoring by predefining this macro to a larger value than the default, because
+factoring by defining this macro to a larger value than the default, because
 both the periodic and nonperiodic segments will tend to be large when you have
 large factors.

-HURCHALLA_PREFER_EUCLIDEAN_GCD - predefining this macro allows you to use the
+HURCHALLA_PREFER_EUCLIDEAN_GCD - defining this macro allows you to use the
 Euclidean algorithm for greatest common divisor (GCD), rather tham the default
 Binary/Stein algorithm. However, you will only get the Euclidean GCD if you
-also predefine HURCHALLA_TARGET_CPU_HAS_FAST_DIVIDE. Note that even with both
-macros predefined, the Binary GCD will still be used for any integer type T that
+also define HURCHALLA_TARGET_CPU_HAS_FAST_DIVIDE. Note that even with both
+macros defined, the Binary GCD will still be used for any integer type T that
 is larger than the CPU's native bit width.

-HURCHALLA_FACTORING_DISALLOW_INLINE_ASM - predefining this macro will prevent
-any inline asm from being compiled. If this macro is not predefined, this
+HURCHALLA_FACTORING_DISALLOW_INLINE_ASM - defining this macro will prevent
+any inline asm from being compiled. If this macro is not defined, this
 library by default will use inline asm for improved performance. However,
 inline asm is difficult to thoroughly test, because the code surrounding the
 inline asm under test in part determines what binary instructions are generated
