Sensible batch multiplier

WrathfulSpatula · WrathfulSpatula · commit f5a220a32f5f · 2025-01-05T15:57:11.000-05:00
diff --git a/FindAFactor/find_a_factor.py b/FindAFactor/find_a_factor.py
@@ -8,7 +8,7 @@ def find_a_factor(n,
                   gear_factorization_level=int(os.environ.get('FINDAFACTOR_GEAR_FACTORIZATION_LEVEL')) if os.environ.get('FINDAFACTOR_GEAR_FACTORIZATION_LEVEL') else 11,
                   wheel_factorization_level=int(os.environ.get('FINDAFACTOR_WHEEL_FACTORIZATION_LEVEL')) if os.environ.get('FINDAFACTOR_WHEEL_FACTORIZATION_LEVEL') else 5,
                   thread_count=int(os.environ.get('FINDAFACTOR_THREAD_COUNT')) if os.environ.get('FINDAFACTOR_THREAD_COUNT') else 0,
-                  batch_multiplier=int(os.environ.get('FINDAFACTOR_BATCH_MULTIPLIER')) if os.environ.get('FINDAFACTOR_BATCH_MULTIPLIER') else 256,
+                  batch_multiplier=float(os.environ.get('FINDAFACTOR_BATCH_MULTIPLIER')) if os.environ.get('FINDAFACTOR_BATCH_MULTIPLIER') else 3.0,
                   smoothness_bound_multiplier=float(os.environ.get('FINDAFACTOR_SMOOTHNESS_BOUND_MULTIPLIER')) if os.environ.get('FINDAFACTOR_SMOOTHNESS_BOUND_MULTIPLIER') else 1.0):
     return int(_find_a_factor._find_a_factor(str(n),
                                              use_congruence_of_squares,
diff --git a/README.md b/README.md
@@ -32,7 +32,7 @@ factor = find_a_factor(
     gear_factorization_level=11,
     wheel_factorization_level=5,
     thread_count=0,
-    batch_multiplier=256,
+    batch_multiplier=3.0,
     smoothness_bound_multiplier=1.0
 )
 ```
@@ -45,7 +45,7 @@ The `find_a_factor()` function should return any nontrivial factor of `to_factor
 - `gear_factorization_level` (default value: `11`): This is the value up to which "wheel (and gear) factorization" and trial division are used to check factors and optimize "brute force," in general. The default value of `11` includes all prime factors of `11` and below and works well in general, though significantly higher might be preferred in certain cases.
 - `wheel_factorization_level` (default value: `5`): "Wheel" vs. "gear" factorization balances two types of factorization wheel ("wheel" vs. "gear" design) that often work best when the "wheel" is only a few prime number levels lower than gear factorization. Optimized implementation for wheels is only available up to `13`. The primes above "wheel" level, up to "gear" level, are the primes used specifically for "gear" factorization.
 - `thread_count` (default value: `0` for auto): Control the number of threads used for separate Gaussian elimination or parallel brute-force instances. For a value of `0`, the total number of hyper threads on the system will be detected and used. When `use_congruence_of_squares=True`, this acts as a multiplier on overall memory usage. If you exceed system memory, turn it down to some manual value. (Gaussian elimination is not easily parallelizable, except to run as many separate instances as will fit in memory.)
-- `batch_multiplier` (default value: `256`): controls how many items are processed in a batch before Gaussian elimination. `batch_multiplier` times the number of "smooth" primes is the batch size for "semi-smooth" numbers, to be collected before sieving and then Gaussian elimination. Besides thread count, this `batch_multiplier` can help tune overall memory usage and multiprocessor utilization.
+- `batch_multiplier` (default value: `3.0`): controls how many items are processed in a batch before Gaussian elimination. `batch_multiplier` times the number of "smooth" primes is the batch size for "semi-smooth" numbers, to be collected before sieving and then Gaussian elimination. Besides thread count, this `batch_multiplier` can help tune overall memory usage and multiprocessor utilization.
 - `smoothness_bound_multiplier` (default value: `1.0`): starting with the first prime number after wheel factorization, the congruence of squares approach (with Quadratic Sieve) takes a default "smoothness bound" with as many distinct prime numbers as bits in the number to factor (for default argument of `1.0` multiplier). To increase or decrease this number, consider it multiplied by the value of `smoothness_bound_multiplier`.
 
 All variable defaults can also be controlled by environment variables:
@@ -62,7 +62,7 @@ The developer anticipates this single-function set of parameters, as API, is the
 
 The advantage of `use_congruence_of_squares` lies beyond the hardware scale of the developer's experiments, in practice, but the option can be shown to work correctly (at a disadvantage, at small factoring bit-width scales). The anticipated use case is to turn this option on when approaching the size of modern-day RSA semiprimes in use.
 
-If this is your use case, you want to specifically consider `smoothness_bound_multiplier`, `batch_multiplier`, and `thread_count`. By default, as many primes are kept for "smooth" number sieving as bits in the number to factor. This is multiplied by `smooth_bound_multiplier` (and cast to a discrete number of primes in total). `batch_multiplier` is how many times this count of primes, after `smooth_bound_multiplier`, is multiplied for "smooth number part" batching. Turning this down uses less memory and gets to Gaussian elimination faster but decreases CPU utilization. However, the higher `batch_multiplier` is set for CPU utilization, the higher the memory used is.
+If this is your use case, you want to specifically consider `smoothness_bound_multiplier`, `batch_multiplier`, and `thread_count`. By default, as many primes are kept for "smooth" number sieving as bits in the number to factor. This is multiplied by `smoothness_bound_multiplier` (and cast to a discrete number of primes in total). `batch_multiplier` is how many times this count of primes, after `smoothness_bound_multiplier`, is multiplied for "smooth number part" batching. Turning this down uses less memory and gets to Gaussian elimination faster but might or might not decrease CPU utilization. However, the higher `batch_multiplier` is set to maximize CPU utilization, the more memory is used.
 
 Hence, you only want to set a manual `thread_count` below default to recover full CPU utilization within available system memory footprint. Ideally, you don't want to _have_ to change thread_count from `0`/default, indicating to automatically use all hyper threads, but keeping full utilization depends on both available system memory footprint and the scale of the number to factor, inherently.
 
diff --git a/find_a_factor.py b/find_a_factor.py
@@ -18,6 +18,7 @@ def main():
     gear_factorization_level = int(os.environ.get('FINDAFACTOR_GEAR_FACTORIZATION_LEVEL')) if os.environ.get('FINDAFACTOR_GEAR_FACTORIZATION_LEVEL') else 11
     wheel_factorization_level = int(os.environ.get('FINDAFACTOR_WHEEL_FACTORIZATION_LEVEL')) if os.environ.get('FINDAFACTOR_WHEEL_FACTORIZATION_LEVEL') else 5
     thread_count=int(os.environ.get('FINDAFACTOR_THREAD_COUNT')) if os.environ.get('FINDAFACTOR_THREAD_COUNT') else 0
+    batch_multiplier=float(os.environ.get('FINDAFACTOR_BATCH_MULTIPLIER')) if os.environ.get('FINDAFACTOR_BATCH_MULTIPLIER') else 3.0
     smoothness_bound_multiplier = float(os.environ.get('FINDAFACTOR_SMOOTHNESS_BOUND_MULTIPLIER')) if os.environ.get('FINDAFACTOR_SMOOTHNESS_BOUND_MULTIPLIER') else 1.0
 
     if argv_len > 2:
@@ -32,7 +33,9 @@ def main():
     if argv_len > 7:
         thread_count = int(sys.argv[7])
     if argv_len > 8:
-        smoothness_bound_multiplier = float(sys.argv[8])
+        batch_multiplier = float(sys.argv[8])
+    if argv_len > 9:
+        smoothness_bound_multiplier = float(sys.argv[9])
 
     start = time.perf_counter()
     result = find_a_factor(
@@ -43,6 +46,7 @@ def main():
         gear_factorization_level = gear_factorization_level,
         wheel_factorization_level = wheel_factorization_level,
         thread_count = thread_count,
+        batch_multiplier = batch_multiplier,
         smoothness_bound_multiplier = smoothness_bound_multiplier
     )
     print(time.perf_counter() - start)
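
Note for reviewers: the README describes the batch size as `batch_multiplier` times the number of "smooth" primes, which in turn defaults to one prime per bit of the number to factor, scaled by `smoothness_bound_multiplier`. The sketch below is only a rough illustration of that arithmetic and of the "environment variable or default" pattern used in this diff. The helper names `env_float` and `estimated_batch_size` are hypothetical and not part of FindAFactor, and the truncation with `int()` is an assumption; the native implementation may round or count primes differently.

```python
import os


def env_float(name, default):
    """Return environment variable `name` as a float, or `default` if unset or empty.

    Hypothetical helper mirroring the inline conditional expressions in the diff.
    """
    raw = os.environ.get(name)
    return float(raw) if raw else default


def estimated_batch_size(n, batch_multiplier=3.0, smoothness_bound_multiplier=1.0):
    """Rough estimate of the batch size implied by the README's description.

    Assumption: one "smooth" prime per bit of `n`, scaled by
    `smoothness_bound_multiplier` and truncated to an integer.
    """
    smooth_prime_count = int(smoothness_bound_multiplier * n.bit_length())
    # Assumption: the product is truncated to a whole number of items.
    return int(batch_multiplier * smooth_prime_count)


if __name__ == "__main__":
    multiplier = env_float("FINDAFACTOR_BATCH_MULTIPLIER", 3.0)
    # Compare the old default (256) with the configured or new default value.
    n = 2**63  # a 64-bit example value, not a meaningful factoring target
    print(estimated_batch_size(n, 256), estimated_batch_size(n, multiplier))
```

Under these assumptions, a 64-bit input goes from a batch of 16384 items with the old default of `256` to 192 items with the new default of `3.0`, which is the memory reduction this commit is after. Also note that the command-line script now reads `batch_multiplier` from position 8 and `smoothness_bound_multiplier` from position 9, so existing invocations that passed a smoothness multiplier as the eighth argument need updating.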