GitHub - basilegithub/Multiple-Polynomial-Quadratic-Sieve-Python: A Python implementation of the Quadratic sieve algorithm for factoring large integers

basilegithub / Multiple-Polynomial-Quadratic-Sieve-Python Public

Notifications You must be signed in to change notification settings
Fork 0
Star 0

A Python implementation of the Quadratic sieve algorithm for factoring large integers

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
config		config
src		src
README.txt		README.txt
requirements.txt		requirements.txt

Repository files navigation

Thanks for looking at this project.

This is a project I started a few years ago, during the first half of 2019.
I re-worked it recently to merge some versions, make it more readable, add comments, and get all the sources I used.

##### Introduction ######

This project is a Python implementation of the Quadratic Sieve to factor integers.

Here are some detailed characteristics :
- Multiple polynomials used for sieving
- Double large prime variation used to collect relations
- Union-find algorithm to detect cycle in the partial relations graph
- Batch smoothness test and naive smoothness test available
- Parallel sieving allowed
- gaussian elimination, block Lanczos, Wiedemann algorithms available for the linear algebra step (no parallelization available for now)

##### Sources #####

Overall algorithm:
- "Prime numbers, a computational perspective" by Richard Crandall and Carl Pomerance (really good): https://link.springer.com/book/10.1007/0-387-28979-8
- "The Quadratic Sieve Factoring Algorithm" by Eric Landquist: https://www.cs.virginia.edu/crab/QFS_Simple.pdf

Multiple polynomials and double large primes
- "Factoring Integers with Large-Prime Variations of the Quadratic Sieve" by Henk Boender and Herman J. J. te Riele: https://projecteuclid.org/journals/experimental-mathematics/volume-5/issue-4/Factoring-integers-with-large-prime-variations-of-the-quadratic-sieve/em/1047565445.pdf

Batch smoothness test:
- "HOW TO FIND SMOOTH PARTS OF INTEGERS" by DANIEL J. BERNSTEIN: https://cr.yp.to/factorization/smoothparts-20040510.pdf

Gaussian elimination:
- Took the gaussian elimination code from https://github.com/skollmann/PyFactorise/blob/master/factorise.py#L39

Block Lanczos:
- "A Block Lanczos Algorithm for Finding Dependencies over GF(2)" by Peter L. Montgomery: https://scispace.com/pdf/a-block-lanczos-algorithm-for-finding-dependencies-over-gf-2-ezdu2qt0pp.pdf
- "A modified block Lanczos algorithm with fewer vectors" by Emmanuel Thomé (really good): https://eprint.iacr.org/2016/329.pdf

Wiedemann algorithm:
- "SOLVING HOMOGENEOUS LINEAR EQUATIONSOVER GF(2) VIA BLOCK WIEDEMANN ALGORITHM" by Don Coppersmith: https://www.ams.org/journals/mcom/1994-62-205/S0025-5718-1994-1192970-7/S0025-5718-1994-1192970-7.pdf

##### running the algortihm #####

When running the main.py file, the required argument is --n the number you want to factor.

A suitable command looks like "python main.py --n [your_number]"

##### config parameters #####

I have defined a few parameters directly you can edit in the config file.

- FLAG_BATCH_SMOOTH_TEST: If True, the batch smoothness test from Bernstein will be used. Otherwise, the naive (trial division from the prime factor base) approach will be used. Batch smoothness test is generally faster

- FLAG_GAUSSIAN_PIVOT: If True, gaussian elimination will ALWAYS be used in the linear algebra step. If you want to use block Lanczos of Wiedemann algorithms, this has to be set to False.

- FLAG_LANCZOS: If True, the block Lanczos algorithm will be used. If False, the Wiedemann algorithm will be used. No matter its value, if gaussian_pivot is True then gaussian elimination will be performed.

- LARGE_PRIME_CONST: Define the constant that will be multiplied by the last prime in the factor base to obtain the bound for the single large primes. For the double large primes, the bound is this constant multiplied by the last prime in the factor base squared.

- BLOCK_SIZE: If Block Lanczos or Wiedemann algorithm are used, this sets the block size. In the Wiedemann algorithm, this is only used to compute many matrix-vector products at once, and not to compute the matrix generator of the matrix sequence you obtain.

- NB_CPU: Sets the number of cpu used for sieving. One cpu is always kept as a "leader" that collects relations from the sievers, and the rest of the cpus sieve, test for smoothness, and send their candidate relations to the leader.

##### General discussion #####

I tried overall to use as few exernal libraries as possible. While this makes the code
way harder to write for me, and understand for you, I viewed it as a good challenge
to write good code, and get to know hard and foreign concepts to me. It took a lot
of work, but I got a bug free code, and got much more knowledgable on the subroutines
thanks to this.

I have a Python implementation of the GNFS algorithm on my Github.

Here are some points I consider working on at some point:

- No matter how hard I tried, I am stuck on understanding the block Wiedemann algorithm.
For now, the best I can do is the scalar one, with some optimizations. Namely, I
use binary encoding of the blocks of vectors to compute matrix-vector product very
efficiently. Then, for each scalar sequence, I compute its generator using scalar
Berlekamp-Massey algorithm, and use it to update the minimal polynomial of the matrix.
I consider working some way to compute the matrix generator of the matrix sequence
through some algorithm (matrix Berlekamp-Massey or some other).

- I have to make the linear algebra step more parallel. Even though it is the fastest of the
two main steps of the algorithm, no one likes to wait if it is possible to go faster.

- I had seen that there was some work on how to chose k such that k*n is much easier to factor.
If I remember correctly, you test multiple small values of k, and see for which
value of k, k*n has the most quadratic residues. You weight the value of the quadratic
residues by some function of p. I have yet to choose if I do it or not, as it implies
many modifications for a speedup that I don't have a good estimate for. This is akin
to the polynomial selection step of the GNFS.

Here are the next steps:

- Some optimizations may be possible here and there, but I have already a big amount of work
to get fast code. The next step on this topic is switching to C. Python is intrisically
limited on the topic of speed, as it is compiled on the fly. Just like this project, I
have already done part of the C project a while ago. I have to rework it to get
something on the same level of quality as this Python version.

- Likewise, implement the GNFS algorithm in C.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

About

Uh oh!

Releases

Packages

Languages

basilegithub/Multiple-Polynomial-Quadratic-Sieve-Python

Folders and files

Latest commit

History

Repository files navigation

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages