Alternative field_element_mul design

Let me propose an design for `field_element_mul` that I think may yield better costs.

### Step 1: `full_multiply_128`

`full_multiply` is designed to be used recursively to produce multiplication such as

    full_multiply_128(a: (u64, u64), b: (u64, u64), c: (u64, u64), d: (u64, u64)) -> ((u64, u64), (u64, u64))

that returns a 256-bit value (note the nesting of the product types).
For reference the specification of `full_multiply(a,b,c,d)` is that it returns `a` × `b` + `c` + `d`.

The implementation of full_multiply_128 is roughly

 ```
let (hi_a, lo_a) =  a;
let (hi_b, lo_b) =  b;
let (hi_c, lo_c) =  c;
let (hi_d, lo_d) =  d;
let (hi_1, lo_1) =  full_multiply_64(lo_a, lo_b, lo_c, lo_d);
let (hi_2, lo_2) =  full_multiply_64(lo_a, hi_b, hi_c, hi_d);
let (hi_3, lo_3) =  full_multiply_64(hi_a, lo_b, hi_1, lo_2);
let hi_res =  full_multiply_64(hi_a, hi_b, hi_2, hi_3);
return (hi_res, (lo_3, lo_1)).
```

### Step 2: `full_multiply_256`

Using the same technique in Step 1, implement  `full_multiply_256`, returning a 512-bit value, by using four calls to `full_multiply_128`.

Alternatively you can just implement `multiply_256` since you won't actually be needing the full multiply.

### Step 3: `div_mod_256_128`

Like `full_multiply`, the `div_mod_128_64` jet is intended to be recursively use to create quotient-remainder computations on wider and wider  integer sizes.  There are some pre-conditions on this jet, e.g. `div_mod_128_64((ahi, alo), b)` requires that the highest bit of `b` is set and that `b < ahi`, but these won't really cause too many problems.

`div_mod_256_128` divides a 256-bit number by a 128-bit number returning a 128-bit quotient and a 128-bit remainder, subject to the same preconditions that `div_mod_128_64` has. `div_mod_256_128` is actually defined in terms of `div_mod_192_128` which returns a 64-bit quotient and a 128-bit remainder.  

The algorithm for `div_mod_256_128` in terms of `div_mod_192_128` is to do two steps of long division.  If we want to divide `AAAA` by `BB`, where `A` and `B` represent 2⁶⁴ bit "digits", We perform one call to `div_mod_192_128` to get the high "digit" of the quotient and an intermediate remainder.

     * 1.       Q
     *      ______
     *    BB) AAAA
     *        XXX
     *        ---
     *         RR

Then we "bring down" the last "digit" of A next to our remainder and call `div_mod_192_128` to get the low "digit" of the quotient and the final remainder.

     * 2.       QQ
     *      ______
     *    BB) AAA|
     *        XXX|
     *        ---|
     *         RRA
     *         XXX
     *         ---
     *          RR
     *

Now we have to implement `div_mod_192_128` in terms of the existing `div_mod_128_64` jet.  Using the language from above we want to divide `AAA` by `BB`. Again we have preconditions that require the high bit of `BB` is set and `AAA` is less than `BB0`.

The idea here is to "guess" what the 64-bit quotient is by dividing the high two "digits" in A by the high "digit" in B using `div_mod_128_64`.  For math reasons, this guess will be correct within 2 of the true quotient.  We use our guessed quotient to compute a remainder, and if the remainder would be negative, we adjust our guessed quotient and repeat until our remainder comes out positive.

This algorithm is based on the paper "Fast Recursive Division" by Christoph Burnikel and Joachim Ziegler, MPI-I-98-1-022, Oct. 1998.  You can also find some details in the Simplicity code comments at

https://github.com/BlockstreamResearch/simplicity/blob/d27b7e6f6b4e41f5f1725b102f8f80edc04a13ba/C/jets.c#L1046-L1203

I also have a implementation of these wide `div_mod` functions in raw Simplicity at

https://github.com/BlockstreamResearch/simplicity/blob/master/Haskell/Core/Simplicity/Programs/Arith.hs#L240-L282

however, that code may not be decipherable.

### Step 4: `div_mod_512_256`

This step is exactly like in Step 3, except we again double the width of our arguments.

Again you may be able to simplify things a bit by only implementing `mod_512_256`, since you don't actually need the quotient.

### Step 5: `field_element_mul`

Now we have everything in place to implement `field_element_mul`.  We use `full_multiply_256` (or `multiply_256`) to multiply two 256-bit field elements to produce a 512-bit result.  Then we can use `div_mod_512_256` to take our 512-bit result and reduce it modulo the required 256-modulus.

Our `div_mod_512_256` above has a preconditions that requires that the high bits of `a` to be less than `b`; however, the product of already (partially) normalized field elements ought to satisfy this requirement.

There is another more sticky issue is that the BN-254 modulus `M` doesn't actually have the high bit set as required by `div_mod_512_256`.  There are probably a couple of solutions to this problem.  My suggestion would be to actually work with only partially reduced values as much as you can get away with.  That is work modulo `3M` instead.  You can write a specialized normalization function to reduce to `M` by subtraction whenever it is needed, but hopefully that won't be required very much.
  
### Epilogue

I will say that I would like SimplicityHL language to eventually provide "synthetic" wide integer types along with these sorts of arithmetic operations, but it will take some time before we get there.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Alternative field_element_mul design #4

Step 1: `full_multiply_128`

Step 2: `full_multiply_256`

Step 3: `div_mod_256_128`

Step 4: `div_mod_512_256`

Step 5: `field_element_mul`

Epilogue

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Alternative field_element_mul design #4

Description

Step 1: full_multiply_128

Step 2: full_multiply_256

Step 3: div_mod_256_128

Step 4: div_mod_512_256

Step 5: field_element_mul

Epilogue

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Step 1: `full_multiply_128`

Step 2: `full_multiply_256`

Step 3: `div_mod_256_128`

Step 4: `div_mod_512_256`

Step 5: `field_element_mul`