Skip to content

Alternative field_element_mul design #4

@roconnor-blockstream

Description

@roconnor-blockstream

Let me propose an design for field_element_mul that I think may yield better costs.

Step 1: full_multiply_128

full_multiply is designed to be used recursively to produce multiplication such as

full_multiply_128(a: (u64, u64), b: (u64, u64), c: (u64, u64), d: (u64, u64)) -> ((u64, u64), (u64, u64))

that returns a 256-bit value (note the nesting of the product types).
For reference the specification of full_multiply(a,b,c,d) is that it returns a × b + c + d.

The implementation of full_multiply_128 is roughly

let (hi_a, lo_a) =  a;
let (hi_b, lo_b) =  b;
let (hi_c, lo_c) =  c;
let (hi_d, lo_d) =  d;
let (hi_1, lo_1) =  full_multiply_64(lo_a, lo_b, lo_c, lo_d);
let (hi_2, lo_2) =  full_multiply_64(lo_a, hi_b, hi_c, hi_d);
let (hi_3, lo_3) =  full_multiply_64(hi_a, lo_b, hi_1, lo_2);
let hi_res =  full_multiply_64(hi_a, hi_b, hi_2, hi_3);
return (hi_res, (lo_3, lo_1)).

Step 2: full_multiply_256

Using the same technique in Step 1, implement full_multiply_256, returning a 512-bit value, by using four calls to full_multiply_128.

Alternatively you can just implement multiply_256 since you won't actually be needing the full multiply.

Step 3: div_mod_256_128

Like full_multiply, the div_mod_128_64 jet is intended to be recursively use to create quotient-remainder computations on wider and wider integer sizes. There are some pre-conditions on this jet, e.g. div_mod_128_64((ahi, alo), b) requires that the highest bit of b is set and that b < ahi, but these won't really cause too many problems.

div_mod_256_128 divides a 256-bit number by a 128-bit number returning a 128-bit quotient and a 128-bit remainder, subject to the same preconditions that div_mod_128_64 has. div_mod_256_128 is actually defined in terms of div_mod_192_128 which returns a 64-bit quotient and a 128-bit remainder.

The algorithm for div_mod_256_128 in terms of div_mod_192_128 is to do two steps of long division. If we want to divide AAAA by BB, where A and B represent 2⁶⁴ bit "digits", We perform one call to div_mod_192_128 to get the high "digit" of the quotient and an intermediate remainder.

 * 1.       Q
 *      ______
 *    BB) AAAA
 *        XXX
 *        ---
 *         RR

Then we "bring down" the last "digit" of A next to our remainder and call div_mod_192_128 to get the low "digit" of the quotient and the final remainder.

 * 2.       QQ
 *      ______
 *    BB) AAA|
 *        XXX|
 *        ---|
 *         RRA
 *         XXX
 *         ---
 *          RR
 *

Now we have to implement div_mod_192_128 in terms of the existing div_mod_128_64 jet. Using the language from above we want to divide AAA by BB. Again we have preconditions that require the high bit of BB is set and AAA is less than BB0.

The idea here is to "guess" what the 64-bit quotient is by dividing the high two "digits" in A by the high "digit" in B using div_mod_128_64. For math reasons, this guess will be correct within 2 of the true quotient. We use our guessed quotient to compute a remainder, and if the remainder would be negative, we adjust our guessed quotient and repeat until our remainder comes out positive.

This algorithm is based on the paper "Fast Recursive Division" by Christoph Burnikel and Joachim Ziegler, MPI-I-98-1-022, Oct. 1998. You can also find some details in the Simplicity code comments at

https://github.com/BlockstreamResearch/simplicity/blob/d27b7e6f6b4e41f5f1725b102f8f80edc04a13ba/C/jets.c#L1046-L1203

I also have a implementation of these wide div_mod functions in raw Simplicity at

https://github.com/BlockstreamResearch/simplicity/blob/master/Haskell/Core/Simplicity/Programs/Arith.hs#L240-L282

however, that code may not be decipherable.

Step 4: div_mod_512_256

This step is exactly like in Step 3, except we again double the width of our arguments.

Again you may be able to simplify things a bit by only implementing mod_512_256, since you don't actually need the quotient.

Step 5: field_element_mul

Now we have everything in place to implement field_element_mul. We use full_multiply_256 (or multiply_256) to multiply two 256-bit field elements to produce a 512-bit result. Then we can use div_mod_512_256 to take our 512-bit result and reduce it modulo the required 256-modulus.

Our div_mod_512_256 above has a preconditions that requires that the high bits of a to be less than b; however, the product of already (partially) normalized field elements ought to satisfy this requirement.

There is another more sticky issue is that the BN-254 modulus M doesn't actually have the high bit set as required by div_mod_512_256. There are probably a couple of solutions to this problem. My suggestion would be to actually work with only partially reduced values as much as you can get away with. That is work modulo 3M instead. You can write a specialized normalization function to reduce to M by subtraction whenever it is needed, but hopefully that won't be required very much.

Epilogue

I will say that I would like SimplicityHL language to eventually provide "synthetic" wide integer types along with these sorts of arithmetic operations, but it will take some time before we get there.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions