Skip to content

Further NTT optimizations #1011

@divergentdave

Description

@divergentdave

Runtime of Prio3 is largely dominated by NTT, INTT, and ancillary field multiplications. We could benefit from adapting techniques from the literature for implementing the NTT over smaller fields for Ring-LWE cryptosystems (and in turn from digital signal processing literature on efficient FFT implementation).

I skimmed over Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography, and there were a few ideas from this paper and previous works it cites that could be applicable. For example, one could make the forward transform return its results in bit-reversed order, and have the inverse transform take its input in bit-reversed order. We could probably adopt this for polynomial multiplication, in order to skip two bit reversed reordering steps. Other uses of the NTT may need to be adapted to this change. We can't use the loose bound/tight bound representations of field elements, (in order to skip modular reduction after additions) because our field modulus is already too close to machine word sizes.

There was also a talk at RWC 2024 on RISC V implementations of cryptographic algorithms, (slides, recording) this included a section on applying autovectorization to NTT implementations. I'm not sure if our field multiplications would be too long for the compiler to see through, but the LLVM-related diagnostic tooling might be generally applicable to our case.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions