I believe this can be a drop-in replacement for the FFT that is currently being used:
https://arxiv.org/pdf/1802.03932.pdf
More background:
http://www.texmacs.org/joris/ffft/ffft.pdf
Hypothetical speedup: ~5x for the 16-bit field, ~3x for the 8-bit field.
catid/leopard#5