Neon #80

@fbarchard

kissfft is slow on ARMv7 in fixed-point mode. A NEON version would improve performance quite a bit. The fixed-point code is slow for two reasons:

  1. There aren't enough registers, so values spill to the stack.
  2. Shifts and rounding add a lot of overhead.

The float version has neither of those issues. The main loop of bfly4 is 68 instructions for float vs 150 for 16-bit fixed point.
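
To illustrate where the shift-and-round overhead comes from, here is a minimal sketch of a Q15 complex multiply next to its float counterpart. The type and function names (`cpx_q15`, `cpx_f32`, `cmul_q15`, `cmul_f32`, `round_q15`) are made up for this comment and are not kissfft identifiers, and the rounding scheme (add half an LSB, then shift right by 15) is an assumption about a typical 16-bit fixed-point build.

```c
#include <stdint.h>

/* Hypothetical types for illustration; not kissfft's own definitions. */
typedef struct { int16_t r, i; } cpx_q15;
typedef struct { float r, i; } cpx_f32;

/* Round a 32-bit Q30 value back to Q15: add half an LSB, then shift.
   Assumes >> on a negative int32_t is an arithmetic shift, which holds
   on the usual ARM and x86 compilers but is implementation-defined. */
static int16_t round_q15(int32_t x)
{
    return (int16_t)((x + (1 << 14)) >> 15);
}

/* Fixed point: 4 widening multiplies, 2 add/sub, 2 rounding adds,
   2 shifts, 2 narrowing conversions. */
static cpx_q15 cmul_q15(cpx_q15 a, cpx_q15 b)
{
    cpx_q15 m;
    m.r = round_q15((int32_t)a.r * b.r - (int32_t)a.i * b.i);
    m.i = round_q15((int32_t)a.r * b.i + (int32_t)a.i * b.r);
    return m;
}

/* Float: 4 multiplies and 2 add/sub, with no rescaling step. */
static cpx_f32 cmul_f32(cpx_f32 a, cpx_f32 b)
{
    cpx_f32 m;
    m.r = a.r * b.r - a.i * b.i;
    m.i = a.r * b.i + a.i * b.r;
    return m;
}
```

In Q15, 16384 represents 0.5, so multiplying (0.5 + 0i) by (0.5 + 0.5i) with `cmul_q15` yields 8192 for both parts, i.e. 0.25 + 0.25i.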

NEON could also process more than one value at a time. kiss_fft_cpx holds 2 values (real and imaginary), and the bfly loops process more than one complex value in places, for example:

    C_MUL(scratch[0],Fout1 , *tw1 );
    C_MUL(scratch[1],Fout2 , *tw2 );
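
A rough sketch of what batching such multiplies could look like with NEON intrinsics follows. It is not a proposed patch: `cpx16` and `cmul4_q15` are hypothetical names, the function assumes four contiguous complex inputs per operand (the real bfly4 loop reads Fout and the twiddles with strides, so the values would first have to be gathered), and it assumes Q15 data with round-to-nearest. A scalar fallback is included so the same code builds on non-NEON targets.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical 16-bit complex type, laid out as interleaved re/im. */
typedef struct { int16_t r, i; } cpx16;

/* out[k] = a[k] * b[k] for k = 0..3, in Q15 with rounding.
   Assumes the inputs are not all -32768, which would overflow the
   32-bit sum of products. */
static void cmul4_q15(cpx16 *out, const cpx16 *a, const cpx16 *b)
{
#if defined(__ARM_NEON)
    /* De-interleaving loads: val[0] = 4 real parts, val[1] = 4 imag parts. */
    int16x4x2_t va = vld2_s16((const int16_t *)a);
    int16x4x2_t vb = vld2_s16((const int16_t *)b);

    /* Widening multiply-accumulate into 32-bit lanes. */
    int32x4_t re = vmull_s16(va.val[0], vb.val[0]);
    re = vmlsl_s16(re, va.val[1], vb.val[1]);   /* ar*br - ai*bi */
    int32x4_t im = vmull_s16(va.val[0], vb.val[1]);
    im = vmlal_s16(im, va.val[1], vb.val[0]);   /* ar*bi + ai*br */

    /* Rounding shift right by 15 and narrow to 16 bits in one instruction. */
    int16x4x2_t vo;
    vo.val[0] = vrshrn_n_s32(re, 15);
    vo.val[1] = vrshrn_n_s32(im, 15);
    vst2_s16((int16_t *)out, vo);               /* re-interleave on store */
#else
    /* Scalar fallback with the same rounding (assumes arithmetic >>). */
    for (int k = 0; k < 4; ++k) {
        int32_t re = (int32_t)a[k].r * b[k].r - (int32_t)a[k].i * b[k].i;
        int32_t im = (int32_t)a[k].r * b[k].i + (int32_t)a[k].i * b[k].r;
        out[k].r = (int16_t)((re + (1 << 14)) >> 15);
        out[k].i = (int16_t)((im + (1 << 14)) >> 15);
    }
#endif
}
```

The point of the sketch is that the rounding add, shift, and narrowing collapse into a single `vrshrn_n_s32` per component, and four complex products are computed per call instead of one.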
