kissfft is slow on ARMv7 when built for fixed point. A NEON version would improve performance quite a bit. The fixed-point code has two problems:
- there aren't enough registers, so values spill to the stack
- shifts and rounding add a lot of overhead
The float version has neither of those issues. The main loop of bfly4 is 68 instructions for float vs 150 for 16-bit fixed point.
NEON could also process more than one value at a time. kiss_fft_cpx holds 2 values (real and imaginary), and the bfly loops process more than one complex value in places, for example:
C_MUL(scratch[0],Fout1 , *tw1 );
C_MUL(scratch[1],Fout2 , *tw2 );
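To make the idea concrete, here is a rough sketch of what a 4-wide Q15 complex multiply could look like. It is not kissfft code: cpx16, q15_round, c_mul_q15 and c_mul_q15_x4 are hypothetical names, and it assumes the four inputs and the four twiddles are contiguous in memory, which the strided twiddle access in the bfly loops does not give you without some rearranging. The scalar function mirrors the multiply, subtract/add, round, shift pattern of the 16-bit fixed-point C_MUL. The NEON path folds the rounding and the shift into a single narrowing instruction, and falls back to the scalar loop when NEON is not available.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical stand-in for kiss_fft_cpx in a 16-bit fixed-point build. */
typedef struct { int16_t r, i; } cpx16;

#define Q15_FRACBITS 15

/* Round a 32-bit product back to Q15: add half an LSB, then shift.
 * Relies on arithmetic right shift of negative values, as most compilers do. */
static int16_t q15_round(int32_t x)
{
    return (int16_t)((x + (1 << (Q15_FRACBITS - 1))) >> Q15_FRACBITS);
}

/* Scalar reference: one complex multiply, rounded once per output component. */
static cpx16 c_mul_q15(cpx16 a, cpx16 b)
{
    cpx16 m;
    m.r = q15_round((int32_t)a.r * b.r - (int32_t)a.i * b.i);
    m.i = q15_round((int32_t)a.r * b.i + (int32_t)a.i * b.r);
    return m;
}

/* Four complex multiplies at once. Assumes out, a and b each point to
 * four contiguous cpx16 values (an assumption of this sketch). */
static void c_mul_q15_x4(cpx16 *out, const cpx16 *a, const cpx16 *b)
{
#if defined(__ARM_NEON)
    /* vld2 de-interleaves: val[0] holds the 4 real parts, val[1] the 4 imaginary parts. */
    int16x4x2_t va = vld2_s16((const int16_t *)a);
    int16x4x2_t vb = vld2_s16((const int16_t *)b);

    /* Widening multiply-accumulate keeps full 32-bit precision. */
    int32x4_t re = vmull_s16(va.val[0], vb.val[0]);   /* ar*br         */
    re = vmlsl_s16(re, va.val[1], vb.val[1]);         /* ar*br - ai*bi */
    int32x4_t im = vmull_s16(va.val[0], vb.val[1]);   /* ar*bi         */
    im = vmlal_s16(im, va.val[1], vb.val[0]);         /* ar*bi + ai*br */

    /* Rounding shift right and narrow: round + shift in one instruction. */
    int16x4x2_t vo;
    vo.val[0] = vrshrn_n_s32(re, Q15_FRACBITS);
    vo.val[1] = vrshrn_n_s32(im, Q15_FRACBITS);
    vst2_s16((int16_t *)out, vo);                     /* re-interleave on store */
#else
    /* Portable fallback: same arithmetic, one value at a time. */
    for (int k = 0; k < 4; ++k)
        out[k] = c_mul_q15(a[k], b[k]);
#endif
}
```

With de-interleaved loads the real and imaginary parts sit in separate registers, so the whole multiply needs four multiply instructions and two narrowing shifts for four complex values. Whether that actually beats the float path in bfly4 would need measuring.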