dannys4 left a comment
To be honest, it's kind of weird that `plusi` is removed and `minusi` is kept, since you multiply it by the negative when calculating `xoe_m_xoo` and `ỹ_koe_m_ỹ_koo`. Changing this doesn't substantially change performance as measured on my personal machine, though, so I don't think it matters too much.
I also looked into the removal of the `@muladd`. Since most of the muladds are integer ops, the compiler should be able to optimize these with no problem. I had hoped that the floating-point muladds that were in lines 220-228 would give better performance, since without fastmath the compiler shouldn't be able to fuse those multiply-adds into FMAs on its own, but according to the experiments this is not the case.
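To make the floating-point point concrete, here is a minimal sketch (my own illustrative values, not code from this PR) of why the compiler may not fuse `a*b + c` unless it is told it can: the fused result is rounded once, the unfused one twice, so the two can differ. It only uses `fma` and `muladd` from Base.

```julia
# Illustrative values chosen so that the two roundings differ; not from the PR.
a = 1.0 + eps(Float64)          # 1 + 2^-52
c = -(1.0 + 2 * eps(Float64))   # -(1 + 2^-51)

# Unfused: a*a is rounded to 1 + 2^-51 first, so the tiny 2^-104 term is lost.
unfused = a * a + c

# Fused: a*a + c is computed exactly and rounded once, so 2^-104 survives.
fused = fma(a, a, c)

# muladd lets the compiler pick whichever form is faster,
# so its result may match either of the two above.
either = muladd(a, a, c)

println("unfused = ", unfused)
println("fused   = ", fused)
println("muladd  = ", either)
```

Because the results are observably different, a plain `a*b + c` has to stay unfused by default; `muladd` (or the `@muladd` macro) is the explicit opt-in.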
None of these are actionable items; they're just premature optimizations on my part, and I wanted to put this in writing for "posterity".
Indeed, but I initially got the sign wrong and then added the Regarding the |
It's a little silly that we have to do so much work that should really be handled by the compiler, but these changes give me a 25% speedup for a size 256 problem.
Current main
This PR