After a discussion with @wcwitt, I'd like to bring the symmetry reductions from the PACE paper to this code. Initial experiments suggest they give roughly a 1.5x speedup. Combined with the new kernels, these gains add up bit by bit, so it becomes worthwhile.
- Question 1: Is it worth it, or should we go straight to real Ylms?
- Question 2: How do we do this carefully enough that existing models remain 100% compatible?