Parallel execution is now very easy with Numba: decorate the function with `njit(parallel=True)` and use `prange` for the loop. Do this where the loop iterations are independent of each other and performance is actually an issue.