As discussed with @Rcasan, parallelisation should happen at the group level, i.e. one task per group. This will also reduce the memory usage of the function.
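
For illustration only, here is a rough Python sketch of what group-level parallelisation could look like. It is not the actual function: `summarise_group`, `process_by_group`, the `(key, value)` record shape, and the choice of `ThreadPoolExecutor` are all assumptions made for the example. The point is that each task only touches the rows of its own group, so per-task intermediates stay small.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def summarise_group(rows):
    """Hypothetical per-group work: sum the values of a single group."""
    return sum(value for _, value in rows)


def process_by_group(records, max_workers=4):
    """Split records by grouping key and process each group as its own task."""
    # Bucket the (key, value) records by their grouping key.
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append((key, value))

    # Submit one task per group; each task receives only its own group's rows.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            key: pool.submit(summarise_group, rows)
            for key, rows in groups.items()
        }
        # Collect one result per group once its task has finished.
        return {key: future.result() for key, future in futures.items()}


# Made-up example data, purely for demonstration.
records = [("a", 1), ("b", 10), ("a", 2), ("b", 20), ("c", 5)]
result = process_by_group(records)
print(result)
```

For CPU-bound work, `ProcessPoolExecutor` from the same module might be the better fit; threads are used here only to keep the sketch short.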