KMeans fit/predict and centroid accumulation are mostly serial; parallelize these training stages (nearest-centroid assignment and per-cluster sum accumulation) to cut build/train wall time on multi-core CPUs.
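Below is a minimal sketch of one way to parallelize these stages. It assumes a Lloyd-style KMeans over in-memory points. Each worker assigns one shard of points to their nearest centroids and accumulates per-cluster sums and counts in its own local buffers. A single reduce step then merges those partial results, so workers never contend on shared accumulators. All names here (fit, predict, kmeans_step, n_workers, executor_factory) are hypothetical and are not the project's API. The sketch defaults to threads only to stay self-contained. Under CPython's GIL, pure-Python threads will not speed up this loop, so for real gains pass concurrent.futures.ProcessPoolExecutor as executor_factory (with the usual `if __name__ == "__main__":` guard), or run the per-shard kernel in native code that releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

# Hypothetical sketch: data-parallel Lloyd iterations with per-shard partial sums.

def _nearest(point: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    # Index of the closest centroid by squared Euclidean distance (first wins on ties).
    best_idx, best_d = 0, float("inf")
    for j, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best_idx, best_d = j, d
    return best_idx

def _assign_chunk(chunk, centroids) -> List[int]:
    # Parallel predict kernel: labels for one shard.
    return [_nearest(p, centroids) for p in chunk]

def _assign_and_accumulate(chunk, centroids):
    # Parallel fit kernel: labels plus *local* per-cluster sums and counts for one shard.
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    labels = []
    for p in chunk:
        j = _nearest(p, centroids)
        labels.append(j)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return labels, sums, counts

def _chunks(points, n_shards):
    # Split points into at most n_shards contiguous shards (ceiling-sized).
    size = max(1, -(-len(points) // n_shards))
    return [points[i:i + size] for i in range(0, len(points), size)]

def kmeans_step(points, centroids, executor, n_shards):
    shards = _chunks(points, n_shards)
    # executor.map returns results in shard order, so the merge below is deterministic.
    results = executor.map(_assign_and_accumulate, shards, [centroids] * len(shards))
    k, dim = len(centroids), len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    labels: List[int] = []
    for shard_labels, sums, counts in results:  # serial reduce of partials
        labels.extend(shard_labels)
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    # Empty clusters keep their previous centroid.
    new_centroids = [
        [s / total_counts[j] for s in total_sums[j]] if total_counts[j] else list(centroids[j])
        for j in range(k)
    ]
    return new_centroids, labels

def fit(points, init_centroids, n_iter=10, n_workers=4, executor_factory=ThreadPoolExecutor):
    centroids = [list(c) for c in init_centroids]
    labels: List[int] = []
    # One pool reused across iterations to avoid per-iteration startup cost.
    with executor_factory(max_workers=n_workers) as ex:
        for _ in range(n_iter):
            centroids, labels = kmeans_step(points, centroids, ex, n_workers)
    return centroids, labels

def predict(points, centroids, n_workers=4, executor_factory=ThreadPoolExecutor):
    shards = _chunks(points, n_workers)
    with executor_factory(max_workers=n_workers) as ex:
        parts = ex.map(_assign_chunk, shards, [centroids] * len(shards))
        return [label for part in parts for label in part]
```

Because each shard sums its points in a different order than a single serial pass would, the resulting centroids can differ from a serial run in the last floating-point bits. For a fixed shard count the results are still reproducible, since the merge always visits shards in the same order.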