@sergeyklay commented Aug 18, 2025

The docstring for the `dask_ml.cluster.KMeans` class currently omits `'random'` as a valid option for the `init` parameter. However, this option is fully implemented in the underlying `k_init` function and serves as a scalable alternative to the default `'k-means||'`, which can overwhelm the scheduler on large datasets.

These changes make the documentation accurately reflect the implementation and help users make better-informed decisions when choosing an initialization strategy for large-scale clustering tasks.
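
For readers unfamiliar with the trade-off, here is a conceptual sketch of what random initialization amounts to: pick `n_clusters` rows of the data as starting centroids in one step, with no iterative oversampling rounds. This is plain Python for illustration only, not dask-ml's actual code; the `random_init` helper and the sample `points` are hypothetical, and the library's own sampling details may differ.

```python
import random


def random_init(points, n_clusters, seed=None):
    """Pick n_clusters distinct rows of `points` as initial centroids.

    Hypothetical helper for illustration; not part of dask-ml.
    """
    if n_clusters > len(points):
        raise ValueError("n_clusters cannot exceed the number of points")
    rng = random.Random(seed)
    # One draw of row indices: no distance computations and no repeated
    # passes over the data, which is why this strategy is cheap to schedule.
    indices = sorted(rng.sample(range(len(points)), n_clusters))
    return [points[i] for i in indices]


# Small made-up dataset, used only to exercise the helper.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.9, 0.1), (10.1, 0.0)]
centers = random_init(points, n_clusters=3, seed=0)
```

In dask-ml itself, the option described in this PR is selected by passing `init='random'` to `KMeans`. The price of the cheaper start is that the initial centroids can be poorly spread, so the main iterations may need more steps to converge.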

Also covers #918

@sergeyklay (Author)

@TomAugspurger Could you please take a look?
