generated from openproblems-bio/task_template
Open
Labels
bug: Something isn't working
Description
harmonypy's KMeans initialization raised an error when processing the CLL dataset:
/opt/homebrew/Caskroom/miniconda/base/envs/single_cell/lib/python3.12/site-packages/harmonypy/harmony.py:145: RuntimeWarning: invalid value encountered in divide
self.Z_cos = self.Z_orig / self.Z_orig.max(axis=0)
2025-10-13 11:45:35,217 - harmonypy - INFO - Computing initial centroids with sklearn.KMeans...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/homebrew/Caskroom/miniconda/base/envs/single_cell/lib/python3.12/site-packages/harmonypy/harmony.py", line 127, in run_harmony
ho = Harmony(
^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/single_cell/lib/python3.12/site-packages/harmonypy/harmony.py", line 178, in __init__
self.init_cluster(cluster_fn)
File "/opt/homebrew/Caskroom/miniconda/base/envs/single_cell/lib/python3.12/site-packages/harmonypy/harmony.py", line 204, in init_cluster
self.Y = cluster_fn(self.Z_cos.T, self.K).T
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/single_cell/lib/python3.12/site-packages/harmonypy/harmony.py", line 198, in _cluster_kmeans
model.fit(data)
File "/opt/homebrew/Caskroom/miniconda/base/envs/single_cell/lib/python3.12/site-packages/sklearn/base.py", line 1389, in wrapper
return fit_method(estimator, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/single_cell/lib/python3.12/site-packages/sklearn/cluster/_kmeans.py", line 1454, in fit
X = validate_data(
^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/single_cell/lib/python3.12/site-packages/sklearn/utils/validation.py", line 2944, in validate_data
out = check_array(X, input_name="X", **check_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/single_cell/lib/python3.12/site-packages/sklearn/utils/validation.py", line 1107, in check_array
_assert_all_finite(
File "/opt/homebrew/Caskroom/miniconda/base/envs/single_cell/lib/python3.12/site-packages/sklearn/utils/validation.py", line 120, in _assert_all_finite
_assert_all_finite_element_wise(
File "/opt/homebrew/Caskroom/miniconda/base/envs/single_cell/lib/python3.12/site-packages/sklearn/utils/validation.py", line 169, in _assert_all_finite_element_wise
raise ValueError(msg_err)
ValueError: Input X contains NaN.
KMeans does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values
I think KMeans has numerical stability issues with exact zeros, because the error goes away if we increase all marker values by 1e-20.
I'm not sure how this will affect the results. The impact should be minimal, since the same small value is added uniformly across all cells and markers.
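One possible reading of the traceback: the RuntimeWarning on `self.Z_orig / self.Z_orig.max(axis=0)` appears before KMeans runs, so the NaNs may already be present in the input that KMeans receives. A column whose maximum is exactly zero gives 0/0, which NumPy evaluates to NaN. Below is a minimal sketch of that behavior using NumPy only. The matrix `Z` is made-up data, not the CLL dataset, and the variable names are hypothetical; this does not reproduce harmonypy's internals beyond the one expression shown in the warning.

```python
import numpy as np

# Made-up matrix standing in for Z_orig: rows are features, columns are cells.
# The last column is all zeros, so its column-wise maximum is 0.
Z = np.array([[1.0, 2.0, 0.0],
              [3.0, 4.0, 0.0]])

# Same form as the expression in the RuntimeWarning; 0/0 yields NaN.
with np.errstate(invalid="ignore"):
    Z_scaled = Z / Z.max(axis=0)

# Columns that contain NaN after scaling.
nan_columns = np.isnan(Z_scaled).any(axis=0)
print(nan_columns)

# The workaround from this issue: add 1e-20 to every entry.
# In float64 the shift is absorbed by the non-zero entries here,
# so only the exact zeros actually change.
Z_shifted = Z + 1e-20
Z_shifted_scaled = Z_shifted / Z_shifted.max(axis=0)
print(np.isnan(Z_shifted_scaled).any())
```

If this is what happens on the CLL data, a check such as `(Z.max(axis=0) == 0).any()` on the matrix passed to `run_harmony` would show whether any column has a zero maximum before deciding how to handle it.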