Commit 44ac430

TimotheeMathieu and rth authored
FIX initialization build on kmedoids (#91)
Co-authored-by: Roman Yurchak <[email protected]>
1 parent 355ed13 commit 44ac430

3 files changed: 25 additions and 6 deletions

.circleci/config.yml

Lines changed: 5 additions & 2 deletions
@@ -3,7 +3,10 @@ version: 2
 jobs:
   documentation:
     docker:
-      - image: circleci/python:3.7.6
+      - image: circleci/python:3.8
+    environment:
+      - OMP_NUM_THREADS: 2
+      - MKL_NUM_THREADS: 2
     steps:
       - checkout
       - run:
@@ -12,7 +15,7 @@ jobs:
             chmod +x miniconda.sh && ./miniconda.sh -b -p ~/miniconda
             export PATH="~/miniconda/bin:$PATH"
             conda update --yes --quiet conda
-            conda create -n testenv --yes --quiet python=3.7
+            conda create -n testenv --yes --quiet python=3.8
             source activate testenv
             pip install ".[docs]"
             cd doc
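
The CI change above caps the OpenMP and MKL thread pools at 2, presumably to keep the documentation build within the container's resources. As a rough sketch of the same idea outside CI, the hypothetical helper below (`limit_native_threads` is not part of this repository) sets the two variables from Python. It assumes the call happens before NumPy or SciPy is imported, since these variables are generally read when the native libraries initialize.

```python
import os

# Hypothetical helper, not part of scikit-learn-extra: mirrors the two
# environment variables added to .circleci/config.yml in this commit.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS")


def limit_native_threads(n_threads=2):
    """Cap OpenMP/MKL thread pools; call before importing NumPy or SciPy."""
    for var in THREAD_VARS:
        os.environ[var] = str(n_threads)
    # Return what was set so callers can log or verify it.
    return {var: os.environ[var] for var in THREAD_VARS}


settings = limit_native_threads(2)
```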

sklearn_extra/cluster/_k_medoids_helper.pyx

Lines changed: 2 additions & 2 deletions
@@ -74,7 +74,7 @@ def _build( floating[:, :] D, int n_clusters):

     cdef int[:] medoid_idxs = np.zeros(n_clusters, dtype = np.intc)
     cdef int sample_size = len(D)
-    cdef int[:] not_medoid_idxs = np.zeros(sample_size, dtype = np.intc)
+    cdef int[:] not_medoid_idxs = np.arange(sample_size, dtype = np.intc)
     cdef int i, j, id_i, id_j

     medoid_idxs[0] = np.argmin(np.sum(D,axis=0))
@@ -84,7 +84,7 @@ def _build( floating[:, :] D, int n_clusters):

     cdef floating[:] Dj = D[medoid_idxs[0]].copy()
     cdef floating cost_change
-    cdef (int, int) new_medoid = (medoid_idxs[0], 0)
+    cdef (int, int) new_medoid = (0,0)
     cdef floating cost_change_max

     for _ in range(n_clusters -1):
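
The fix above replaces a zero-filled candidate array with `np.arange(sample_size)`, so the pool of non-medoid candidates lists every sample index instead of repeating index 0. The pure-Python sketch below illustrates a greedy BUILD-style initialization under that reading. It is not the Cython implementation from this file: `build_init` and its variable names are hypothetical, and it assumes a symmetric dissimilarity matrix given as a list of lists.

```python
def build_init(D, n_clusters):
    """Greedy BUILD-style medoid selection on a square dissimilarity matrix."""
    n = len(D)
    # The candidate pool must enumerate every sample index (0..n-1);
    # a zero-filled pool would only ever offer sample 0 as a candidate.
    not_medoids = list(range(n))

    # First medoid: smallest total dissimilarity to all samples
    # (row sums equal column sums because D is assumed symmetric).
    first = min(range(n), key=lambda i: sum(D[i]))
    medoids = [first]
    not_medoids.remove(first)

    # nearest[j]: dissimilarity from sample j to its closest medoid so far.
    nearest = list(D[first])

    for _ in range(n_clusters - 1):
        best_gain, best = -1.0, None
        for i in not_medoids:
            # Total cost reduction if sample i became a medoid.
            gain = sum(max(nearest[j] - D[i][j], 0.0) for j in range(n))
            if gain > best_gain:
                best_gain, best = gain, i
        medoids.append(best)
        not_medoids.remove(best)
        nearest = [min(nearest[j], D[best][j]) for j in range(n)]
    return medoids


# Toy data: seven points on a line forming two well-separated groups.
xs = [0, 1, 2, 3, 10, 11, 12]
D = [[abs(a - b) for b in xs] for a in xs]
medoids = build_init(D, 2)  # -> [3, 5], i.e. the points x=3 and x=11
```

On this toy input the first medoid is the point with the smallest row sum (x=3), and the second is the candidate that most reduces the remaining cost (x=11).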

sklearn_extra/cluster/tests/test_k_medoids.py

Lines changed: 18 additions & 2 deletions
@@ -5,9 +5,10 @@
 from scipy.sparse import csc_matrix
 import pytest

-from sklearn.datasets import load_iris
+from sklearn.datasets import load_iris, fetch_20newsgroups_vectorized
 from sklearn.metrics.pairwise import PAIRWISE_DISTANCE_FUNCTIONS
-from sklearn.metrics.pairwise import euclidean_distances
+from sklearn.metrics.pairwise import euclidean_distances, cosine_distances
+
 from numpy.testing import assert_allclose, assert_array_equal

 from sklearn_extra.cluster import KMedoids
@@ -331,3 +332,18 @@ def test_kmedoids_on_sparse_input():
     labels = model.fit_predict(X)
     assert len(labels) == 2
     assert_array_equal(labels, model.labels_)
+
+
+# Test the build initialization.
+def test_build():
+    X, y = fetch_20newsgroups_vectorized(return_X_y=True)
+    # Select only the first 1000 samples
+    X = X[:500]
+    y = y[:500]
+    # Precompute cosine distance matrix
+    diss = cosine_distances(X)
+    # run build
+    ske = KMedoids(20, "precomputed", init="build", max_iter=0)
+    ske.fit(diss)
+    assert ske.inertia_ <= 230
+    assert len(np.unique(ske.labels_)) == 20
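
The new test checks two quantities after initialization alone (`max_iter=0`): the inertia and the number of distinct labels. As a minimal illustration of what those assertions measure, the sketch below computes both from a precomputed dissimilarity matrix and a list of medoid indices. The helper `assign_and_inertia` is hypothetical and works on plain lists. It assumes inertia means the sum of each sample's dissimilarity to its nearest medoid, which matches the documented meaning of `KMedoids.inertia_`.

```python
def assign_and_inertia(D, medoids):
    """Label each sample by its nearest medoid and sum those dissimilarities."""
    labels, inertia = [], 0.0
    for row in D:
        # Dissimilarity from this sample to each medoid.
        dists = [row[m] for m in medoids]
        # Position (cluster label) of the closest medoid.
        k = min(range(len(medoids)), key=dists.__getitem__)
        labels.append(k)
        inertia += dists[k]
    return labels, inertia


# Toy data: seven points on a line, with medoids at x=3 and x=11.
xs = [0, 1, 2, 3, 10, 11, 12]
D = [[abs(a - b) for b in xs] for a in xs]
labels, inertia = assign_and_inertia(D, [3, 5])
# labels  -> [0, 0, 0, 0, 1, 1, 1]
# inertia -> 8.0
```

A degenerate initialization that picked the same sample repeatedly would yield fewer distinct labels than clusters, which is what the `len(np.unique(ske.labels_)) == 20` assertion in the test guards against.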
