-`KMedoids`:K-medoids is a clustering algorithm that works by finding k data points (called
-medoids) such that the total distance between each data point and the closest medoid is
-minimal. The function implements a K-means style algorithm instead of PAM (Partitioning
-Around Medoids). K-means style algorithm converges in fewer iterations, but was shown to
-produce worse (10-20% higher total costs) results (see e.g. (https://juliastats.org/Clustering.jl/latest/kmedoids.html#kmedoid_refs-1)[Schubert & Rousseeuw (2019)]).
+`KMedoids`: The K-Medoids algorithm finds K centroids corresponding to K clusters in the
+data. Unlike K-Means, the centroids are found among data points themselves. Clusters
+are not assumed to be elliptical. Should be used with a non-euclidean distance metric
 
 # Training data
 
@@ -253,7 +250,7 @@ In MLJ or MLJBase, bind an instance `model` to data with
 Where
 
 - `X`: is any table of input features (eg, a `DataFrame`) whose columns
-are of scitype `Continuous`; check the column scitypes with `schema(X)`
+are of scitype `Continuous`; check the scitype with `schema(X)`
 
 - `y`: is the target, which can be any `AbstractVector` whose element
 scitype is `Count`; check the scitype with `schema(y)`
@@ -263,15 +260,15 @@ Train the machine using `fit!(mach, rows=...)`.
 # Hyper-parameters
 
 - `k=3`: The number of centroids to use in clustering.
-- `metric::Distances.SqEuclidean`: The metric used to calculate the clustering distance
-matrix. Must be a subtype of `Distances.SemiMetric` from Distances.jl.
+- `metric::SemiMetric=SqEuclidean`: The metric used to calculate the clustering distance
+matrix
 
 # Operations
 
-- `predict(mach, Xnew)`: return learned cluster labels for a new
-table of inputs `Xnew` having the same scitype as `X` above.
+- `predict(mach, Xnew)`: return predictions of the target given new
+features `Xnew` having the same Scitype as `X` above.
 - `transform(mach, Xnew)`: instead return the mean pairwise distances from
-new samples to the cluster centers.
+new samples to the cluster centers
 
 # Fitted parameters
 
@@ -291,20 +288,25 @@ The fields of `report(mach)` are:
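The docstring text touched by this diff describes K-medoids as picking k data points (medoids) that minimize the total distance from each point to its closest medoid, using a K-means style alternation rather than PAM. The sketch below illustrates that alternation in plain Python. It is not the Clustering.jl or MLJ implementation: the names `sq_euclidean` and `kmedoids`, the first-k initialization, and the `max_iter` cap are all assumptions made for the example.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmedoids(points, k, metric=sq_euclidean, max_iter=100):
    """K-means style K-medoids sketch (hypothetical helper, not a library API).

    Returns (medoid indices, cluster label per point, total cost).
    Assumes 1 <= k <= len(points).
    """
    n = len(points)
    # Precompute the full pairwise distance matrix once.
    dist = [[metric(p, q) for q in points] for p in points]

    def assign(medoids):
        # Label each point with the position of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    # Assumption: initialize with the first k points; real libraries use
    # better seeding strategies.
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest summed distance
            # to all other members of its cluster.
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break  # converged: medoids no longer change
        medoids = updated

    labels = assign(medoids)
    total_cost = sum(dist[i][medoids[labels[i]]] for i in range(n))
    return medoids, labels, total_cost


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmedoids(points, 2))  # ([0, 3], [0, 0, 0, 1, 1, 1], 4)
```

Because the medoids are always actual data points, only the pairwise distance matrix is needed, which is why any semi-metric can be passed as `metric` in this sketch.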