According to the docs (quoted below), the kmeans algorithm takes a matrix where each column X[:, i] corresponds to an observed sample. This column-oriented layout goes against the idea of tidy data and differs from both Python's scikit-learn implementation of kmeans and R's base implementation of kmeans, which expect one sample per row.
Is there a good reason for this? Should the algorithm be changed from column-oriented to row-oriented, to be consistent with R and Python as well as with the concept of tidy data?
URL: http://clusteringjl.readthedocs.io/en/stable/overview.html
Inputs
A clustering algorithm, depending on its nature, may accept an input matrix in either of the following forms:
- Sample matrix X, where each column X[:,i] corresponds to an observed sample.
- Distance matrix D, where D[i,j] indicates the distance between samples i and j, or the cost of assigning one to the other.
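To make the two input forms concrete, here is a minimal sketch in plain Python. It is not Clustering.jl code and calls no library API: the data values and the `column` helper are made up for illustration. It starts from a row-oriented table (one sample per row, as scikit-learn and R expect), transposes it into the column-oriented sample matrix X described above, and builds a Euclidean distance matrix D from it. Note that Python indexes from 0, whereas the `X[:,i]` notation in the docs is Julia's 1-based indexing.

```python
import math

# Hypothetical data: 3 samples with 2 features each, one sample per ROW
# (the layout scikit-learn and R's kmeans expect).
rows = [
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
]

# Transpose to one sample per COLUMN: X[k][i] is feature k of sample i.
X = [list(feature) for feature in zip(*rows)]


def column(matrix, i):
    """Return column i of a list-of-rows matrix, i.e. sample i of X."""
    return [row[i] for row in matrix]


n = len(X[0])  # number of samples = number of columns of X

# Pairwise Euclidean distance matrix: D[i][j] is the distance
# between sample i and sample j.
D = [[math.dist(column(X, i), column(X, j)) for j in range(n)]
     for i in range(n)]

print(X)
print(D)
```

Under these assumptions, switching between the two conventions is a single transpose, so data prepared for a row-oriented library can be passed to a column-oriented one (and back) without changing its content.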