|
| 1 | +Louvain Clustering |
| 2 | +================== |
| 3 | + |
| 4 | +Groups items using the Louvain clustering algorithm. |
| 5 | + |
| 6 | +Inputs |
| 7 | + Data |
| 8 | + input dataset |
| 9 | + |
| 10 | +Outputs |
| 11 | + Data |
| 12 | + dataset with cluster index as a meta attribute |
| 13 | + Graph (if the Network addon is installed) |
| 14 | + the weighted k-nearest neighbor graph |
| 15 | + |
| 16 | + |
| 17 | +The widget first converts the input data into a k-nearest neighbor graph. To |
| 18 | +preserve the notion of distance, the edges are weighted by the Jaccard index of |
| 19 | +the neighbor sets of the nodes they connect. Finally, a |
| 20 | +`modularity optimization <https://en.wikipedia.org/wiki/Louvain_Modularity>`_ |
| 21 | +community detection algorithm is applied to the graph to retrieve clusters of |
| 22 | +highly interconnected nodes. The widget outputs a new dataset in which the |
| 23 | +cluster index is stored as a meta attribute. |
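
The first two steps described above (building the k-nearest neighbor graph and weighting its edges with the Jaccard index) can be sketched in plain Python. This is an illustrative sketch only, not the widget's actual implementation: the function names `knn_indices` and `jaccard_knn_graph`, the Euclidean metric, and the toy data are all assumptions made for the example.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors.

    Assumes Euclidean distance; the point itself is excluded.
    """
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def jaccard_knn_graph(points, k):
    """Return {(i, j): weight} with i < j for every k-nearest-neighbor edge.

    The weight is the Jaccard index of the two neighbor sets:
    |N(i) & N(j)| / |N(i) | N(j)|.
    """
    nn = knn_indices(points, k)
    edges = {}
    for i, neigh in enumerate(nn):
        for j in neigh:
            a, b = min(i, j), max(i, j)
            if (a, b) not in edges:
                shared = nn[a] & nn[b]
                union = nn[a] | nn[b]
                edges[(a, b)] = len(shared) / len(union)
    return edges


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
graph = jaccard_knn_graph(points, k=2)

for (a, b), weight in sorted(graph.items()):
    print(a, b, round(weight, 3))
```

The resulting weighted edge list could then be passed to a community detection routine, for example `networkx.algorithms.community.louvain_communities` with its `weight` and `resolution` arguments, assuming NetworkX is available. Note that the meaning of the resolution value differs between libraries.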
| 24 | + |
| 25 | +Parameters |
| 26 | +---------- |
| 27 | + |
| 28 | +- PCA preprocessing is typically applied to the original data to remove noise. |
| 29 | +- The distance metric is used to find the specified number of nearest |
| 30 | + neighbors. The nearest neighbors form a nearest neighbor graph. |
| 31 | +- Resolution is a parameter for the Louvain community detection algorithm that |
| 32 | + affects the size of the recovered clusters. Smaller resolutions recover |
| 33 | + smaller clusters, and therefore a larger number of them; conversely, larger |
| 34 | + values recover clusters containing more data points. |
| 35 | + |
| 36 | +References |
| 37 | +---------- |
| 38 | + |
| 39 | +Blondel, Vincent D., et al. "Fast unfolding of communities in large networks." Journal of Statistical Mechanics: Theory and Experiment 2008.10 (2008): P10008. |
| 40 | + |
| 41 | +Lambiotte, Renaud, J-C. Delvenne, and Mauricio Barahona. "Laplacian dynamics and multiscale modular structure in networks." arXiv preprint arXiv:0812.1770 (2008). |