Conversation
|
@janezd and @markotoplak are you ok with the proposed scheme? I will push the changes for tests and widgets after we agree with the clustering scheme. |
|
Conceptually, some of this doesn't make sense. This is what I would expect to happen when using clustering classes: I call DBScan with some data and I get a model. I would assume the cluster labels are available at this point (via some In k-means, this makes complete sense, but dbscan and Louvain don't have a transform method, nor is it straightforward to add one. I propose that the cluster assignments of the training data be available on the model as a field (so I can get cluster assignments by just calling fit) and if the method supports |
|
@pavlin-policar I agree with your solution. I also do not like that we call clustering twice for the same results as well that we pretend that there is a transform function. @janezd what you think about Pavlins idea. |
|
Tests are failing since the widget for Lovain clustering is not fixed to new clusterings yet. I would like to hear the opinion about @pavlin-policar idea anyway. |
|
Main points of some additional discussion:
|
febfdb9 to
055d273
Compare
|
@lanzagar can you check whether the clustering implementation is what we want. I meanwhile I will fix lint and also add some more tests for clustering. |
Codecov Report
@@ Coverage Diff @@
## master #3814 +/- ##
==========================================
+ Coverage 84.29% 84.31% +0.01%
==========================================
Files 384 385 +1
Lines 72749 72758 +9
==========================================
+ Hits 61327 61343 +16
+ Misses 11422 11415 -7 |
c7e9c4c to
e9bcf1e
Compare
e73e590 to
86295cb
Compare
|
@lanzagar I think it is done now. Can you check if there is anything to correct? |
Issue
Fixes #3805
Description of changes
Clustering algorithms are simplified and unified. Each clustering algorithm (kMeans, DBSCAN, Lovain) inherit from a common
Clusteringclass. Clustering itself is stateless (it does not remember the data or parameters (such as centroids)). After it is called it returns clusters or whenfit_storageis called it creates a new instance of typeClusteringModelwhich holds the model parameters (in case of k-means they are clusters) - model can be created just for k-Means. Other algorithms do not have parameters to store or do not enable predicting.The interface to call the clustering is now different. Here is the example for kMeans:
or in the case when we need model:
I also removed the silhouette computing from k-means class since I think it is not the part of the clustering but its own unit. From now on it will be called directly from the widget.
Includes