[SPARK-14516][ML] Adding ClusteringEvaluator with the implementation of Cosine silhouette and squared Euclidean silhouette.
## What changes were proposed in this pull request?
This PR adds ClusteringEvaluator, an Evaluator that supports two metrics:
- **cosineSilhouette**: the Silhouette measure using the cosine distance;
- **squaredSilhouette**: the Silhouette measure using the squared Euclidean distance.
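For reference, the Silhouette of a point i is s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here a(i) is the mean distance from i to the other points of its own cluster, and b(i) is the smallest mean distance from i to the points of any other cluster. The overall score is the mean of s(i) over all points. Below is a minimal, naive Python sketch of that definition, parameterized by either of the two distances above. It is not the Spark code. The function names are hypothetical, and a point alone in its cluster is assigned s(i) = 0, the usual convention (also followed by sklearn).

```python
import math
from collections import defaultdict


def squared_euclidean(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def cosine_distance(x, y):
    """Cosine distance 1 - cos(x, y); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def naive_silhouette(points, labels, dist):
    """Mean Silhouette straight from its definition: O(n^2) distance evaluations."""
    clusters = defaultdict(list)  # cluster label -> indices of its points
    for i, lab in enumerate(labels):
        clusters[lab].append(i)
    if len(clusters) < 2:
        raise ValueError("Silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: s(i) = 0 for a point alone in its cluster
        # a(i): mean distance to the other points of i's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the points of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        m = max(a, b)
        if m > 0:  # guard against a(i) = b(i) = 0 (duplicate points)
            total += (b - a) / m
    return total / len(points)
```

For example, `naive_silhouette([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1], squared_euclidean)` evaluates to roughly 0.99, because the two clusters are tight and far apart. Every point triggers a scan of every other point. That quadratic cost is what a distributed implementation needs to avoid.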
The implementation of the two metrics follows the algorithm proposed and explained [here](https://drive.google.com/file/d/0B0Hyo%5f%5fbG%5f3fdkNvSVNYX2E3ZU0/view). These algorithms were designed for a distributed, parallel environment, so they perform reasonably well. A naive Silhouette implementation that follows the definition directly must instead compute the distance between every pair of points, which is O(n²) for n points.
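For both distances, the quadratic cost disappears because the mean distance from a point x to all points of a cluster C can be written using a few per-cluster aggregates. Let N_C be the cluster size, Y_C the sum of its vectors, and Psi_C the sum of their squared norms. For squared Euclidean distance, `mean ||x - c||^2 = ||x||^2 + Psi_C / N_C - 2 (x . Y_C) / N_C`. For cosine distance, if every vector is first normalized to unit length, `mean (1 - x . c) = 1 - (x . Y_C) / N_C`. The single-machine Python sketch below illustrates this idea. It is an assumption-laden illustration, not the Spark implementation or necessarily the exact algorithm of the linked document. The function names and the `measure` values `"squaredEuclidean"` and `"cosine"` are hypothetical. In a distributed setting, the aggregates could be gathered in one pass (for example, a group-by on the cluster label), after which each point needs only one evaluation per cluster, i.e. O(n·k) work for k clusters.

```python
import math
from collections import defaultdict


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def _cluster_stats(points, labels):
    """One pass over the data: per-cluster count, vector sum and sum of squared norms."""
    dim = len(points[0])
    count = defaultdict(int)
    vec_sum = defaultdict(lambda: [0.0] * dim)
    sq_norm_sum = defaultdict(float)
    for p, lab in zip(points, labels):
        count[lab] += 1
        s = vec_sum[lab]
        for k, v in enumerate(p):
            s[k] += v
        sq_norm_sum[lab] += _dot(p, p)
    return count, vec_sum, sq_norm_sum


def fast_silhouette(points, labels, measure="squaredEuclidean"):
    """Mean Silhouette from cluster aggregates: O(n * k) instead of O(n^2)."""
    if measure not in ("squaredEuclidean", "cosine"):
        raise ValueError("unknown measure: " + measure)
    if measure == "cosine":
        # Normalize once (assumes no all-zero vectors); cosine distance becomes 1 - dot.
        points = [[v / math.sqrt(_dot(p, p)) for v in p] for p in points]
    count, vec_sum, sq_norm_sum = _cluster_stats(points, labels)
    if len(count) < 2:
        raise ValueError("Silhouette needs at least two clusters")

    def mean_dist(x, c):
        # Mean distance from x to *all* points of cluster c (x included if it is in c).
        n = count[c]
        if measure == "cosine":
            value = 1.0 - _dot(x, vec_sum[c]) / n
        else:
            value = _dot(x, x) + sq_norm_sum[c] / n - 2.0 * _dot(x, vec_sum[c]) / n
        return max(0.0, value)  # clamp tiny negative values caused by rounding

    total = 0.0
    for x, lab in zip(points, labels):
        n = count[lab]
        if n == 1:
            continue  # convention: s(i) = 0 for a singleton cluster
        # d(x, x) = 0, so rescale the mean over N points to a mean over the other N - 1.
        a = mean_dist(x, lab) * n / (n - 1)
        b = min(mean_dist(x, c) for c in count if c != lab)
        m = max(a, b)
        if m > 0:
            total += (b - a) / m
    return total / len(points)
```

Up to floating-point rounding, this gives the same value as computing s(i) from its definition. The squared-Euclidean closed form can lose precision through cancellation when points lie far from the origin relative to their spread, which is why each mean distance is clamped at zero.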
## How was this patch tested?
The patch was tested with the new unit tests added in this PR, which compare the results with those produced by the [Python sklearn library](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html).
Author: Marco Gaido <[email protected]>
Closes apache#18538 from mgaido91/SPARK-14516.