This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Commit dd78167

mgaido91 authored and yanboliang committed
[SPARK-14516][ML] Adding ClusteringEvaluator with the implementation of Cosine silhouette and squared Euclidean silhouette.
## What changes were proposed in this pull request?

This PR adds the ClusteringEvaluator Evaluator, which provides two metrics:

- **cosineSilhouette**: the Silhouette measure using the cosine distance;
- **squaredSilhouette**: the Silhouette measure using the squared Euclidean distance.

The implementation of the two metrics follows the algorithm proposed and explained [here](https://drive.google.com/file/d/0B0Hyo%5f%5fbG%5f3fdkNvSVNYX2E3ZU0/view). These algorithms were designed for a distributed and parallel environment, so they perform reasonably well, unlike a naive Silhouette implementation that follows the definition directly.

## How was this patch tested?

The patch was tested with the unit tests it adds, which compare the results with those provided by the [Python sklearn library](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html).

Author: Marco Gaido <[email protected]>

Closes apache#18538 from mgaido91/SPARK-14516.
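The performance remark in the description can be illustrated with a small Python sketch. It is not the Scala code added by this PR, and the function names `naive_silhouette` and `aggregated_silhouette` are hypothetical. It relies only on the identity that, for the squared Euclidean distance, the mean distance from a point x to a cluster C of N points equals ||x||^2 + (1/N) Σ||y||^2 - 2 x·(Σy)/N. Per-cluster counts, feature sums and sums of squared norms are therefore enough, and no pairwise comparison is needed. Whether the linked document uses exactly this aggregation is an assumption here.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def naive_silhouette(points, labels):
    """Mean silhouette computed from the definition: O(n^2) pairwise distances.

    Assumes at least two distinct clusters.
    """
    clusters = defaultdict(list)
    for p, l in zip(points, labels):
        clusters[l].append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # The point's distance to itself is 0, so divide by the other members only.
        a = sum(sq_dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(sq_dist(p, q) for q in c) / len(c)
                for k, c in clusters.items() if k != l)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def aggregated_silhouette(points, labels):
    """Same value, using only per-cluster aggregates (hypothetical sketch).

    Assumes at least two distinct clusters.
    """
    # One pass: per-cluster count, feature sum, and sum of squared norms.
    stats = {}
    for p, l in zip(points, labels):
        n, s, q = stats.get(l, (0, [0.0] * len(p), 0.0))
        stats[l] = (n + 1,
                    [si + pi for si, pi in zip(s, p)],
                    q + sum(x * x for x in p))

    def mean_dist(p, l):
        # ||p||^2 + mean(||y||^2) - 2 * p . mean(y), over all y in cluster l
        n, s, q = stats[l]
        return (sum(x * x for x in p) + q / n
                - 2.0 * sum(x * y for x, y in zip(p, s)) / n)

    total = 0.0
    for p, l in zip(points, labels):
        n = stats[l][0]
        if n == 1:
            continue  # singleton cluster, silhouette 0
        # Rescale so the own-cluster mean excludes the point itself.
        a = mean_dist(p, l) * n / (n - 1)
        b = min(mean_dist(p, k) for k in stats if k != l)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = [0, 0, 0, 1, 1, 1]
naive = naive_silhouette(points, labels)
fast = aggregated_silhouette(points, labels)
print(abs(naive - fast) < 1e-9)
```

In a distributed setting the aggregates can be computed in one pass and shared with every worker, after which each point's score costs one lookup per cluster instead of one distance per point.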
1 parent e2ac2f1 commit dd78167


3 files changed: +675 -0 lines changed

