Clustering/Manifold/Decomposition methods in modern C++

spirosmaggioros/ClusterXX

ClusterXX

ClusterXX is a C++ library that includes clustering, manifold learning, and decomposition algorithms, as well as the data structures they need to be fast. Everything is implemented from scratch, with Armadillo as the only external dependency. The API follows scikit-learn's, so you don't have to read all of our documentation.

Note

This library is semi-educational: although I implemented the data structures needed to make the algorithms faster, they are not as fast as scikit-learn's, and I don't know if I'll ever make them as fast.

Example:

#include <clusterxx.hpp>
#include <vector>

int main() {
    clusterxx::csv_parser parser = clusterxx::csv_parser("dataset.csv");
    auto X = parser.data();

    clusterxx::TSNE<> tsne = clusterxx::TSNE<>(
        2, /* n_components */
        30.0, /* perplexity (use between 30 - 50) */
        200.0, /* learning_rate */
        12.0, /* early_exaggeration */
        1000, /* max_iter */
        1e-7, /* min_grad_norm */
        300 /* n_iter_without_progress */
    );
    auto tsne_latent_features = tsne.fit_transform(X);

    clusterxx::DBSCAN<> dbscan = clusterxx::DBSCAN<>(
        0.5, /* eps */
        5, /* num_samples */
        30 /* leaf_size */
    );
    std::vector<int> labels = dbscan.fit_predict(X);

    clusterxx::PCA pca = clusterxx::PCA(30 /* n_components */);
    auto pca_latent_features = pca.fit_transform(X);

    // We also support simple plotting
    clusterxx::Plot plot;
    plot.plot2d(dbscan);
}

You can find more examples in the repository's examples directory.
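To make the eps and sample-count arguments in the DBSCAN call above more concrete, here is a minimal brute-force sketch in standard C++. It is not ClusterXX's implementation: the names `Point`, `region_query` and `dbscan_sketch` are hypothetical, the sample-count threshold is assumed to follow scikit-learn's `min_samples` semantics (a point counts as its own neighbor), and every neighborhood query scans all points instead of using a tree.

```cpp
#include <cmath>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical names for illustration only; not part of ClusterXX.
using Point = std::vector<double>;

// Euclidean distance between two points of equal dimension.
double euclidean(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Indices of all points within eps of X[i], including i itself.
// Brute force: O(n) per query.
std::vector<std::size_t> region_query(const std::vector<Point>& X,
                                      std::size_t i, double eps) {
    std::vector<std::size_t> out;
    for (std::size_t j = 0; j < X.size(); ++j) {
        if (euclidean(X[i], X[j]) <= eps) out.push_back(j);
    }
    return out;
}

// Returns one label per point: 0, 1, 2, ... for clusters and -1 for noise.
std::vector<int> dbscan_sketch(const std::vector<Point>& X, double eps,
                               std::size_t min_samples) {
    const int UNVISITED = -2;
    const int NOISE = -1;
    std::vector<int> labels(X.size(), UNVISITED);
    int cluster = 0;

    for (std::size_t i = 0; i < X.size(); ++i) {
        if (labels[i] != UNVISITED) continue;

        const auto neighbors = region_query(X, i, eps);
        if (neighbors.size() < min_samples) {
            labels[i] = NOISE;  // may later be claimed as a border point
            continue;
        }

        // X[i] is a core point: start a new cluster and grow it outwards.
        labels[i] = cluster;
        std::queue<std::size_t> frontier;
        for (std::size_t j : neighbors) frontier.push(j);

        while (!frontier.empty()) {
            const std::size_t j = frontier.front();
            frontier.pop();

            if (labels[j] == NOISE) labels[j] = cluster;  // border point
            if (labels[j] != UNVISITED) continue;         // already assigned

            labels[j] = cluster;
            const auto reachable = region_query(X, j, eps);
            // Only core points extend the cluster further.
            if (reachable.size() >= min_samples) {
                for (std::size_t k : reachable) frontier.push(k);
            }
        }
        ++cluster;
    }
    return labels;
}
```

Calling `dbscan_sketch(X, 0.5, 5)` mirrors the argument values used above. Each query here costs O(n), which is the part a spatial index such as a k-d tree is meant to speed up.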

Comparison with sklearn

t-SNE comparison on MNIST

[Figure: t-SNE comparison]

Isomap comparison on MNIST

[Figure: Isomap comparison]

Currently implemented methods and data structures:

  • DBSCAN
  • Isomap
  • KMeans
  • Mini Batch KMeans
  • PCA
  • t-SNE (exact method only)
  • k-d tree
  • Vantage point tree
  • Quadtree
  • Simple graph for shortest paths (used by Isomap)
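
As an illustration of why a spatial index from the list above helps, here is a minimal 2-D k-d tree with nearest-neighbor search in standard C++. It is a sketch under stated assumptions, not ClusterXX's k-d tree: `Point2`, `KDNode`, `build` and `nearest_neighbor` are hypothetical names, the dimension is fixed at 2, and each node stores a single point (no leaf buckets).

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

// Hypothetical names for illustration only; not part of ClusterXX.
using Point2 = std::array<double, 2>;

struct KDNode {
    Point2 point;
    std::unique_ptr<KDNode> left, right;
};

// Builds a balanced tree over pts[lo, hi) by splitting at the median,
// alternating the split axis (x, y, x, ...) with depth. Reorders pts.
std::unique_ptr<KDNode> build(std::vector<Point2>& pts, std::size_t lo,
                              std::size_t hi, int depth = 0) {
    if (lo >= hi) return nullptr;
    const std::size_t axis = static_cast<std::size_t>(depth % 2);
    const std::size_t mid = lo + (hi - lo) / 2;
    const auto first = pts.begin();
    std::nth_element(first + static_cast<std::ptrdiff_t>(lo),
                     first + static_cast<std::ptrdiff_t>(mid),
                     first + static_cast<std::ptrdiff_t>(hi),
                     [axis](const Point2& a, const Point2& b) {
                         return a[axis] < b[axis];
                     });
    auto node = std::make_unique<KDNode>();
    node->point = pts[mid];
    node->left = build(pts, lo, mid, depth + 1);
    node->right = build(pts, mid + 1, hi, depth + 1);
    return node;
}

// Squared Euclidean distance (avoids the square root during search).
double sq_dist(const Point2& a, const Point2& b) {
    const double dx = a[0] - b[0];
    const double dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// Recursive search: descend into the side containing q first, then visit
// the other side only if the splitting plane is closer than the best match.
void nearest(const KDNode* node, const Point2& q, int depth, Point2& best,
             double& best_d2) {
    if (!node) return;
    const double d2 = sq_dist(node->point, q);
    if (d2 < best_d2) {
        best_d2 = d2;
        best = node->point;
    }
    const std::size_t axis = static_cast<std::size_t>(depth % 2);
    const double diff = q[axis] - node->point[axis];
    const KDNode* closer = diff < 0 ? node->left.get() : node->right.get();
    const KDNode* farther = diff < 0 ? node->right.get() : node->left.get();
    nearest(closer, q, depth + 1, best, best_d2);
    if (diff * diff < best_d2) nearest(farther, q, depth + 1, best, best_d2);
}

// Nearest stored point to q; root must be non-null for a meaningful result.
Point2 nearest_neighbor(const KDNode* root, const Point2& q) {
    Point2 best{};
    double best_d2 = std::numeric_limits<double>::infinity();
    nearest(root, q, 0, best, best_d2);
    return best;
}
```

A typical use is `auto root = build(pts, 0, pts.size());` followed by `nearest_neighbor(root.get(), q)`. The pruning test against the splitting plane is what lets a query skip most of the tree on well-spread data, compared with scanning every point.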

Installation:

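First, install Armadillo.

Then, from the repository root, run the following (if the build directory has not been configured yet, run meson setup build first):

Linux/macOS

meson install -C build

Windows: help is needed here, as I can't test it!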

To run the unit tests, you can do:

meson test -C build

Contributions:

Contributions are open. You can contribute by solving open issues or by submitting a PR with an implementation or addition. For any information or questions, contact Spiros at spirosmag@ieee.org
