ClusterXX is a C++ library that provides clustering, manifold learning, and decomposition algorithms, along with the data structures they need to run fast. Everything is implemented from scratch; Armadillo is the only external dependency. The API follows sklearn's, so if you already know sklearn you don't have to read all of our documentation.
Note
This library is semi-educational: although I implemented the data structures needed to make the algorithms faster, they are not as fast as sklearn's, and I don't know if I'll ever make them as fast.
#include <clusterxx.hpp>
#include <vector>

int main() {
    // Load the dataset from a CSV file
    clusterxx::csv_parser parser = clusterxx::csv_parser("dataset.csv");
    auto X = parser.data();

    // Embed the data in 2 dimensions with t-SNE
    clusterxx::TSNE<> tsne = clusterxx::TSNE<>(
        2,     /* n_components */
        30.0,  /* perplexity (use between 30 and 50) */
        200.0, /* learning_rate */
        12.0,  /* early_exaggeration */
        1000,  /* max_iter */
        1e-7,  /* min_grad_norm */
        300    /* n_iter_without_progress */
    );
    auto tsne_latent_features = tsne.fit_transform(X);

    // Cluster the original data with DBSCAN
    clusterxx::DBSCAN<> dbscan = clusterxx::DBSCAN<>(
        0.5, /* eps */
        5,   /* num_samples */
        30   /* leaf_size */
    );
    std::vector<int> labels = dbscan.fit_predict(X);

    // Project the data onto its first 30 principal components
    clusterxx::PCA pca = clusterxx::PCA(30 /* n_components */);
    auto pca_latent_features = pca.fit_transform(X);

    // We also support simple plotting
    clusterxx::Plot plot;
    plot.plot2d(dbscan);
}
You can see more examples at examples
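To make the eps and num_samples arguments of the DBSCAN call above more concrete, here is a brute-force sketch of the algorithm using only the standard library. It is not ClusterXX's implementation: the function name `dbscan_brute_force` and the 2-D point type are hypothetical, and it assumes sklearn's convention that a point counts as its own neighbour when checking the minimum sample count.

```cpp
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only; not ClusterXX's API.
// Returns one label per point: 0, 1, ... for clusters, -1 for noise.
std::vector<int> dbscan_brute_force(
    const std::vector<std::pair<double, double>>& pts,
    double eps, std::size_t min_samples) {
    const int UNVISITED = -2, NOISE = -1;
    const std::size_t n = pts.size();
    std::vector<int> labels(n, UNVISITED);

    // O(n) scan for every point within eps of point i (including i itself).
    auto neighbours = [&](std::size_t i) {
        std::vector<std::size_t> out;
        for (std::size_t j = 0; j < n; ++j) {
            const double dx = pts[i].first - pts[j].first;
            const double dy = pts[i].second - pts[j].second;
            if (dx * dx + dy * dy <= eps * eps) out.push_back(j);
        }
        return out;
    };

    int cluster = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (labels[i] != UNVISITED) continue;
        auto seeds = neighbours(i);
        if (seeds.size() < min_samples) {  // not a core point (for now)
            labels[i] = NOISE;
            continue;
        }
        labels[i] = cluster;
        std::deque<std::size_t> frontier(seeds.begin(), seeds.end());
        while (!frontier.empty()) {
            const std::size_t j = frontier.front();
            frontier.pop_front();
            if (labels[j] == NOISE) labels[j] = cluster;  // border point
            if (labels[j] != UNVISITED) continue;
            labels[j] = cluster;
            auto nb = neighbours(j);
            if (nb.size() >= min_samples)  // j is a core point: keep expanding
                frontier.insert(frontier.end(), nb.begin(), nb.end());
        }
        ++cluster;
    }
    return labels;
}
```

Each neighbour lookup here scans every point, so the whole run is quadratic in the number of points. A spatial index is the usual way to make those lookups cheaper.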
- DBSCAN
- Isomap
- KMeans
- Mini Batch KMeans
- PCA
- t-SNE (exact method only)
- k-d tree
- Vantage point tree
- Quadtree
- Simple Graph for shortest paths (Isomap)
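Spatial indexes such as the k-d tree exist to speed up neighbour queries, for example the eps-radius lookups that DBSCAN performs. The following is a minimal 2-D sketch of that idea using only the standard library. The names `Point`, `KDNode`, `KDTree`, and `within` are hypothetical and do not reflect ClusterXX's API or internals; unlike a tree with a configurable leaf size, this sketch stores exactly one point per node.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not ClusterXX's API.
using Point = std::array<double, 2>;

struct KDNode {
    std::size_t index = 0;  // index of the point stored at this node
    int axis = 0;           // splitting axis (0 = x, 1 = y)
    std::unique_ptr<KDNode> left, right;
};

// Build a tree over ids[lo, hi), splitting at the median of the current axis.
std::unique_ptr<KDNode> build(const std::vector<Point>& pts,
                              std::vector<std::size_t>& ids,
                              std::size_t lo, std::size_t hi, int depth) {
    if (lo >= hi) return nullptr;
    const int axis = depth % 2;
    const std::size_t mid = lo + (hi - lo) / 2;
    std::nth_element(ids.begin() + lo, ids.begin() + mid, ids.begin() + hi,
                     [&](std::size_t a, std::size_t b) {
                         return pts[a][axis] < pts[b][axis];
                     });
    auto node = std::make_unique<KDNode>();
    node->index = ids[mid];
    node->axis = axis;
    node->left = build(pts, ids, lo, mid, depth + 1);
    node->right = build(pts, ids, mid + 1, hi, depth + 1);
    return node;
}

// Collect every point within eps of q, skipping subtrees that lie entirely
// on the far side of the splitting plane.
void radius_query(const KDNode* node, const std::vector<Point>& pts,
                  const Point& q, double eps, std::vector<std::size_t>& out) {
    if (!node) return;
    const Point& p = pts[node->index];
    const double dx = p[0] - q[0], dy = p[1] - q[1];
    if (dx * dx + dy * dy <= eps * eps) out.push_back(node->index);
    const double diff = q[node->axis] - p[node->axis];
    if (diff <= eps) radius_query(node->left.get(), pts, q, eps, out);
    if (diff >= -eps) radius_query(node->right.get(), pts, q, eps, out);
}

struct KDTree {
    std::vector<Point> points;
    std::unique_ptr<KDNode> root;

    explicit KDTree(std::vector<Point> pts) : points(std::move(pts)) {
        std::vector<std::size_t> ids(points.size());
        std::iota(ids.begin(), ids.end(), std::size_t{0});
        root = build(points, ids, 0, ids.size(), 0);
    }

    // Indices of all points within eps of q, in ascending order.
    std::vector<std::size_t> within(const Point& q, double eps) const {
        assert(eps >= 0.0);
        std::vector<std::size_t> out;
        radius_query(root.get(), points, q, eps, out);
        std::sort(out.begin(), out.end());
        return out;
    }
};
```

The pruning test works because the median split keeps every point in the left subtree at or below the node's coordinate on the split axis, and every point in the right subtree at or above it. A subtree is visited only if the query ball can reach its side of the plane.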
First, you need to install Armadillo.
Then, from the root directory, run:
Linux/macOS
meson install -C build
Windows
Need help here, as I can't test it!
To run the unit tests, run:
meson test -C build
Contributions are open: you can contribute by solving open issues or by submitting a PR with an implementation or addition. For any information or questions, contact Spiros at spirosmag@ieee.org

