Home

Super K-Means

SuperKMeans is a super-fast library for clustering high-dimensional vector embeddings. Its main use case is building partition-based indexes (e.g., IVF) over large vector collections.
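To make the IVF use case concrete, here is a minimal pure-Python sketch of how cluster centroids can partition a collection into inverted lists and how a query then scans only the closest partitions. It is an illustration only, not the SuperKMeans API: `sq_dist`, `build_ivf`, `search_ivf`, and the `nprobe` parameter are hypothetical names chosen for this example, and the centroids are hard-coded rather than produced by a clustering run.

```python
# Illustrative IVF-style partitioning. Not the SuperKMeans API:
# every function name here is hypothetical.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Put each vector id into the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(vid)
    return lists

def search_ivf(query, vectors, centroids, lists, nprobe=1):
    """Scan only the nprobe partitions whose centroids are closest to the query.

    Assumes the probed partitions contain at least one vector.
    """
    order = sorted(range(len(centroids)),
                   key=lambda c: sq_dist(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    return min(candidates, key=lambda vid: sq_dist(query, vectors[vid]))

vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
centroids = [[0.1, 0.05], [5.1, 5.0]]  # in practice, the output of k-means
lists = build_ivf(vectors, centroids)
nearest_id = search_ivf([5.0, 5.0], vectors, centroids, lists)
```

The quality of the centroids determines how balanced the partitions are and how many of them a query has to probe, which is why the clustering step matters for index construction.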

We offer two clustering options:

Vanilla Super K-Means: Classical Lloyd's k-means accelerated with dimension pruning. Clustering quality is equivalent to FAISS.
Hierarchical Super K-Means: Hierarchical k-means accelerated with dimension pruning. Extremely fast for large collections (more than 1M embeddings) with minimal quality loss.
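As a rough illustration of the idea behind the vanilla option, the sketch below runs Lloyd's k-means in pure Python and uses partial-distance early termination in the assignment step: a distance computation stops as soon as the running sum over dimensions can no longer beat the best centroid found so far. This is one generic form of dimension pruning and is an assumption made for illustration; the pruning strategy actually used by the library may differ. All function names are hypothetical and none of this is the SuperKMeans API.

```python
import random

def pruned_sq_dist(a, b, bound):
    """Accumulate the squared distance dimension by dimension.

    Returns None as soon as the partial sum reaches `bound`, because the
    remaining dimensions can only increase the distance further.
    """
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total >= bound:
            return None  # pruned: cannot beat the current best
    return total

def assign(point, centroids):
    """Index of the nearest centroid, using early termination."""
    best, best_dist = 0, float("inf")
    for c, centroid in enumerate(centroids):
        d = pruned_sq_dist(point, centroid, best_dist)
        if d is not None:
            best, best_dist = c, d
    return best

def lloyd_kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd iterations: assign points, then recompute centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [assign(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so the labels match the returned centroids.
    labels = [assign(p, centroids) for p in points]
    return centroids, labels

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = lloyd_kmeans(points, k=2)
```

Pruning of this kind does not change which centroid a point is assigned to, only how many dimensions are read to decide it, and the savings grow with dimensionality.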

Use Super K-Means if:

You need to index a large collection of high-dimensional (d > 128) vector embeddings
You need a lightweight and much faster alternative to FAISS clustering

How to Install

Check INSTALL.md.

Documentation (C++ API)

Home

Quickstart

Usage example in C++
Usage example in Python

C++ API Documentation

SuperKMeans
Hierarchical SuperKMeans

Comparisons

(coming soon)
