SuperKMeans (API)

struct SuperKMeansConfig

Configuration struct holding the clustering parameters. It can be passed to the SuperKMeans constructor.

uint32_t iters = 10
Number of clustering iterations

float sampling_fraction = 0.3
Fraction of points to sample. The default is 1.0 in HierarchicalSuperKMeans.

uint32_t max_points_per_cluster = 256
Maximum number of points to sample per cluster (FAISS style).
The number of sampled points is min(n_points * sampling_fraction, max_points_per_cluster * n_clusters).
Ignored if sampling_fraction = 1.0.
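The sampling rule above can be sketched as a small function. `ComputeSampleSize` is a hypothetical helper, not part of the SuperKMeans API, and truncating `n_points * sampling_fraction` toward zero is an assumption about rounding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the SuperKMeans API): number of training
// points sampled under the rule described above.
std::size_t ComputeSampleSize(std::size_t n_points, std::size_t n_clusters,
                              float sampling_fraction,
                              std::size_t max_points_per_cluster) {
    assert(sampling_fraction > 0.0f && sampling_fraction <= 1.0f);
    // With sampling_fraction == 1.0 the per-cluster cap is ignored.
    if (sampling_fraction >= 1.0f) {
        return n_points;
    }
    // Assumption: the fractional sample size is truncated toward zero.
    const auto by_fraction = static_cast<std::size_t>(
        static_cast<double>(n_points) * static_cast<double>(sampling_fraction));
    const std::size_t by_cap = max_points_per_cluster * n_clusters;
    return std::min(by_fraction, by_cap);
}
```

With the defaults (sampling_fraction = 0.3, max_points_per_cluster = 256), 100,000 points and 100 clusters give min(30000, 25600) = 25,600 sampled points, so the per-cluster cap is the binding limit.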

uint32_t seed = 42
Random seed for reproducibility

uint32_t n_threads = 0
Number of CPU threads to use (0 = max available)

bool early_termination = true
Whether to stop training early based on centroid movement and WCSS improvement.

float tol = 1e-4f
Tolerance for the WCSS improvement rate and the centroid shift, used to decide when to stop early.
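A minimal sketch of the two stopping signals that `early_termination` and `tol` refer to. The helper names `MaxSquaredShift` and `ShouldStopEarly` are hypothetical, and two details are assumptions rather than documented behavior: the shift is measured as the largest squared per-centroid movement, and training stops as soon as either signal drops below `tol`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: largest squared movement of any centroid between two
// iterations. Centroids are assumed row-major (one centroid per row).
double MaxSquaredShift(const std::vector<float>& old_centroids,
                       const std::vector<float>& new_centroids,
                       std::size_t dimensionality) {
    assert(old_centroids.size() == new_centroids.size());
    assert(dimensionality > 0 && old_centroids.size() % dimensionality == 0);
    double max_shift = 0.0;
    for (std::size_t start = 0; start < old_centroids.size(); start += dimensionality) {
        double shift = 0.0;
        for (std::size_t j = 0; j < dimensionality; ++j) {
            const double diff =
                static_cast<double>(new_centroids[start + j]) - old_centroids[start + j];
            shift += diff * diff;
        }
        if (shift > max_shift) max_shift = shift;
    }
    return max_shift;
}

// Hypothetical: stop when the relative WCSS improvement or the centroid shift
// falls below tol (combining them with OR is an assumption).
bool ShouldStopEarly(double previous_wcss, double current_wcss,
                     double max_squared_shift, float tol) {
    // A WCSS increase gives a negative rate and also counts as no improvement.
    const double improvement_rate =
        previous_wcss > 0.0 ? (previous_wcss - current_wcss) / previous_wcss : 0.0;
    return improvement_rate < tol || max_squared_shift < tol;
}
```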

bool angular = false
Whether to normalize centroids after each iteration (useful for inner product clustering)
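The normalization that `angular` enables can be sketched as follows, assuming centroids are stored row-major (one centroid per row of the `n_clusters * dimensionality` buffer); `NormalizeCentroids` is a hypothetical name. With unit-norm centroids, ||x - c||^2 = ||x||^2 - 2<x, c> + 1, so for a fixed x the nearest centroid by L2 distance is the one with the largest inner product.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical: rescale every centroid (row) to unit L2 norm, in place.
void NormalizeCentroids(std::vector<float>& centroids, std::size_t dimensionality) {
    assert(dimensionality > 0 && centroids.size() % dimensionality == 0);
    for (std::size_t start = 0; start < centroids.size(); start += dimensionality) {
        double norm_sq = 0.0;
        for (std::size_t j = 0; j < dimensionality; ++j) {
            norm_sq += static_cast<double>(centroids[start + j]) * centroids[start + j];
        }
        if (norm_sq == 0.0) continue;  // leave all-zero centroids untouched
        const float inv_norm = static_cast<float>(1.0 / std::sqrt(norm_sq));
        for (std::size_t j = 0; j < dimensionality; ++j) {
            centroids[start + j] *= inv_norm;
        }
    }
}
```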

bool data_already_rotated = false
Whether the provided data already went through a random orthogonal rotation.
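To illustrate what a random orthogonal rotation is, the sketch below builds one by Gram-Schmidt orthonormalization of a Gaussian matrix and applies it to row-major vectors. This is an illustration only: the construction SuperKMeans uses internally is not documented here, and `RandomOrthogonalMatrix` and `Rotate` are hypothetical helpers. Because the matrix is orthogonal, norms, L2 distances and inner products are unchanged by the rotation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical: random d x d orthogonal matrix (row-major) obtained by
// modified Gram-Schmidt on a matrix of i.i.d. standard normal entries.
std::vector<float> RandomOrthogonalMatrix(std::size_t d, std::uint32_t seed) {
    assert(d > 0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::vector<double> q(d * d);
    for (double& v : q) v = gauss(rng);
    for (std::size_t i = 0; i < d; ++i) {
        double* row = &q[i * d];
        // Remove the components along the rows already orthonormalized.
        for (std::size_t k = 0; k < i; ++k) {
            const double* prev = &q[k * d];
            double dot = 0.0;
            for (std::size_t j = 0; j < d; ++j) dot += row[j] * prev[j];
            for (std::size_t j = 0; j < d; ++j) row[j] -= dot * prev[j];
        }
        double norm = 0.0;
        for (std::size_t j = 0; j < d; ++j) norm += row[j] * row[j];
        norm = std::sqrt(norm);  // nonzero with probability 1 for Gaussian input
        for (std::size_t j = 0; j < d; ++j) row[j] /= norm;
    }
    return std::vector<float>(q.begin(), q.end());
}

// Hypothetical: rotate n row-major vectors, y_i = Q x_i.
std::vector<float> Rotate(const std::vector<float>& q, const std::vector<float>& x,
                          std::size_t n, std::size_t d) {
    assert(q.size() == d * d && x.size() == n * d);
    std::vector<float> y(n * d, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t r = 0; r < d; ++r) {
            double acc = 0.0;
            for (std::size_t j = 0; j < d; ++j) {
                acc += static_cast<double>(q[r * d + j]) * x[i * d + j];
            }
            y[i * d + r] = static_cast<float>(acc);
        }
    }
    return y;
}
```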

bool use_blas_only = false
Whether to disable pruning. Performance is then equivalent to FAISS clustering.

bool verbose = false
Whether to print progress information during training

class SuperKMeans

SuperKMeans clustering that interleaves GEMM routines and pruning kernels to accelerate training.

Public Functions

SuperKMeans(size_t n_clusters, size_t dimensionality)

SuperKMeans(size_t n_clusters, size_t dimensionality, const SuperKMeansConfig& config)

std::vector<float> Train(const float * data, const size_t n_points)

Runs SuperKMeans training.
Parameters:
• data: Pointer to training vectors. Size: n_points * dimensionality
• n_points: Number of training vectors
Returns:
• std::vector<float> centroids: Trained centroids. Size: n_clusters * dimensionality
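For reference, the sketch below is plain Lloyd's k-means with the same input and output shapes as Train(). It is not the SuperKMeans algorithm (no GEMM, pruning, sampling or early termination); its initialization from the first `n_clusters` points and its row-major layout are assumptions, and `LloydKMeans` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical reference Lloyd's k-means. data: n_points x dimensionality,
// result: n_clusters x dimensionality, both row-major.
std::vector<float> LloydKMeans(const float* data, std::size_t n_points,
                               std::size_t n_clusters, std::size_t dimensionality,
                               std::uint32_t iters) {
    assert(n_clusters > 0 && n_clusters <= n_points && dimensionality > 0);
    // Assumption: initialize centroids with the first n_clusters points.
    std::vector<float> centroids(data, data + n_clusters * dimensionality);
    std::vector<std::uint32_t> assignments(n_points, 0);
    for (std::uint32_t it = 0; it < iters; ++it) {
        // Assignment step: nearest centroid by squared L2 distance.
        for (std::size_t i = 0; i < n_points; ++i) {
            float best = std::numeric_limits<float>::max();
            for (std::size_t c = 0; c < n_clusters; ++c) {
                float dist = 0.0f;
                for (std::size_t j = 0; j < dimensionality; ++j) {
                    const float diff =
                        data[i * dimensionality + j] - centroids[c * dimensionality + j];
                    dist += diff * diff;
                }
                if (dist < best) {
                    best = dist;
                    assignments[i] = static_cast<std::uint32_t>(c);
                }
            }
        }
        // Update step: each centroid becomes the mean of its assigned points.
        std::vector<double> sums(n_clusters * dimensionality, 0.0);
        std::vector<std::size_t> counts(n_clusters, 0);
        for (std::size_t i = 0; i < n_points; ++i) {
            const std::size_t c = assignments[i];
            ++counts[c];
            for (std::size_t j = 0; j < dimensionality; ++j) {
                sums[c * dimensionality + j] += data[i * dimensionality + j];
            }
        }
        for (std::size_t c = 0; c < n_clusters; ++c) {
            if (counts[c] == 0) continue;  // empty cluster: keep previous centroid
            for (std::size_t j = 0; j < dimensionality; ++j) {
                centroids[c * dimensionality + j] =
                    static_cast<float>(sums[c * dimensionality + j] / counts[c]);
            }
        }
    }
    return centroids;
}
```

For the 1-D data {0, 1, 10, 11} with two clusters, this loop converges to the centroids 0.5 and 10.5.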

std::vector<uint32_t> Assign(const float * vectors, const float * centroids, const size_t n_vectors, const size_t n_centroids)

Assigns vectors to their nearest centroid using brute-force search (full GEMM).
Parameters:
• vectors: Pointer to vectors to assign. Size: n_vectors * dimensionality
• centroids: Pointer to centroids. Size: n_centroids * dimensionality
• n_vectors: Number of vectors to assign
• n_centroids: Number of centroids
Returns:
• std::vector<uint32_t> assignments: Index of the nearest centroid for each provided vector. Size: n_vectors
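Brute-force assignment with a full GEMM typically relies on the expansion ||x - c||^2 = ||x||^2 - 2<x, c> + ||c||^2, where the <x, c> terms for all vector/centroid pairs form the matrix product X C^T. The sketch below uses a naive loop in place of a BLAS GEMM call; `BruteForceAssign` is a hypothetical name, and the row-major layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical brute-force assignment. ||x||^2 is the same for every centroid,
// so it is dropped from the argmin; only ||c||^2 - 2<x, c> is compared.
std::vector<std::uint32_t> BruteForceAssign(const float* vectors, const float* centroids,
                                            std::size_t n_vectors, std::size_t n_centroids,
                                            std::size_t dimensionality) {
    assert(n_centroids > 0 && dimensionality > 0);
    std::vector<float> centroid_norms(n_centroids, 0.0f);
    for (std::size_t c = 0; c < n_centroids; ++c) {
        for (std::size_t j = 0; j < dimensionality; ++j) {
            const float v = centroids[c * dimensionality + j];
            centroid_norms[c] += v * v;
        }
    }
    std::vector<std::uint32_t> assignments(n_vectors, 0);
    for (std::size_t i = 0; i < n_vectors; ++i) {
        float best = std::numeric_limits<float>::max();
        for (std::size_t c = 0; c < n_centroids; ++c) {
            float dot = 0.0f;  // one entry of X * C^T (a GEMM would compute all at once)
            for (std::size_t j = 0; j < dimensionality; ++j) {
                dot += vectors[i * dimensionality + j] * centroids[c * dimensionality + j];
            }
            const float score = centroid_norms[c] - 2.0f * dot;
            if (score < best) {
                best = score;
                assignments[i] = static_cast<std::uint32_t>(c);
            }
        }
    }
    return assignments;
}
```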

std::vector<uint32_t> FastAssign(const float * vectors, const float * centroids, const size_t n_vectors, const size_t n_centroids)

Fast assignment using GEMM+PRUNING with the trained state.
Assumes that the vectors passed here are the same ones used in Train().
Leverages the assignments computed during training to speed up the search.
Parameters:
• vectors: Pointer to vectors to assign. Size: n_vectors * dimensionality
• centroids: Pointer to centroids. Size: n_centroids * dimensionality
• n_vectors: Number of vectors to assign
• n_centroids: Number of centroids
Returns:
• std::vector<uint32_t> assignments: Index of the nearest centroid for each provided vector. Size: n_vectors
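The benefit of reusing training assignments can be illustrated with a simpler pruning technique than the one SuperKMeans uses: partial-distance pruning (early abandoning). The distance to the previously assigned centroid serves as the initial bound, and the accumulation for any other centroid stops as soon as it reaches that bound. The result is exact up to tie-breaking. `PrunedAssign` and its `previous_assignments` parameter are hypothetical; FastAssign itself keeps the training state internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical pruned assignment seeded with previous assignments.
// Stand-in technique (early abandoning), not SuperKMeans' GEMM+PRUNING kernels.
std::vector<std::uint32_t> PrunedAssign(const float* vectors, const float* centroids,
                                        std::size_t n_vectors, std::size_t n_centroids,
                                        std::size_t dimensionality,
                                        const std::vector<std::uint32_t>& previous_assignments) {
    assert(previous_assignments.size() == n_vectors);
    std::vector<std::uint32_t> assignments(n_vectors, 0);
    for (std::size_t i = 0; i < n_vectors; ++i) {
        const float* x = vectors + i * dimensionality;
        // Squared L2 distance to centroid c, abandoned once it reaches bound.
        auto squared_distance = [&](std::size_t c, float bound) {
            const float* y = centroids + c * dimensionality;
            float dist = 0.0f;
            for (std::size_t j = 0; j < dimensionality; ++j) {
                const float diff = x[j] - y[j];
                dist += diff * diff;
                if (dist >= bound) break;  // cannot beat the current best: prune
            }
            return dist;
        };
        const std::uint32_t previous = previous_assignments[i];
        assert(previous < n_centroids);
        // The previous centroid is usually close, so it gives a tight bound.
        std::uint32_t best_c = previous;
        float best = squared_distance(previous, std::numeric_limits<float>::infinity());
        for (std::size_t c = 0; c < n_centroids; ++c) {
            if (c == previous) continue;
            const float dist = squared_distance(c, best);
            if (dist < best) {
                best = dist;
                best_c = static_cast<std::uint32_t>(c);
            }
        }
        assignments[i] = best_c;
    }
    return assignments;
}
```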
