
SuperKMeans (API)

Leonardo Xavier Kuffo Rivero edited this page Mar 25, 2026 · 3 revisions

Implemented in superkmeans.h

struct SuperKMeansConfig

Struct holding the clustering parameters. It can be passed to the constructor of the SuperKMeans object.

uint32_t iters = 10
  Number of clustering iterations

float sampling_fraction = 0.3
  Fraction of points to sample. The default is 1.0 in HierarchicalSuperKMeans.

uint32_t max_points_per_cluster = 256
  Maximum number of points per cluster to sample (FAISS style).
  The number of sampled points is min(n_points * sampling_fraction, max_points_per_cluster * n_clusters).
  Ignored if sampling_fraction = 1.0.
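
The interaction between sampling_fraction and max_points_per_cluster can be sketched as a small standalone function. `SampleSize` is a hypothetical helper written for illustration, not part of superkmeans.h, and truncating the fractional count toward zero is an assumption; the library may round differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of superkmeans.h): number of points
// that would be sampled for training under the rule described above.
inline std::size_t SampleSize(std::size_t n_points,
                              std::size_t n_clusters,
                              float sampling_fraction = 0.3f,
                              std::uint32_t max_points_per_cluster = 256) {
    // With sampling_fraction = 1.0 the per-cluster cap is ignored.
    if (sampling_fraction >= 1.0f) {
        return n_points;
    }
    // Assumption: the fractional count is truncated toward zero.
    const auto by_fraction = static_cast<std::size_t>(
        static_cast<double>(n_points) * sampling_fraction);
    const std::size_t by_cap =
        static_cast<std::size_t>(max_points_per_cluster) * n_clusters;
    return std::min(by_fraction, by_cap);
}
```

With the defaults, 1,000,000 points and 1,000 clusters are capped at 256 * 1,000 = 256,000 samples, while 100,000 points and 1,000 clusters are limited by the fraction instead (30,000 samples).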

uint32_t seed = 42
  Random seed for reproducibility

uint32_t n_threads = 0
  Number of CPU threads to use (0 = max available)

bool early_termination = true
  Whether to stop training early based on centroid movement and WCSS (within-cluster sum of squares) improvement.

float tol = 1e-4f
  Tolerance on the WCSS improvement rate and on the centroid shift below which training stops early
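
A tolerance-based stopping test of this kind could look roughly like the sketch below. The page does not specify how the two criteria are measured or combined, so everything here is an assumption: `CentroidShift` and `ShouldStop` are hypothetical names, the shift is taken as the summed squared movement of all centroid coordinates, the improvement rate is relative to the previous WCSS, and the sketch stops when either value falls below tol.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical: summed squared movement between two centroid snapshots
// (both stored row-major with the same size).
inline float CentroidShift(const std::vector<float>& prev,
                           const std::vector<float>& curr) {
    float shift = 0.0f;
    for (std::size_t i = 0; i < prev.size() && i < curr.size(); ++i) {
        const float diff = curr[i] - prev[i];
        shift += diff * diff;
    }
    return shift;
}

// Hypothetical: stop when the relative WCSS improvement or the centroid
// shift drops below tol. The library may combine the criteria differently.
inline bool ShouldStop(float prev_wcss, float curr_wcss,
                       float centroid_shift, float tol = 1e-4f) {
    const float improvement =
        prev_wcss > 0.0f ? (prev_wcss - curr_wcss) / prev_wcss : 0.0f;
    return improvement < tol || centroid_shift < tol;
}
```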

bool angular = false
  Whether to normalize centroids after each iteration (useful for inner product clustering)
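
Normalizing centroids means rescaling each one to unit L2 norm, as is usual in spherical k-means. A minimal sketch of that step follows; `NormalizeCentroids` is a hypothetical helper, not the library's internal routine, and skipping all-zero centroids is an assumption made to avoid division by zero.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: scale each centroid (one row of length
// `dimensionality`) to unit L2 norm, in place.
inline void NormalizeCentroids(std::vector<float>& centroids,
                               std::size_t dimensionality) {
    const std::size_t n_centroids = centroids.size() / dimensionality;
    for (std::size_t c = 0; c < n_centroids; ++c) {
        float* row = centroids.data() + c * dimensionality;
        float sq_norm = 0.0f;
        for (std::size_t d = 0; d < dimensionality; ++d) {
            sq_norm += row[d] * row[d];
        }
        if (sq_norm == 0.0f) {
            continue;  // Assumption: all-zero centroids are left untouched.
        }
        const float inv_norm = 1.0f / std::sqrt(sq_norm);
        for (std::size_t d = 0; d < dimensionality; ++d) {
            row[d] *= inv_norm;
        }
    }
}
```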

bool data_already_rotated = false
  Whether the provided data already went through a random orthogonal rotation.

bool use_blas_only = false
  Disables pruning. Performance is then equivalent to FAISS' clustering.

bool verbose = false


class SuperKMeans

SuperKMeans clustering that interleaves GEMM routines and pruning kernels to accelerate training.

Public Functions

SuperKMeans(size_t n_clusters, size_t dimensionality)
SuperKMeans(size_t n_clusters, size_t dimensionality, const SuperKMeansConfig& config)
Train(const float * data, const size_t n_points)

  Run SuperKMeans training
  Parameters:
    • data: Pointer to training vectors. Size: n_points * dimensionality
    • n_points: Number of training vectors
  Returns:
    • std::vector<float> centroids: Trained centroids. Size: n_clusters * dimensionality

Assign(const float * vectors, const float * centroids, const size_t n_vectors, const size_t n_centroids)

  Assign vectors to their nearest centroid using brute force search (full GEMM).
  Parameters:
    • vectors: Pointer to vectors to assign. Size: n_vectors * dimensionality
    • centroids: Pointer to centroids. Size: n_centroids * dimensionality
    • n_vectors: Number of vectors to assign
    • n_centroids: Number of centroids
  Returns:
    • std::vector<uint32_t> assignments: Assignments to provided vectors. Size: n_vectors
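
For reference, the result of a brute-force assignment can be reproduced with a plain triple loop. The sketch below is not the library's implementation, which uses a full GEMM: `NaiveAssign` is a hypothetical function, squared Euclidean distance is assumed as the metric, and `dimensionality` is passed explicitly because this standalone version has no object to read it from.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical reference implementation: for every vector, scan all
// centroids and keep the index of the closest one (squared L2 distance).
inline std::vector<std::uint32_t> NaiveAssign(const float* vectors,
                                              const float* centroids,
                                              std::size_t n_vectors,
                                              std::size_t n_centroids,
                                              std::size_t dimensionality) {
    std::vector<std::uint32_t> assignments(n_vectors, 0);
    for (std::size_t i = 0; i < n_vectors; ++i) {
        const float* v = vectors + i * dimensionality;
        float best = std::numeric_limits<float>::max();
        for (std::size_t c = 0; c < n_centroids; ++c) {
            const float* ctr = centroids + c * dimensionality;
            float dist = 0.0f;
            for (std::size_t d = 0; d < dimensionality; ++d) {
                const float diff = v[d] - ctr[d];
                dist += diff * diff;
            }
            if (dist < best) {  // Ties keep the lowest centroid index.
                best = dist;
                assignments[i] = static_cast<std::uint32_t>(c);
            }
        }
    }
    return assignments;
}
```

A loop like this is handy as a correctness check on small inputs; on realistic sizes a GEMM-based computation is far faster.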

FastAssign(const float * vectors, const float * centroids, const size_t n_vectors, const size_t n_centroids)

  Fast assignment using GEMM+PRUNING with trained state.
  Assumes that the vectors sent here are the same as those used in .Train().
  Leverages the assignments computed during training for a faster assignment.
  Parameters:
    • vectors: Pointer to vectors to assign. Size: n_vectors * dimensionality
    • centroids: Pointer to centroids. Size: n_centroids * dimensionality
    • n_vectors: Number of vectors to assign
    • n_centroids: Number of centroids
  Returns:
    • std::vector<uint32_t> assignments: Assignments to provided vectors. Size: n_vectors
