SuperKMeans (API)
Implemented in superkmeans.h
SuperKMeansConfig
Class for the clustering parameters. Can be passed to the constructor of the SuperKMeans object.
uint32_t iters = 10
Number of clustering iterations
float sampling_fraction = 0.3
Fraction of points to sample. The default is 1.0 in HierarchicalSuperKMeans.
uint32_t max_points_per_cluster = 256
Maximum number of points per cluster to sample (FAISS style).
The number of sampled points is min(n_points * sampling_fraction, max_points_per_cluster * n_clusters).
Ignored if sampling_fraction = 1.0.
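The sampling rule above can be written as a small function. This is a sketch only: `SampleSize` is a hypothetical helper, not part of superkmeans.h, and truncating `n_points * sampling_fraction` toward zero is an assumption about rounding.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not in superkmeans.h): number of points sampled for
// training, min(n_points * sampling_fraction, max_points_per_cluster * n_clusters).
std::size_t SampleSize(std::size_t n_points, std::size_t n_clusters,
                       float sampling_fraction,
                       std::uint32_t max_points_per_cluster) {
    // With sampling_fraction = 1.0 the per-cluster cap is ignored: use every point.
    if (sampling_fraction >= 1.0f) {
        return n_points;
    }
    // Assumption: the fractional sample size is truncated toward zero.
    const auto by_fraction = static_cast<std::size_t>(
        static_cast<double>(n_points) * sampling_fraction);
    // FAISS-style cap: at most max_points_per_cluster points per cluster.
    const std::size_t by_cap =
        static_cast<std::size_t>(max_points_per_cluster) * n_clusters;
    return std::min(by_fraction, by_cap);
}
```

For example, with the defaults (0.3 and 256), 1,000,000 points and 1024 clusters give min(300000, 262144) = 262144 sampled points, so the per-cluster cap dominates; with only 100,000 points the fraction dominates and 30,000 points are sampled.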
uint32_t seed = 42
Random seed for reproducibility
uint32_t n_threads = 0
Number of CPU threads to use (0 = max available)
bool early_termination = true
Whether to stop training early based on centroid movement and WCSS improvement.
float tol = 1e-4f
Tolerance on the WCSS improvement rate and centroid shift used to decide when to stop early.
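One plausible reading of how `early_termination` and `tol` interact is sketched below. The exact criterion in superkmeans.h is not documented here, so everything in this block is an assumption: `ShouldStop` is a hypothetical function, the WCSS signal is taken as the relative improvement between consecutive iterations, the centroid signal as the total squared shift, and either signal falling below `tol` is assumed to stop training.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stopping rule (assumption, not the library's code): stop when the
// relative WCSS improvement OR the total squared centroid shift drops below tol.
bool ShouldStop(double prev_wcss, double curr_wcss,
                const std::vector<float>& old_centroids,
                const std::vector<float>& new_centroids,
                float tol) {
    // Relative improvement of the within-cluster sum of squares.
    // A previous WCSS of zero means there is nothing left to improve.
    const double improvement =
        prev_wcss > 0.0 ? (prev_wcss - curr_wcss) / prev_wcss : 0.0;

    // Total squared movement of all centroid coordinates in this iteration.
    double shift = 0.0;
    for (std::size_t i = 0; i < old_centroids.size(); ++i) {
        const double d =
            static_cast<double>(new_centroids[i]) - static_cast<double>(old_centroids[i]);
        shift += d * d;
    }
    return improvement < tol || shift < tol;
}
```

Whether the library normalizes the shift (for example, by centroid norms) or requires both signals to fall below `tol` is left open by this sketch.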
bool angular = false
Whether to normalize centroids after each iteration (useful for inner product clustering)
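With `angular = true`, each centroid is rescaled to unit L2 norm after the update step, so nearest-centroid search by L2 distance ranks centroids the same way as inner product on normalized data. The sketch below assumes the same row-major centroid layout (n_clusters * dimensionality) that `Train()` returns. `NormalizeCentroids` is a hypothetical name, and skipping all-zero centroids is an assumed policy.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of per-iteration centroid normalization (angular = true).
// centroids holds n_clusters rows of `dimensionality` floats, row-major.
void NormalizeCentroids(std::vector<float>& centroids, std::size_t dimensionality) {
    const std::size_t n_clusters = centroids.size() / dimensionality;
    for (std::size_t c = 0; c < n_clusters; ++c) {
        float* row = centroids.data() + c * dimensionality;
        // Accumulate the squared norm in double to limit rounding error.
        double norm_sq = 0.0;
        for (std::size_t d = 0; d < dimensionality; ++d) {
            norm_sq += static_cast<double>(row[d]) * row[d];
        }
        if (norm_sq == 0.0) continue;  // assumption: leave all-zero centroids as-is
        const float inv_norm = static_cast<float>(1.0 / std::sqrt(norm_sq));
        for (std::size_t d = 0; d < dimensionality; ++d) {
            row[d] *= inv_norm;  // scale the row to unit L2 norm
        }
    }
}
```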
bool data_already_rotated = false
Whether the provided data has already gone through a random orthogonal rotation.
bool use_blas_only = false
Disables pruning. Performance is then equivalent to FAISS' clustering.
bool verbose = false
SuperKMeans
Class implementing SuperKMeans clustering, which interleaves GEMM routines and pruning kernels to accelerate training.
SuperKMeans(size_t n_clusters, size_t dimensionality)
SuperKMeans(size_t n_clusters, size_t dimensionality, const SuperKMeansConfig& config)
Train(const float * data, const size_t n_points)
Run SuperKMeans training.
Parameters:
• data: Pointer to training vectors. Size: n_points * dimensionality
• n_points: Number of training vectors
Returns:
• std::vector<float> centroids: Trained centroids. Size: n_clusters * dimensionality
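To make the input and output layouts concrete, here is a plain Lloyd's k-means over row-major data of size n_points * dimensionality, returning centroids of size n_clusters * dimensionality. It is only a reference for what the returned centroids mean. It is not SuperKMeans' GEMM+pruning algorithm and does no sampling. `LloydKMeans` is a hypothetical name, and initializing from the first n_clusters points (which requires n_points >= n_clusters) is an assumption made for brevity.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Reference Lloyd's k-means (hypothetical; NOT the SuperKMeans algorithm).
// data: n_points rows of `dim` floats, row-major. Returns n_clusters * dim floats.
std::vector<float> LloydKMeans(const float* data, std::size_t n_points,
                               std::size_t n_clusters, std::size_t dim,
                               std::size_t iters) {
    // Assumption: initialize centroids with the first n_clusters points.
    std::vector<float> centroids(data, data + n_clusters * dim);
    std::vector<std::size_t> assign(n_points, 0);
    for (std::size_t it = 0; it < iters; ++it) {
        // Assignment step: nearest centroid by squared L2 distance.
        for (std::size_t i = 0; i < n_points; ++i) {
            float best = std::numeric_limits<float>::max();
            for (std::size_t c = 0; c < n_clusters; ++c) {
                float dist = 0.0f;
                for (std::size_t d = 0; d < dim; ++d) {
                    const float diff = data[i * dim + d] - centroids[c * dim + d];
                    dist += diff * diff;
                }
                if (dist < best) { best = dist; assign[i] = c; }
            }
        }
        // Update step: each centroid becomes the mean of its assigned points.
        std::vector<float> sums(n_clusters * dim, 0.0f);
        std::vector<std::size_t> counts(n_clusters, 0);
        for (std::size_t i = 0; i < n_points; ++i) {
            ++counts[assign[i]];
            for (std::size_t d = 0; d < dim; ++d) {
                sums[assign[i] * dim + d] += data[i * dim + d];
            }
        }
        for (std::size_t c = 0; c < n_clusters; ++c) {
            if (counts[c] == 0) continue;  // keep empty clusters where they were
            for (std::size_t d = 0; d < dim; ++d) {
                centroids[c * dim + d] = sums[c * dim + d] / counts[c];
            }
        }
    }
    return centroids;
}
```

For the 1-D points {0, 10, 1, 11} with two clusters, the sketch converges to centroids {0.5, 10.5}.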
Assign(const float * vectors, const float * centroids, const size_t n_vectors, const size_t n_centroids)
Assign vectors to their nearest centroid using brute-force search (full GEMM).
Parameters:
• vectors: Pointer to vectors to assign. Size: n_vectors * dimensionality
• centroids: Pointer to centroids. Size: n_centroids * dimensionality
• n_vectors: Number of vectors to assign
• n_centroids: Number of centroids
Returns:
• std::vector<uint32_t> assignments: Assignments to provided vectors. Size: n_vectors
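Brute-force assignment maps onto a matrix multiplication through the expansion ||x − c||² = ||x||² − 2·x·c + ||c||². The x·c terms for all pairs form the n_vectors × n_centroids product that a GEMM computes, and ||x||² is constant per vector, so it can be dropped from the argmin. The sketch below shows that reduction with a naive loop in place of a BLAS call. `BruteForceAssign` is a hypothetical name, and it takes `dim` explicitly because, unlike the class method, it has no stored dimensionality.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sketch of brute-force nearest-centroid assignment via
// ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2. The x.c loop stands in for a GEMM.
std::vector<std::uint32_t> BruteForceAssign(const float* vectors, const float* centroids,
                                            std::size_t n_vectors, std::size_t n_centroids,
                                            std::size_t dim) {
    // Precompute ||c||^2 once per centroid.
    std::vector<float> c_norms(n_centroids, 0.0f);
    for (std::size_t c = 0; c < n_centroids; ++c) {
        for (std::size_t d = 0; d < dim; ++d) {
            c_norms[c] += centroids[c * dim + d] * centroids[c * dim + d];
        }
    }
    std::vector<std::uint32_t> assignments(n_vectors, 0);
    for (std::size_t i = 0; i < n_vectors; ++i) {
        float best = std::numeric_limits<float>::max();
        for (std::size_t c = 0; c < n_centroids; ++c) {
            // One entry of the (n_vectors x n_centroids) dot-product matrix.
            float dot = 0.0f;
            for (std::size_t d = 0; d < dim; ++d) {
                dot += vectors[i * dim + d] * centroids[c * dim + d];
            }
            // ||x||^2 is identical for every c, so it does not affect the argmin.
            const float score = c_norms[c] - 2.0f * dot;
            if (score < best) {
                best = score;
                assignments[i] = static_cast<std::uint32_t>(c);
            }
        }
    }
    return assignments;
}
```

In a real implementation the dot-product loop would be replaced by a single sgemm call over all vectors and centroids, which is what makes this formulation fast.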
FastAssign(const float * vectors, const float * centroids, const size_t n_vectors, const size_t n_centroids)
Fast assignment using GEMM+PRUNING with the trained state.
Assumes that the vectors passed here are the same as those used in Train().
Leverages the assignments from training for faster assignment.
Parameters:
• vectors: Pointer to vectors to assign. Size: n_vectors * dimensionality
• centroids: Pointer to centroids. Size: n_centroids * dimensionality
• n_vectors: Number of vectors to assign (the training vectors)
• n_centroids: Number of centroids
Returns:
• std::vector<uint32_t> assignments: Assignments to provided vectors. Size: n_vectors
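The idea of reusing training assignments can be illustrated with a simple pruning scheme: the distance to a vector's previous centroid becomes an initial threshold, and each other centroid's distance is accumulated dimension by dimension and abandoned as soon as it exceeds that threshold. This is a hypothetical illustration (partial distance search seeded with prior assignments), not SuperKMeans' actual GEMM+pruning kernel. `PrunedAssign` and its `previous` argument are assumed names, and `previous` must hold one valid centroid index per vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Full squared L2 distance between two `dim`-dimensional vectors.
static float SquaredL2(const float* a, const float* b, std::size_t dim) {
    float sum = 0.0f;
    for (std::size_t d = 0; d < dim; ++d) {
        const float diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Hypothetical pruned assignment seeded with previous assignments
// (partial distance search); NOT the library's kernel.
std::vector<std::uint32_t> PrunedAssign(const float* vectors, const float* centroids,
                                        std::size_t n_vectors, std::size_t n_centroids,
                                        std::size_t dim,
                                        const std::vector<std::uint32_t>& previous) {
    std::vector<std::uint32_t> out(previous);
    for (std::size_t i = 0; i < n_vectors; ++i) {
        const float* x = vectors + i * dim;
        // Distance to the previously assigned centroid is the starting threshold.
        float best = SquaredL2(x, centroids + previous[i] * dim, dim);
        for (std::size_t c = 0; c < n_centroids; ++c) {
            if (c == previous[i]) continue;
            const float* y = centroids + c * dim;
            float partial = 0.0f;
            std::size_t d = 0;
            // Stop accumulating once the partial distance reaches the threshold.
            for (; d < dim && partial < best; ++d) {
                const float diff = x[d] - y[d];
                partial += diff * diff;
            }
            // Only a fully scanned, strictly closer centroid replaces the current one.
            if (d == dim && partial < best) {
                best = partial;
                out[i] = static_cast<std::uint32_t>(c);
            }
        }
    }
    return out;
}
```

When most vectors keep their training assignment, the threshold is already tight and most candidate centroids are abandoned after a few dimensions, which is the kind of saving that reusing the trained state aims for.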