cspann: convert search logic to use pull iteration
Using non-transactional splits breaks certain assumptions that the C-SPANN
library relies upon:
* The K-means tree must be fully balanced - interior partitions can now be
empty.
* Partition child keys are never duplicated - splitting can duplicate
child partition keys, which can persist in the tree.
* Partition child keys always reference existing partitions - partition
child keys can now reference missing partitions.
* Inserts will always trivially find an insertion partition - now, it's
possible to get "blocked" trying to find a path to a target partition
that supports inserts. The blockage could be in the form of an empty
interior partition, a dangling child partition key, or a target
partition that does not allow inserts.
To address these issues, this commit converts the existing search logic to
use "pull" iteration. Code that needs to search the tree can iteratively
get the next batch of results, and the next after that, and so on, without
knowing up front exactly how many results are needed. Each batch is sorted
by distance, with duplicates removed. However, batches are not strictly
ordered in relation to one another and duplicates can exist across batches
(though each subsequent batch does tend to have greater distances).
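
As an illustration of the batch contract described above, here is a minimal
sketch in Go. It is not the C-SPANN API: Result, Searcher, the beam field and
the flat partitions slice are hypothetical stand-ins for real tree traversal.
Each call to Next scans the next few partitions and returns one batch that is
sorted by distance and de-duplicated within itself. Nothing is tracked across
batches, so a key can reappear in a later batch.

```go
package main

import "sort"

// Result is a hypothetical search hit: a child key and its distance
// to the query vector.
type Result struct {
	Key      int
	Distance float64
}

// Searcher is a hypothetical pull iterator. The partitions slice is a toy
// stand-in for the partitions a real tree traversal would visit, in order.
type Searcher struct {
	partitions [][]Result
	beam       int // number of partitions scanned per batch
	pos        int // index of the next partition to scan
}

// Next returns the next batch, sorted by distance with duplicates removed
// within the batch. ok is false once every partition has been scanned.
// A batch can be empty while ok is still true (a "dead end").
func (s *Searcher) Next() (batch []Result, ok bool) {
	if s.pos >= len(s.partitions) {
		return nil, false
	}
	beam := s.beam
	if beam < 1 {
		beam = 1 // always make progress
	}
	end := s.pos + beam
	if end > len(s.partitions) {
		end = len(s.partitions)
	}

	// Keep only the smallest distance seen for each key in this batch.
	best := make(map[int]float64)
	for _, p := range s.partitions[s.pos:end] {
		for _, r := range p {
			if d, seen := best[r.Key]; !seen || r.Distance < d {
				best[r.Key] = r.Distance
			}
		}
	}
	s.pos = end

	batch = make([]Result, 0, len(best))
	for k, d := range best {
		batch = append(batch, Result{Key: k, Distance: d})
	}
	// Sort by distance, breaking ties by key for deterministic output.
	sort.Slice(batch, func(i, j int) bool {
		if batch[i].Distance != batch[j].Distance {
			return batch[i].Distance < batch[j].Distance
		}
		return batch[i].Key < batch[j].Key
	})
	return batch, true
}
```

A caller keeps calling Next until it has enough results or ok is false, and
must tolerate both empty batches and keys that were already seen.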
Pull iteration largely solves the issues noted above. For example, the
insert operation can pull one result at a time. If that partition does not
support inserts, it can pull the next. If a batch is empty due to hitting
a "dead end" in tree traversal, it can just pull the next batch.
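
The insert case can be sketched the same way. The Go code below is a
hypothetical illustration, not the code in this commit: Candidate,
FindInsertPartition and the next callback are assumed names, and the
AllowsInserts flag stands in for whatever partition state the real library
checks. The loop keeps pulling until it finds a usable partition or the
search is exhausted.

```go
package main

// Candidate is a hypothetical search result that names a leaf partition
// and records whether that partition currently accepts inserts.
type Candidate struct {
	PartitionKey  int
	AllowsInserts bool
}

// FindInsertPartition pulls batches from next until it finds a partition
// that allows inserts. It returns false if the search runs out of results.
func FindInsertPartition(next func() ([]Candidate, bool)) (int, bool) {
	for {
		batch, ok := next()
		if !ok {
			return 0, false // search exhausted without a usable partition
		}
		for _, c := range batch {
			if c.AllowsInserts {
				return c.PartitionKey, true
			}
		}
		// The batch was empty (a dead end) or every candidate was
		// blocked, so pull the next batch.
	}
}
```

Because the caller does not decide up front how many results it needs, an
empty batch or a blocked partition costs only one more pull.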
Epic: CRDB-42943
Release note: None