144668: changefeedccl: make quantization metamorphic r=andyyang890 a=rharding6373
This PR turns changefeed.resolved_timestamp.granularity into a metamorphic test constant.
Epic: none
Fixes: #144632
Release note: None
144800: cspann: improve TestIndexConcurrency and fix issues it found r=drewkimball a=andy-kimball
Improve the TestIndexConcurrency test and fix issues it found.
#### cspann: fix buglet in searcher
Fix a small bug where s.levels was not set when inserting into the root
partition. This was causing crdb_test runs to fail, since it reallocates
the s.levels buffer every time.
#### cspann: fix data race in memstore.GetFullVectors
The race detector found a data race in MemStore's GetFullVectors that was
caused by not locking when accessing a partition's centroid. This was
deliberate, because the centroid was thought to be immutable, but the
TryClearPartition code sets the centroid to itself, which triggers the
race detector. While we could change TryClearPartition to avoid that, it's
probably best to just do the locking.
However, that causes another problem: the MemStore locking needed to get
partition centroids differs from the locking
needed to get primary key vectors. Update the GetFullVectors API to make
it clear that we can only ask for one or the other in a single call, so
that we don't need to handle the case where partition keys and primary
keys are interleaved in the same request.
#### cspann: set correct parent partition in fallbackOnTargets
If an insert targets a partition that does not allow inserts, it calls
fallbackOnTargets to redirect to one of the target partitions.
However, the existing code does not set the parent partition correctly for
target partitions. A split of a non-root partition should set the target
partitions' parent as the parent of the splitting partition, not the
splitting partition itself.
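
The parent-selection rule above can be sketched as follows. This is a minimal
illustration, not the cspann code: partitionKey, rootKey, and targetParentKey
are hypothetical names, and the root branch is an assumption (the commit
message only states the non-root rule).

```go
package main

import "fmt"

// partitionKey is a hypothetical stand-in for a partition identifier.
type partitionKey uint64

// rootKey is a hypothetical key value for the root partition.
const rootKey partitionKey = 1

// targetParentKey returns the parent to record for the target partitions
// of a split.
func targetParentKey(splitting, splittingParent partitionKey) partitionKey {
	if splitting == rootKey {
		// Assumption: a root split keeps the root as the targets' parent.
		return rootKey
	}
	// Non-root split: targets become siblings of the splitting partition,
	// so they share its parent rather than pointing at it.
	return splittingParent
}

func main() {
	// Partition 7 has parent 3; its targets should also get parent 3.
	fmt.Println(targetParentKey(7, 3))
}
```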
#### cspann: add better control over search for update retries
The retry logic in searchForUpdateHelper has a couple of problems:
- In the event of an infinite retry loop bug, it can stack overflow, since
it recursively calls itself.
- Race conditions with the MemStore can cause it to not find a valid insert
partition. Reads of MemStore partitions race with background fixups that
update and delete the partitions.
This commit improves the situation by specifying the maximum number of
insert/delete attempts we'll make before giving up. The remaining attempts
are preserved when making recursive retry calls, which prevents infinite
retry loops.
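
The idea of carrying the remaining attempts through recursive retries can be
sketched like this. It is an illustration only: searchWithBudget and both
sentinel errors are hypothetical names, not the signature of
searchForUpdateHelper.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors for this sketch.
var errRetry = errors.New("retry search")
var errAttemptsExhausted = errors.New("search attempts exhausted")

// searchWithBudget runs attempt until it returns something other than
// errRetry or the budget runs out. The remaining budget is passed down on
// each recursive call, so recursion depth is bounded by its initial value.
func searchWithBudget(attempt func() error, remaining int) error {
	if remaining <= 0 {
		return errAttemptsExhausted
	}
	err := attempt()
	if errors.Is(err, errRetry) {
		return searchWithBudget(attempt, remaining-1)
	}
	return err
}

func main() {
	calls := 0
	alwaysRetry := func() error { calls++; return errRetry }
	err := searchWithBudget(alwaysRetry, 3)
	fmt.Println(calls, err)
}
```

With a budget of 3, an attempt that always asks for a retry is invoked three
times and then the helper gives up instead of recursing further.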
#### cspann: add DeletingForSplit state to split fixup flow
Under heavy stress, a target partition can be split and deleted before
its source partition finishes its own split. This results in a situation
where the source partition is still pointing to target partitions that are
now deleted. To prevent that state, this commit adds a new DeletingForSplit
state into the split fixup flow, to be used for non-root partitions. After
vectors have been copied to target partitions, the splitting partition is
marked as DeletingForSplit. Next, the target partitions are marked Ready,
and can now safely be split or merged themselves. Finally, the splitting
partition is actually removed from the tree and deleted.
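
The ordering of the final split steps can be sketched as below. Only the
Ready and DeletingForSplit state names come from this commit message; the
splitting and updating states, the partition struct, and finishSplit are
hypothetical simplifications.

```go
package main

import "fmt"

type state int

const (
	ready            state = iota // named in the commit message
	splitting                     // hypothetical: split in progress
	updating                      // hypothetical: target still being filled
	deletingForSplit              // named in the commit message
)

// partition is a hypothetical, simplified partition record.
type partition struct {
	state   state
	targets []*partition
}

// finishSplit applies the last steps in the order described above and
// returns a trace of what it did. Vectors are assumed to be copied already.
func finishSplit(tree map[int]*partition, key int) []string {
	p := tree[key]
	var trace []string
	// 1. Mark the source first, so it is never left pointing at targets
	//    that have already moved on.
	p.state = deletingForSplit
	trace = append(trace, "source DeletingForSplit")
	// 2. Only now may the targets be split or merged themselves.
	for _, t := range p.targets {
		t.state = ready
	}
	trace = append(trace, "targets Ready")
	// 3. Finally remove the source from the tree.
	delete(tree, key)
	trace = append(trace, "source removed")
	return trace
}

func main() {
	left, right := &partition{state: updating}, &partition{state: updating}
	tree := map[int]*partition{
		1: {state: splitting, targets: []*partition{left, right}},
		2: left,
		3: right,
	}
	fmt.Println(finishSplit(tree, 1))
}
```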
#### cspann: fix partition reload race condition
During split, we reload a partition's vectors, in case any have changed
while we created target partitions. However, it's also possible that
another worker has updated the splitting partition. In that case, we
should abort and let the racing worker take over. The existing code
was not doing that, and ended up trying to continue the split, but
with incorrect target partition keys.
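
One way to picture the abort-on-reload check is an optimistic comparison of
the partition's split metadata before and after the reload. The metadata
struct, errSplitAborted, and reloadForSplit below are hypothetical; the real
code may detect the racing update differently.

```go
package main

import (
	"errors"
	"fmt"
)

// errSplitAborted is a hypothetical error telling the caller to stop and
// let the racing worker finish the split.
var errSplitAborted = errors.New("split aborted: another worker took over")

// metadata is a hypothetical, comparable snapshot of a partition's split
// state, including the target partition keys chosen for the split.
type metadata struct {
	State   string
	Target1 int
	Target2 int
}

// reloadForSplit returns the freshly loaded vectors only if the partition's
// metadata is unchanged since this worker last looked at it.
func reloadForSplit(expected, current metadata, vectors []int) ([]int, error) {
	if current != expected {
		// Another worker updated the partition; continuing would use
		// stale target partition keys.
		return nil, errSplitAborted
	}
	return vectors, nil
}

func main() {
	mine := metadata{State: "Splitting", Target1: 10, Target2: 11}
	theirs := metadata{State: "Splitting", Target1: 20, Target2: 21}
	_, err := reloadForSplit(mine, theirs, []int{1, 2, 3})
	fmt.Println(err)
}
```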
#### cspann: improve logging during concurrent splits
The existing logging makes it difficult to debug bugs caused by multiple
concurrent workers racing to split a partition. This commit updates the
logging to:
- log after every key step in the split process
- only log for the worker that actually wins the race for this step
These changes reduce logging noise while still improving what does get
logged.
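
A rough sketch of "only the winner logs" follows, assuming each split step is
applied with a conditional update that succeeds for exactly one worker. The
splitProgress type and raceStep helper are hypothetical and use a plain
atomic compare-and-swap in place of the store's conditional operations.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// splitProgress is a hypothetical shared marker of how far a split has got.
type splitProgress struct{ step int32 }

// advance moves from step `from` to `from+1`. It returns true for exactly
// one caller per step, which is the worker that wins the race.
func (p *splitProgress) advance(from int32) bool {
	return atomic.CompareAndSwapInt32(&p.step, from, from+1)
}

// raceStep lets several workers race on one step and collects the log
// lines they emit. Losers stay silent.
func raceStep(p *splitProgress, from int32, workers int) []string {
	var mu sync.Mutex
	var logs []string
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			if p.advance(from) {
				mu.Lock()
				logs = append(logs, fmt.Sprintf("worker %d completed step %d", w, from+1))
				mu.Unlock()
			}
		}(w)
	}
	wg.Wait()
	return logs
}

func main() {
	p := &splitProgress{}
	fmt.Println(len(raceStep(p, 0, 8))) // one log line per step
}
```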
#### memstore: restart operation when reading a deleted partition
When the memstore comes across a deleted partition, there are two cases:
1. The txn was started before the deletion, in which case it should be
restarted, so that any search can find a different path through the tree.
2. The txn was started after the deletion, in which case ErrPartitionNotFound
should be returned.
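
The two cases can be sketched as a single decision function. The logical
timestamps, errRestart, and readDeletedPartition are hypothetical; only the
ErrPartitionNotFound name appears in this commit message, and the sketch
assumes a transaction never starts at exactly the deletion timestamp.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPartitionNotFound mirrors the error name used in the commit message.
var ErrPartitionNotFound = errors.New("partition not found")

// errRestart is a hypothetical signal that the operation should be retried
// so the search can take a different path through the tree.
var errRestart = errors.New("restart operation")

// readDeletedPartition decides what a transaction should see when it reads
// a partition that has been deleted. Timestamps are hypothetical logical
// clock values.
func readDeletedPartition(txnStart, deletedAt int64) error {
	if txnStart < deletedAt {
		// Case 1: the txn predates the deletion, so restart it.
		return errRestart
	}
	// Case 2: the txn started after the deletion.
	return ErrPartitionNotFound
}

func main() {
	fmt.Println(readDeletedPartition(5, 10))
	fmt.Println(readDeletedPartition(15, 10))
}
```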
#### memstore: fix race condition creating an empty partition
The memstore.TryCreateEmptyPartition method has a race condition, such
that two callers can end up creating different instances of the same
partition. This commit fixes that by checking whether the partition
already exists and then creating it if needed, all within the scope of
the same lock.
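
The check-then-create pattern under a single lock can be sketched as below.
The store and partition types and the tryCreateEmpty signature are
hypothetical simplifications, not the MemStore API.

```go
package main

import (
	"fmt"
	"sync"
)

// partition is a hypothetical, empty partition record.
type partition struct{ key int }

// store is a hypothetical in-memory store guarded by one mutex.
type store struct {
	mu         sync.Mutex
	partitions map[int]*partition
}

// tryCreateEmpty returns the partition for key, creating it if needed. The
// existence check and the insert happen under the same lock, so concurrent
// callers always observe the same instance.
func (s *store) tryCreateEmpty(key int) (p *partition, created bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if existing, ok := s.partitions[key]; ok {
		return existing, false
	}
	p = &partition{key: key}
	s.partitions[key] = p
	return p, true
}

func main() {
	s := &store{partitions: map[int]*partition{}}
	results := make([]*partition, 8)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i], _ = s.tryCreateEmpty(1)
		}(i)
	}
	wg.Wait()
	same := true
	for _, p := range results {
		if p != results[0] {
			same = false
		}
	}
	fmt.Println(same)
}
```

Releasing the lock between the check and the insert would reopen the window
in which two callers both see "missing" and each create their own instance.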
This change also uncovered an existing bug in splitPartition, in which
we weren't fetching metadata for the left and right sub-partitions when
restarting the split in the AddingLevel state.
#### cspann: simulate multiple index instances in TestIndexConcurrency
Create multiple index instances in the TestIndexConcurrency test, all hooked
up to the same store. This simulates multiple CRDB nodes, each independently
inserting, removing, searching, and splitting a shared index. Eliminate the
delay for one instance assisting another, in order to maximize the possibility
of race conditions.
Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: Andrew Kimball <[email protected]>