152596: kvserver: requeue on priority inversion for replicate queue r=tbg a=wenyihu6
Part of: #152022
Release note: none
---
**kvserver: add PriorityInversionRequeue**
This commit adds a new cluster setting PriorityInversionRequeue that controls
whether the replicate queue should requeue replicas when their priority at
enqueue time differs significantly from their priority at processing time
(e.g. dropping from top 3 to the lowest priority).
---
**kvserver: requeue on priority inversion for replicate queue**
Previously, a replica could be enqueued with a high priority, but by the time it
was processed, the action planned for it might have only a low priority, causing
us to perform low-priority work ahead of more urgent work. Specifically, we are
mostly worried about cases where the priority changes from any of the repair
actions to consider rebalance. Rebalancing can take a long time and block other
ranges that were enqueued because they actually need a repair action. This
commit ensures that such replicas are requeued instead, avoiding priority
inversions.
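The inversion condition described above can be sketched as follows. This is a minimal illustration, not the allocator's code: `action`, its members, the made-up priority values, and `isPriorityInversion` are all hypothetical stand-ins for `AllocatorAction` and the real inversion check.

```go
package main

// action is a hypothetical, simplified stand-in for the allocator's action
// enum; the real AllocatorAction type has more members and different values.
type action int

const (
	considerRebalance action = iota // non-repair work
	replaceDeadVoter                // a repair action
	addVoter                        // a repair action
)

// priority returns a made-up priority per action (higher is processed first).
// Only the relative order matters for this sketch: every repair action must
// rank above considerRebalance.
func (a action) priority() float64 {
	switch a {
	case replaceDeadVoter:
		return 1000
	case addVoter:
		return 900
	default:
		return 0
	}
}

// isPriorityInversion reports whether a replica enqueued with a repair-level
// priority now only needs a rebalance. In that case the caller should requeue
// the replica at its new, lower priority instead of processing it now.
func isPriorityInversion(priorityAtEnqueue float64, actionAtProcessing action) bool {
	enqueuedForRepair := priorityAtEnqueue > considerRebalance.priority()
	return enqueuedForRepair && actionAtProcessing == considerRebalance
}
```

A repair-to-repair change (for example, add voter to replace dead voter) is not treated as an inversion here; only a drop all the way to consider rebalance triggers a requeue.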
---
**kvserver: use priorityInversionLogEveryN**
Previously, replicateQueue used V(2) to log info on priority-inverted replicas
because I wanted visibility into every case without missing any replicas. On
reflection, the individual cases aren’t that interesting; it’s the overall
volume that matters, which we can track through metrics. This commit changes the
logging so that priority inversions are logged at most once every 3 seconds.
---
**kvserver: improve comments for PriorityInversionRequeue**
This commit improves the comments for PriorityInversionRequeue and clarifies the
contracts around action.Priority().
---
**allocator: small refactor for CheckPriorityInversion**
This commit refactors CheckPriorityInversion.
---
**allocator: add TestAllocatorPriorityInvariance**
This commit adds the TestAllocatorPriorityInvariance test, which acts as a
regression safeguard when new actions are added to AllocatorAction, ensuring the
contract is upheld. See action.Priority() and ComputeAction() for more details
on the contract.
---
**kvserver: guard inversion check and requeue behind PriorityInversionRequeue**
Previously, we introduced the PriorityInversionRequeue cluster setting, intended
for backport, to handle cases where a range was enqueued with a high-priority
repair action but, at processing time, a low-priority rebalance action was
computed. In such cases, the caller re-adds the range to the queue under its
updated priority. Although the cluster setting guards this requeue behavior, the
inversion check always ran unconditionally, reducing backport safety. This
commit updates the logic so that the cluster setting guards both the inversion
check and the requeue behavior.
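The guard can be sketched as a single early return placed before the check itself, so that a disabled setting skips both the check and the requeue. The `settings` struct, `maybeRequeueOnInversion`, and the `checkInversion` callback are hypothetical; the real code reads the setting through CockroachDB's cluster settings.

```go
package main

// settings is a hypothetical stand-in for reading the PriorityInversionRequeue
// cluster setting.
type settings struct {
	priorityInversionRequeue bool
}

// maybeRequeueOnInversion runs the inversion check only when the setting is
// enabled. With the setting disabled, neither the check nor the requeue
// happens, which restores the behavior from before the backport.
func maybeRequeueOnInversion(st settings, checkInversion func() bool) (requeue bool) {
	if !st.priorityInversionRequeue {
		return false // the setting guards both the check and the requeue
	}
	return checkInversion()
}
```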
---
**kvserver: move priority inversion check before applyChange**
Previously, we checked for priority inversion before checking for planning
errors, which meant we could return requeue = true even when a planning error
occurred. This commit makes planning errors take precedence over a priority
inversion error: rq.processOneChange now returns early if there is a planning
error and only checks for priority inversion right before applying a change.
---
**kvserver: check for requeue before error checking in rq.process**
Previously, we checked for requeue right before returning for both nil and
non-nil errors, making the code harder to follow. This commit refactors
replicateQueue.process to requeue replicas before checking for errors.
152792: kvserver: add onProcessResult and onEnqueueResult to processCallback r=tbg a=wenyihu6
Part of: #151847
Epic: none
---
**kvserver: use non-blocking send on errors for maybeBackpressureBatch**
maybeBackpressureBatch registers a callback with the split queue for replicas
that are too large relative to their split size. This backpressures the range to
stop it from growing and prevent new writes from blocking a pending split. The
callback is invoked when the split queue finishes processing the replica.
Previously, the error channel used in the callback had a size of 1 and performed
blocking sends. This was safe because the base queue only sent a single error,
and by the time maybeBackpressureBatch returned, the callback was guaranteed to
have been consumed, and no additional sends would occur.
Future commits will allow the callback to be invoked multiple times (although it
should be invoked at most twice). To be safe and avoid potential deadlocks from
multiple sends after maybeBackpressureBatch has already returned, this commit
makes the error send non-blocking. If the channel is already full, the error is
dropped, which is acceptable since we only care about observing the completion
of the replica processing at least once.
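The non-blocking send is the standard Go `select` with a `default` case on a buffered channel. The sketch below is illustrative; `newProcessCallback` is a hypothetical helper, not the function used in maybeBackpressureBatch.

```go
package main

// newProcessCallback returns a callback that reports errors on a channel of
// capacity 1 without ever blocking: if a result is already buffered, later
// results are dropped.
func newProcessCallback() (cb func(error), errCh chan error) {
	errCh = make(chan error, 1)
	cb = func(err error) {
		select {
		case errCh <- err:
		default:
			// The channel already holds a result. Dropping this one is fine:
			// the waiter only needs to observe completion at least once.
		}
	}
	return cb, errCh
}
```

Because the callback never blocks, a second invocation after the waiter has stopped receiving cannot deadlock the queue.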
---
**kvserver: return baseQueueAsyncRateLimited from bq.Async**
baseQueue.Async may return immediately as a no-op if the semaphore does not have
available capacity and the wait parameter is false. Previously, this case
returned no error, leaving the caller unaware that the request was dropped. This
commit changes the behavior to return a baseQueueAsyncRateLimited error,
allowing callers to detect and handle the condition.
---
**kvserver: add onProcessResult and onEnqueueResult to processCallback**
The base queue already supports registering callbacks that are invoked with the
processing result of replicas once they are processed. However, replicas may
fail before reaching that stage (e.g., failing to enqueue or dropped early).
This commit extends the mechanism to also report enqueue results, allowing
callers to detect failures earlier. Currently, only
decommissioningNudger.maybeEnqueueProblemRange uses this.
Note that one behavior change is introduced: previously, a registered callback
would fire only once with the processing result and not again if the replica was
later processed by the purgatory queue. With this change, the callback may now
be invoked twice.
---
**kvserver: add TestBaseQueueCallback**
This commit adds TestBaseQueueCallbackOnEnqueueResult and
TestBaseQueueCallbackOnProcessResult to verify that callbacks are correctly
invoked with both enqueue and process results.
Co-authored-by: wenyihu6 <[email protected]>