You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
152697: kvserver: track priority inversion in replicate queue metrics r=tbg a=wenyihu6
Part of: #151847Resolves: #152022
Release note: none
----
**kvserver: track priority inversion in replicate queue metrics**
Previously, replicas could be enqueued at a high priority but end up processing
a lower-priority actions, causing priority inversion and unfairness to other
replicas behind them that needs a repair action. This commit adds metrics to
track such cases. In addition, this commit also adds metrics to track when
replicas are requeued in the replicate queue due to a priority inversion from a
repair action to a rebalance action.
---
**kvserver: add TestPriorityInversionRequeue**
Previously, we added priority inversion requeuing mechanism.
This commit adds a unit test that forces the race condition we suspected to be
happening in escalations involving priority inversion and asserts that priority
inversion occurs and that the replica is correctly requeued. Test set up:
1. range’s leaseholder replica is rebalanced from one store to another.
2. new leaseholder enqueues the replica for repair with high priority (e.g. to
finalize the atomic replication change or remove a learner replica)
3. before processing, the old leaseholder completes the change (exits the joint
config or removes the learner).
4. when the new leaseholder processes the replica, it computes a
ConsiderRebalance action, resulting in a priority inversion and potentially
blocking other high-priority work.
---
**kvserver: delete per action priority inversion metrics**
This commit removes per-action priority inversion metrics due to their high
cardinality. We already have logging in place, which should provide sufficient
observability. For now, we care about is priority inversion that leads to
consider rebalance and requeuing the most.
Co-authored-by: wenyihu6 <[email protected]>
description: Number of priority inversions in the replicate queue that resulted in requeuing of the replicas. A priority inversion occurs when the priority at processing time ends up being lower than at enqueue time. When the priority has changed from a high priority repair action to rebalance, the change is requeued to avoid unfairness.
description: Total number of priority inversions in the replicate queue. A priority inversion occurs when the priority at processing time ends up being lower than at enqueue time
13965
+
y_axis_label: Replicas
13966
+
type: COUNTER
13967
+
unit: COUNT
13968
+
aggregation: AVG
13969
+
derivative: NON_NEGATIVE_DERIVATIVE
13954
13970
- name: queue.replicate.process.failure
13955
13971
exported_name: queue_replicate_process_failure
13956
13972
description: Number of replicas which failed processing in the replicate queue
0 commit comments