152787: kvserver: improve observability with decommission nudger r=tbg a=wenyihu6
Stacked on top of #152792
Resolves: #151847
Epic: none
---
**kvserver: improve observability with decommission nudger**
Previously, we added the decommissioning nudger, which nudges the leaseholder
replicas of decommissioning ranges to enqueue themselves into the replicate
queue for decommissioning. However, we are still observing extended decommission
stalls with the nudger enabled. Observability was limited, and we could not
easily tell whether replicas were successfully enqueued or processed.
This commit improves observability by adding four metrics to track the enqueue
and processing results of the decommissioning nudger:
ranges.decommissioning.nudger.{enqueue,process}.{success,failure}.
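
To make the brace pattern above concrete, here is a minimal sketch that expands it into the four metric names. The helper `nudgerMetricNames` is hypothetical and is not part of the commit; only the resulting names come from the commit message.

```go
package main

import "fmt"

// nudgerMetricNames expands the pattern
// ranges.decommissioning.nudger.{enqueue,process}.{success,failure}
// into its four concrete names. This helper is hypothetical and exists
// only for illustration.
func nudgerMetricNames() []string {
	const prefix = "ranges.decommissioning.nudger"
	var names []string
	for _, stage := range []string{"enqueue", "process"} {
		for _, result := range []string{"success", "failure"} {
			names = append(names, fmt.Sprintf("%s.%s.%s", prefix, stage, result))
		}
	}
	return names
}

func main() {
	for _, name := range nudgerMetricNames() {
		fmt.Println(name)
	}
}
```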
---
**kvserver: add enqueue metrics to base queue**
Previously, observability into base queue enqueuing was limited to pending queue
length and process results. This commit adds enqueue-specific metrics for the
replicate queue:
- queue.replicate.enqueue.add: counts replicas successfully added to the queue
- queue.replicate.enqueue.failedprecondition: counts replicas that failed the
replicaCanBeProcessed precondition check
- queue.replicate.enqueue.noaction: counts replicas skipped because ShouldQueue
determined no action was needed
- queue.replicate.enqueue.unexpectederror: counts replicas that were expected to
be enqueued (ShouldQueue returned true or the caller attempted a direct enqueue)
but failed due to unexpected errors
---
**kvserver: move bq.enqueueAdd update to be outside of defer**
Previously, we updated bq.enqueueAdd inside the defer statement of addInternal.
This was incorrect because we may return queued = true for a replica that is
already processing and was marked for requeue. That replica would later be
requeued in finishProcessingReplica, incrementing the metric again and leading
to double counting.
---
**kvserver: test metrics in TestBaseQueueCallback* and TestReplicateQueueDecommissionScannerDisabled**
This commit extends TestBaseQueueCallback* and
TestReplicateQueueDecommissionScannerDisabled to also verify metric updates.
152984: sql/inspect: convert internal errors to inspect issues r=spilchen a=spilchen
Previously, internal errors during index consistency checks would fail the entire job. Now these errors are converted to structured inspect issues with detailed context.
Closes #148299
Release Notes: None
Epic: None
Co-authored-by: wenyihu6 <[email protected]>
Co-authored-by: Matt Spilchen <[email protected]>
+      description: Number of replicas that were expected to be enqueued (ShouldQueue returned true or the caller decided to add to the replicate queue directly), but failed to be enqueued due to unexpected errors
+      y_axis_label: Replicas
+      type: COUNTER
+      unit: COUNT
+      aggregation: AVG
+      derivative: NON_NEGATIVE_DERIVATIVE
     - name: queue.replicate.nonvoterpromotions
       exported_name: queue_replicate_nonvoterpromotions
       description: Number of non-voters promoted to voters by the replicate queue