
Commit fc4a855

craig[bot] and wenyihu6 committed
Merge #144518

144518: kvserver/closedts: add metrics for policy refresher r=arulajmani a=wenyihu6

**kvserver: add kv.closed_timestamp.policy_change**

Previously, it was difficult to measure how often policies changed for ranges, which matters because such changes can trigger additional range updates sent via the side transport. This commit adds a metric to track the number of policy changes on replicas.

Part of: #143890

Release note: none

---

**kvserver: add more metrics for policies**

Previously, it was difficult to determine how many ranges fell into each latency bucket policy. This commit adds 18 new metrics to StoreMetrics to track the number of ranges per policy bucket for every store.

Part of: #143890

Release note: none

---

**kvserver: add kv.closed_timestamp.policy_latency_info_missing**

When a replica refreshes its policies, it looks up its peer replicas' latency info in a map passed by PolicyRefresher, which in turn periodically pulls node latency info from RPCContext. If latency data for a node is missing, a hardcoded default max RTT of 150ms is used. Previously, it was hard to tell when this was happening. This commit adds a metric to track how often the closed timestamp policy refresh falls back to the default RTT due to missing node latency info. A high count may indicate the latency cache isn't refreshed frequently enough, suggesting kv.closed_timestamp.policy_latency_refresh_interval should be lowered.

Resolves: #143890

Release note: none

Co-authored-by: wenyihu6 <[email protected]>
2 parents 1e6007d + 2375f70 commit fc4a855
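
For context on the latency fallback described in the last commit message above, here is a minimal standalone sketch of the idea: compute the largest peer RTT, substituting a hardcoded default when a node's latency is unknown, and report whether a fallback happened (which is what the new kv.closed_timestamp.policy_latency_info_missing counter tracks). The constant, map shape, and helper below are simplified stand-ins, not the actual closedts code.

```go
package main

import (
	"fmt"
	"time"
)

// defaultMaxNetworkRTT stands in for closedts.DefaultMaxNetworkRTT (150ms),
// used whenever a peer node's latency is not present in the refresher's map.
const defaultMaxNetworkRTT = 150 * time.Millisecond

// maxPeerLatency returns the largest RTT across peers, falling back to the
// default for any peer with no latency info, and reports whether a fallback
// occurred.
func maxPeerLatency(peers []int32, latencies map[int32]time.Duration) (time.Duration, bool) {
	maxLatency := time.Duration(-1)
	missing := false
	for _, nodeID := range peers {
		peerLatency := defaultMaxNetworkRTT
		if l, ok := latencies[nodeID]; ok {
			peerLatency = l
		} else {
			missing = true
		}
		if peerLatency > maxLatency {
			maxLatency = peerLatency
		}
	}
	return maxLatency, missing
}

func main() {
	latencies := map[int32]time.Duration{1: 40 * time.Millisecond, 2: 90 * time.Millisecond}
	// Node 3 has no latency info, so the default RTT wins and "missing" is true.
	m, missing := maxPeerLatency([]int32{1, 2, 3}, latencies)
	fmt.Println(m, missing) // 150ms true
}
```
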

File tree: 5 files changed, +107 −24 lines

docs/generated/metrics/metrics.html
Lines changed: 20 additions & 0 deletions

@@ -193,6 +193,26 @@
 <tr><td>STORAGE</td><td>kv.allocator.load_based_replica_rebalancing.missing_stats_for_existing_store</td><td>The number times the allocator was missing the qps stats for the existing store</td><td>Attempts</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
 <tr><td>STORAGE</td><td>kv.allocator.load_based_replica_rebalancing.should_transfer</td><td>The number times the allocator determined that the replica should be rebalanced to another store for better load distribution</td><td>Attempts</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
 <tr><td>STORAGE</td><td>kv.closed_timestamp.max_behind_nanos</td><td>Largest latency between realtime and replica max closed timestamp</td><td>Nanoseconds</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lag_by_cluster_setting</td><td>Number of ranges with LAG_BY_CLUSTER_SETTING closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_equal_or_greater_than_300ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_EQUAL_OR_GREATER_THAN_300MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_100ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_100MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_120ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_120MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_140ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_140MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_160ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_160MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_180ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_180MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_200ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_200MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_20ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_20MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_220ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_220MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_240ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_240MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_260ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_260MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_280ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_280MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_300ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_300MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_40ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_40MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_60ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_60MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_80ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_80MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_with_no_latency_info</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_WITH_NO_LATENCY_INFO closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy_change</td><td>Number of times closed timestamp policy change occurred on ranges</td><td>Events</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy_latency_info_missing</td><td>Number of times closed timestamp policy refresh had to use hardcoded network RTT due to missing node latency info for one or more replicas</td><td>Events</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
 <tr><td>STORAGE</td><td>kv.concurrency.avg_lock_hold_duration_nanos</td><td>Average lock hold duration across locks currently held in lock tables. Does not include replicated locks (intents) that are not held in memory</td><td>Nanoseconds</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
 <tr><td>STORAGE</td><td>kv.concurrency.avg_lock_wait_duration_nanos</td><td>Average lock wait duration across requests currently waiting in lock wait-queues</td><td>Nanoseconds</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
 <tr><td>STORAGE</td><td>kv.concurrency.latch_conflict_wait_durations</td><td>Durations in nanoseconds spent on latch acquisition waiting for conflicts with other latches</td><td>Nanoseconds</td><td>HISTOGRAM</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
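
The per-policy gauge names above are generated rather than hand-written: each RangeClosedTimestampPolicy enum constant is lower-cased and appended to the kv.closed_timestamp.policy. prefix (see makePolicyRefresherMetrics in the metrics.go diff below). A small standalone illustration of that naming rule, using a plain string in place of the real ctpb enum:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Stand-in for a ctpb policy name, e.g. LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_20MS.
	policy := "LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_20MS"
	name := fmt.Sprintf("kv.closed_timestamp.policy.%s", strings.ToLower(policy))
	fmt.Println(name) // kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_20ms
}
```
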

pkg/kv/kvserver/metrics.go
Lines changed: 51 additions & 10 deletions

@@ -8,11 +8,13 @@ package kvserver
 import (
     "context"
     "fmt"
+    "strings"
     "sync/atomic"
     "time"
 
     "github.com/cockroachdb/cockroach/pkg/kv/kvbase"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/result"
+    "github.com/cockroachdb/cockroach/pkg/kv/kvserver/closedts/ctpb"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/split"
@@ -2432,6 +2434,22 @@ throttled they do count towards 'delay.total' and 'delay.enginebackpressure'.
         Unit: metric.Unit_NANOSECONDS,
     }
 
+    // Closed timestamp policy change metrics.
+    metaClosedTimestampPolicyChange = metric.Metadata{
+        Name:        "kv.closed_timestamp.policy_change",
+        Help:        "Number of times closed timestamp policy change occurred on ranges",
+        Measurement: "Events",
+        Unit:        metric.Unit_COUNT,
+    }
+
+    metaClosedTimestampLatencyInfoMissing = metric.Metadata{
+        Name: "kv.closed_timestamp.policy_latency_info_missing",
+        Help: "Number of times closed timestamp policy refresh had to use hardcoded network RTT " +
+            "due to missing node latency info for one or more replicas",
+        Measurement: "Events",
+        Unit:        metric.Unit_COUNT,
+    }
+
     // Replica circuit breaker.
     metaReplicaCircuitBreakerCurTripped = metric.Metadata{
         Name: "kv.replica_circuit_breaker.num_tripped_replicas",
@@ -2664,11 +2682,12 @@ type StoreMetrics struct {
     RaftFlowStateCounts [tracker.StateCount]*metric.Gauge
 
     // Range metrics.
-    RangeCount                *metric.Gauge
-    UnavailableRangeCount     *metric.Gauge
-    UnderReplicatedRangeCount *metric.Gauge
-    OverReplicatedRangeCount  *metric.Gauge
-    DecommissioningRangeCount *metric.Gauge
+    RangeCount                      *metric.Gauge
+    UnavailableRangeCount           *metric.Gauge
+    UnderReplicatedRangeCount       *metric.Gauge
+    OverReplicatedRangeCount        *metric.Gauge
+    DecommissioningRangeCount       *metric.Gauge
+    RangeClosedTimestampPolicyCount [ctpb.MAX_CLOSED_TIMESTAMP_POLICY]*metric.Gauge
 
     // Lease request metrics for successful and failed lease requests. These
     // count proposals (i.e. it does not matter how many replicas apply the
@@ -3033,6 +3052,10 @@ type StoreMetrics struct {
     // Closed timestamp metrics.
     ClosedTimestampMaxBehindNanos *metric.Gauge
 
+    // Closed timestamp policy change on ranges metrics.
+    ClosedTimestampPolicyChange       *metric.Counter
+    ClosedTimestampLatencyInfoMissing *metric.Counter
+
     // Replica circuit breaker.
     ReplicaCircuitBreakerCurTripped *metric.Gauge
     ReplicaCircuitBreakerCumTripped *metric.Counter
@@ -3374,11 +3397,12 @@ func newStoreMetrics(histogramWindow time.Duration) *StoreMetrics {
         RaftFlowStateCounts: raftFlowStateGaugeSlice(),
 
         // Range metrics.
-        RangeCount:                metric.NewGauge(metaRangeCount),
-        UnavailableRangeCount:     metric.NewGauge(metaUnavailableRangeCount),
-        UnderReplicatedRangeCount: metric.NewGauge(metaUnderReplicatedRangeCount),
-        OverReplicatedRangeCount:  metric.NewGauge(metaOverReplicatedRangeCount),
-        DecommissioningRangeCount: metric.NewGauge(metaDecommissioningRangeCount),
+        RangeCount:                      metric.NewGauge(metaRangeCount),
+        UnavailableRangeCount:           metric.NewGauge(metaUnavailableRangeCount),
+        UnderReplicatedRangeCount:       metric.NewGauge(metaUnderReplicatedRangeCount),
+        OverReplicatedRangeCount:        metric.NewGauge(metaOverReplicatedRangeCount),
+        DecommissioningRangeCount:       metric.NewGauge(metaDecommissioningRangeCount),
+        RangeClosedTimestampPolicyCount: makePolicyRefresherMetrics(),
 
         // Lease request metrics.
         LeaseRequestSuccessCount: metric.NewCounter(metaLeaseRequestSuccessCount),
@@ -3849,6 +3873,9 @@ func newStoreMetrics(histogramWindow time.Duration) *StoreMetrics {
         // Estimated MVCC stats in split.
         SplitsWithEstimatedStats:     metric.NewCounter(metaSplitEstimatedStats),
         SplitEstimatedTotalBytesDiff: metric.NewCounter(metaSplitEstimatedTotalBytesDiff),
+
+        ClosedTimestampPolicyChange:       metric.NewCounter(metaClosedTimestampPolicyChange),
+        ClosedTimestampLatencyInfoMissing: metric.NewCounter(metaClosedTimestampLatencyInfoMissing),
     }
     sm.categoryIterMetrics.init(storeRegistry)
 
@@ -4155,6 +4182,20 @@ func raftFlowStateGaugeSlice() [tracker.StateCount]*metric.Gauge {
     return gauges
 }
 
+func makePolicyRefresherMetrics() [ctpb.MAX_CLOSED_TIMESTAMP_POLICY]*metric.Gauge {
+    var policyGauges [ctpb.MAX_CLOSED_TIMESTAMP_POLICY]*metric.Gauge
+    for policy := ctpb.LAG_BY_CLUSTER_SETTING; policy < ctpb.MAX_CLOSED_TIMESTAMP_POLICY; policy++ {
+        meta := metric.Metadata{
+            Name:        fmt.Sprintf("kv.closed_timestamp.policy.%s", strings.ToLower(policy.String())),
+            Help:        fmt.Sprintf("Number of ranges with %s closed timestamp policy", policy.String()),
+            Measurement: "Ranges",
+            Unit:        metric.Unit_COUNT,
+        }
+        policyGauges[policy] = metric.NewGauge(meta)
+    }
+    return policyGauges
+}
+
 func storageLevelMetricMetadata(
     name, helpTpl, measurement string, unit metric.Unit,
 ) [7]metric.Metadata {
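
RangeClosedTimestampPolicyCount is a fixed-size array of gauges indexed directly by the policy enum value, so refreshing the per-policy counts is a plain array walk rather than a map lookup. A minimal standalone sketch of that indexing pattern, with simplified stand-in types rather than the real metric and ctpb packages:

```go
package main

import "fmt"

// Stand-ins for ctpb policy values; the real enum ends at MAX_CLOSED_TIMESTAMP_POLICY.
const (
	LAG_BY_CLUSTER_SETTING = iota
	LEAD_FOR_GLOBAL_READS_WITH_NO_LATENCY_INFO
	MAX_CLOSED_TIMESTAMP_POLICY
)

// gauge is a simplified stand-in for *metric.Gauge.
type gauge struct{ v int64 }

func (g *gauge) Update(v int64) { g.v = v }

func main() {
	// One gauge per policy, indexed by the enum value.
	var perPolicy [MAX_CLOSED_TIMESTAMP_POLICY]*gauge
	for i := range perPolicy {
		perPolicy[i] = &gauge{}
	}
	// Counts gathered during a gauge refresh are written back by indexing with the enum.
	counts := [MAX_CLOSED_TIMESTAMP_POLICY]int64{LAG_BY_CLUSTER_SETTING: 12}
	for policy, count := range counts {
		perPolicy[policy].Update(count)
	}
	fmt.Println(perPolicy[LAG_BY_CLUSTER_SETTING].v) // 12
}
```
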

pkg/kv/kvserver/replica.go
Lines changed: 14 additions & 3 deletions

@@ -1405,7 +1405,7 @@ func (r *Replica) closedTimestampPolicyRLocked() ctpb.RangeClosedTimestampPolicy
 // RefreshPolicy updates the replica's cached closed timestamp policy based on
 // span configurations and provided node round-trip latencies.
 func (r *Replica) RefreshPolicy(latencies map[roachpb.NodeID]time.Duration) {
-    policy := func() ctpb.RangeClosedTimestampPolicy {
+    computeNewPolicy := func(oldPolicy ctpb.RangeClosedTimestampPolicy) ctpb.RangeClosedTimestampPolicy {
         desc, conf := r.DescAndSpanConfig()
         // The node liveness range ignores zone configs and always uses a
         // LAG_BY_CLUSTER_SETTING closed timestamp policy. If it was to begin
@@ -1430,20 +1430,31 @@
         // policy bucket. This then controls how far in the future timestamps will
         // be closed for the range.
         maxLatency := time.Duration(-1)
+        replicaLatencyInfoMissing := false
         for _, peer := range desc.InternalReplicas {
             peerLatency := closedts.DefaultMaxNetworkRTT
             if latency, ok := latencies[peer.NodeID]; ok {
                 peerLatency = latency
+            } else {
+                replicaLatencyInfoMissing = true
             }
             maxLatency = max(maxLatency, peerLatency)
         }
+        if replicaLatencyInfoMissing {
+            r.store.metrics.ClosedTimestampLatencyInfoMissing.Inc(1)
+        }
         return closedts.FindBucketBasedOnNetworkRTTWithDampening(
-            ctpb.RangeClosedTimestampPolicy(r.cachedClosedTimestampPolicy.Load()),
+            oldPolicy,
             maxLatency,
             closedts.PolicySwitchWhenLatencyExceedsBucketFraction.Get(&r.store.GetStoreConfig().Settings.SV),
         )
     }
-    r.cachedClosedTimestampPolicy.Store(int32(policy()))
+    oldPolicy := ctpb.RangeClosedTimestampPolicy(r.cachedClosedTimestampPolicy.Load())
+    newPolicy := computeNewPolicy(oldPolicy)
+    if newPolicy != oldPolicy {
+        r.store.metrics.ClosedTimestampPolicyChange.Inc(1)
+        r.cachedClosedTimestampPolicy.Store(int32(newPolicy))
+    }
 }
 
 // NodeID returns the ID of the node this replica belongs to.
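
The policy-change counter in the diff above only increments when the recomputed policy differs from the cached one, and the cached int32 is only rewritten on a change. A minimal standalone sketch of that compare-then-store pattern, using stand-in types rather than the real Replica and metric.Counter:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type replica struct {
	cachedPolicy  atomic.Int32
	policyChanges atomic.Int64 // stand-in for the ClosedTimestampPolicyChange counter
}

// refreshPolicy recomputes the policy and bumps the change counter only when
// the new value differs from the cached one, mirroring the updated RefreshPolicy.
func (r *replica) refreshPolicy(compute func(old int32) int32) {
	old := r.cachedPolicy.Load()
	updated := compute(old)
	if updated != old {
		r.policyChanges.Add(1)
		r.cachedPolicy.Store(updated)
	}
}

func main() {
	var r replica
	r.refreshPolicy(func(old int32) int32 { return 1 }) // change: 0 -> 1
	r.refreshPolicy(func(old int32) int32 { return 1 }) // no change, counter stays put
	fmt.Println(r.policyChanges.Load(), r.cachedPolicy.Load()) // 1 1
}
```
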

pkg/kv/kvserver/replica_metrics.go
Lines changed: 5 additions & 0 deletions

@@ -12,6 +12,7 @@ import (
     "github.com/cockroachdb/cockroach/pkg/base"
     "github.com/cockroachdb/cockroach/pkg/keys"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/allocatorimpl"
+    "github.com/cockroachdb/cockroach/pkg/kv/kvserver/closedts/ctpb"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/livenesspb"
@@ -62,6 +63,7 @@ type ReplicaMetrics struct {
     PendingRaftProposalCount int64
     SlowRaftProposalCount    int64
     RaftFlowStateCounts      [tracker.StateCount]int64
+    ClosedTimestampPolicy    ctpb.RangeClosedTimestampPolicy
 
     QuotaPoolPercentUsed int64 // [0,100]
 
@@ -123,6 +125,7 @@ func (r *Replica) Metrics(
         paused:                   r.mu.pausedFollowers,
         pendingRaftProposalCount: r.numPendingProposalsRLocked(),
         slowRaftProposalCount:    r.mu.slowProposalCount,
+        closedTimestampPolicy:    ctpb.RangeClosedTimestampPolicy(r.cachedClosedTimestampPolicy.Load()),
     }
 
     r.mu.RUnlock()
@@ -154,6 +157,7 @@ type calcReplicaMetricsInput struct {
     paused                   map[roachpb.ReplicaID]struct{}
     pendingRaftProposalCount int64
     slowRaftProposalCount    int64
+    closedTimestampPolicy    ctpb.RangeClosedTimestampPolicy
 }
 
 func calcReplicaMetrics(d calcReplicaMetricsInput) ReplicaMetrics {
@@ -226,6 +230,7 @@ func calcReplicaMetrics(d calcReplicaMetricsInput) ReplicaMetrics {
         QuotaPoolPercentUsed:  calcQuotaPoolPercentUsed(d.qpUsed, d.qpCapacity),
         LatchMetrics:          d.latchMetrics,
         LockTableMetrics:      d.lockTableMetrics,
+        ClosedTimestampPolicy: d.closedTimestampPolicy,
     }
 }
 
pkg/kv/kvserver/store.go
Lines changed: 17 additions & 11 deletions

@@ -33,6 +33,7 @@ import (
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/load"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/storepool"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval"
+    "github.com/cockroachdb/cockroach/pkg/kv/kvserver/closedts/ctpb"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/closedts/policyrefresher"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/closedts/sidetransport"
     "github.com/cockroachdb/cockroach/pkg/kv/kvserver/idalloc"
@@ -3360,17 +3361,18 @@ func (s *Store) updateReplicationGauges(ctx context.Context) error {
         totalRaftLogSize int64
         maxRaftLogSize   int64
 
-        rangeCount                int64
-        unavailableRangeCount     int64
-        underreplicatedRangeCount int64
-        overreplicatedRangeCount  int64
-        decommissioningRangeCount int64
-        behindCount               int64
-        pausedFollowerCount       int64
-        ioOverload                float64
-        pendingRaftProposalCount  int64
-        slowRaftProposalCount     int64
-        raftFlowStateCounts       [tracker.StateCount]int64
+        rangeCount                  int64
+        unavailableRangeCount       int64
+        underreplicatedRangeCount   int64
+        overreplicatedRangeCount    int64
+        decommissioningRangeCount   int64
+        behindCount                 int64
+        pausedFollowerCount         int64
+        ioOverload                  float64
+        pendingRaftProposalCount    int64
+        slowRaftProposalCount       int64
+        raftFlowStateCounts         [tracker.StateCount]int64
+        closedTimestampPolicyCounts [ctpb.MAX_CLOSED_TIMESTAMP_POLICY]int64
 
         locks                      int64
         totalLockHoldDurationNanos int64
@@ -3429,6 +3431,7 @@ func (s *Store) updateReplicationGauges(ctx context.Context) error {
     }
     if metrics.Leaseholder {
         s.metrics.RaftQuotaPoolPercentUsed.RecordValue(metrics.QuotaPoolPercentUsed)
+        closedTimestampPolicyCounts[metrics.ClosedTimestampPolicy] += 1
         leaseHolderCount++
         switch metrics.LeaseType {
         case roachpb.LeaseNone:
@@ -3531,6 +3534,9 @@ func (s *Store) updateReplicationGauges(ctx context.Context) error {
     for state, cnt := range raftFlowStateCounts {
         s.metrics.RaftFlowStateCounts[state].Update(cnt)
     }
+    for policy, count := range closedTimestampPolicyCounts {
+        s.metrics.RangeClosedTimestampPolicyCount[policy].Update(count)
+    }
     s.metrics.RaftLogTotalSize.Update(totalRaftLogSize)
     s.metrics.RaftLogMaxSize.Update(maxRaftLogSize)
     s.metrics.AverageQueriesPerSecond.Update(averageQueriesPerSecond)