When a replica refreshes its policies, it looks up its peer replicas' latency
info via a map passed by PolicyRefresher, which in turn periodically pulls node
latency info from RPCContext. If latency data for a node is missing, a default
hardcoded max RTT of 150ms is used.
Previously, it was hard to tell when this was happening. This commit adds metrics
to track how often the closed timestamp policy refresh falls back to the default
RTT due to missing node latency info. A high count might indicate the latency
cache isn’t refreshed frequently enough, suggesting we should consider lowering
kv.closed_timestamp.policy_latency_refresh_interval.
Resolves: #143890
Release note: none
docs/generated/metrics/metrics.html: 1 addition, 0 deletions
@@ -212,6 +212,7 @@
 <tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_latency_less_than_80ms</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_LATENCY_LESS_THAN_80MS closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
 <tr><td>STORAGE</td><td>kv.closed_timestamp.policy.lead_for_global_reads_with_no_latency_info</td><td>Number of ranges with LEAD_FOR_GLOBAL_READS_WITH_NO_LATENCY_INFO closed timestamp policy</td><td>Ranges</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
 <tr><td>STORAGE</td><td>kv.closed_timestamp.policy_change</td><td>Number of times closed timestamp policy change occurred on ranges</td><td>Events</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
+<tr><td>STORAGE</td><td>kv.closed_timestamp.policy_latency_info_missing</td><td>Number of times closed timestamp policy refresh had to use hardcoded network RTT due to missing node latency info for one or more replicas</td><td>Events</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
 <tr><td>STORAGE</td><td>kv.concurrency.avg_lock_hold_duration_nanos</td><td>Average lock hold duration across locks currently held in lock tables. Does not include replicated locks (intents) that are not held in memory</td><td>Nanoseconds</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
 <tr><td>STORAGE</td><td>kv.concurrency.avg_lock_wait_duration_nanos</td><td>Average lock wait duration across requests currently waiting in lock wait-queues</td><td>Nanoseconds</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
 <tr><td>STORAGE</td><td>kv.concurrency.latch_conflict_wait_durations</td><td>Durations in nanoseconds spent on latch acquisition waiting for conflicts with other latches</td><td>Nanoseconds</td><td>HISTOGRAM</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>