141601: metric: Add functionality to disable aggregate metric reporting r=alyshanjahani-crl a=alyshanjahani-crl
Previously when server.child_metrics.enabled was true, the prometheus exporter
would include the time series with the child labels as well as the aggregate
time series. This behaviour is not ideal as it can lead to double counting.
For example, consider the following metric:
counter_foo{node_id="1"} 6
counter_foo{node_id="1", child_metric_label="bar"} 3
counter_foo{node_id="1", child_metric_label="xyz"} 3
The aggregate metric is always the sum of all the child metrics. If a user
were to export these metrics and query them like so:
sum(counter_foo) by (node_id)
They would get back 6+3+3=12 when really they should be getting back 6.
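The arithmetic above can be reproduced with a small sketch. It is only an illustration of the double counting, not the exporter's code: the sample values come from the example, and `sum_by_node` and `without_aggregate` are hypothetical helper names.

```python
from collections import defaultdict

# Samples copied from the example above: (labels, value).
samples = [
    ({"node_id": "1"}, 6),                               # aggregate series
    ({"node_id": "1", "child_metric_label": "bar"}, 3),  # child series
    ({"node_id": "1", "child_metric_label": "xyz"}, 3),  # child series
]


def sum_by_node(samples):
    """Mimic `sum(...) by (node_id)`: add every series sharing a node_id."""
    totals = defaultdict(int)
    for labels, value in samples:
        totals[labels["node_id"]] += value
    return dict(totals)


def without_aggregate(samples):
    """Hypothetical helper: keep only series that carry a child label."""
    return [(labels, value) for labels, value in samples
            if "child_metric_label" in labels]


print(sum_by_node(samples))                     # {'1': 12}
print(sum_by_node(without_aggregate(samples)))  # {'1': 6}
```

With the aggregate series present, the query sums the same counts twice. With only the child series exported, the sum matches the true total.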
This commit introduces a cluster setting
server.child_metrics.include_aggregate.enabled which is false by default.
Fixes: https://cockroachlabs.atlassian.net/browse/CC-31395
Release note (ops change): Modifies the default behaviour of prometheus metric
reporting (/_status/vars) by not including the aggregate time series. This
prevents issues with double counting when querying metrics. Note that this
reporting behaviour can be toggled by a new cluster setting
server.child_metrics.include_aggregate.enabled. By default it is false, but
by setting it to true, the (undesired) behaviour of reporting the aggregate
timeseries is restored.
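How the two settings interact can be sketched as a filter over the series of one metric. This is an assumption-laden illustration, not the actual exporter: `series_to_export` is a hypothetical function, and it assumes every series shown belongs to a metric that has child metrics.

```python
def series_to_export(series, child_metrics_enabled, include_aggregate):
    """Hypothetical filter modelling the two cluster settings.

    series: list of (labels, value) for a single metric with children.
    child_metrics_enabled: models server.child_metrics.enabled.
    include_aggregate: models server.child_metrics.include_aggregate.enabled.
    """
    exported = []
    for labels, value in series:
        is_child = "child_metric_label" in labels
        if is_child:
            # Child series appear only when child metrics are enabled.
            if child_metrics_enabled:
                exported.append((labels, value))
        else:
            # The aggregate is dropped only when children are exported and
            # include_aggregate is off; otherwise the setting has no effect.
            if not child_metrics_enabled or include_aggregate:
                exported.append((labels, value))
    return exported


series = [
    ({"node_id": "1"}, 6),
    ({"node_id": "1", "child_metric_label": "bar"}, 3),
    ({"node_id": "1", "child_metric_label": "xyz"}, 3),
]

children_only = series_to_export(series, True, False)
everything = series_to_export(series, True, True)
aggregate_only = series_to_export(series, False, False)
print(len(children_only), len(everything), len(aggregate_only))  # 2 3 1
```

When child metrics are disabled, the aggregate series is exported regardless of the second setting, which matches the note that the setting then has no effect.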
Co-authored-by: Alyshan Jahani <[email protected]>
docs/generated/settings/settings-for-tenants.txt: 1 addition & 0 deletions
@@ -111,6 +111,7 @@ server.auth_log.sql_connections.enabled boolean false if set, log SQL client con
server.auth_log.sql_sessions.enabled boolean false if set, log verbose SQL session authentication events to the SESSIONS log channel (note: may hinder performance on loaded nodes). Session start and end events are always logged regardless of this setting; disable the SESSIONS log channel to suppress them. application
server.authentication_cache.enabled boolean true enables a cache used during authentication to avoid lookups to system tables when retrieving per-user authentication-related information application
server.child_metrics.enabled boolean false enables the exporting of child metrics, additional prometheus time series with extra labels application
+server.child_metrics.include_aggregate.enabled boolean true include the reporting of the aggregate time series when child metrics are enabled. This cluster setting has no effect if child metrics are disabled. application
server.client_cert_expiration_cache.capacity integer 1000 the maximum number of client cert expirations stored application
server.clock.forward_jump_check.enabled (alias: server.clock.forward_jump_check_enabled) boolean false if enabled, forward clock jumps > max_offset/2 will cause a panic application
server.clock.persist_upper_bound_interval duration 0s the interval between persisting the wall time upper bound of the clock. The clock does not generate a wall time greater than the persisted timestamp and will panic if it sees a wall time greater than this value. When cockroach starts, it waits for the wall time to catch-up till this persisted timestamp. This guarantees monotonic wall time across server restarts. Not setting this or setting a value of 0 disables this feature. application
Copy file name to clipboardExpand all lines: docs/generated/settings/settings.html
+1Lines changed: 1 addition & 0 deletions
Original file line number
Diff line number
Diff line change
@@ -141,6 +141,7 @@
<tr><td><div id="setting-server-auth-log-sql-sessions-enabled" class="anchored"><code>server.auth_log.sql_sessions.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>if set, log verbose SQL session authentication events to the SESSIONS log channel (note: may hinder performance on loaded nodes). Session start and end events are always logged regardless of this setting; disable the SESSIONS log channel to suppress them.</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-server-authentication-cache-enabled" class="anchored"><code>server.authentication_cache.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>enables a cache used during authentication to avoid lookups to system tables when retrieving per-user authentication-related information</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-server-child-metrics-enabled" class="anchored"><code>server.child_metrics.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>enables the exporting of child metrics, additional prometheus time series with extra labels</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
+<tr><td><div id="setting-server-child-metrics-include-aggregate-enabled" class="anchored"><code>server.child_metrics.include_aggregate.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>include the reporting of the aggregate time series when child metrics are enabled. This cluster setting has no effect if child metrics are disabled.</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-server-client-cert-expiration-cache-capacity" class="anchored"><code>server.client_cert_expiration_cache.capacity</code></div></td><td>integer</td><td><code>1000</code></td><td>the maximum number of client cert expirations stored</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-server-clock-forward-jump-check-enabled" class="anchored"><code>server.clock.forward_jump_check.enabled<br/>(alias: server.clock.forward_jump_check_enabled)</code></div></td><td>boolean</td><td><code>false</code></td><td>if enabled, forward clock jumps > max_offset/2 will cause a panic</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-server-clock-persist-upper-bound-interval" class="anchored"><code>server.clock.persist_upper_bound_interval</code></div></td><td>duration</td><td><code>0s</code></td><td>the interval between persisting the wall time upper bound of the clock. The clock does not generate a wall time greater than the persisted timestamp and will panic if it sees a wall time greater than this value. When cockroach starts, it waits for the wall time to catch-up till this persisted timestamp. This guarantees monotonic wall time across server restarts. Not setting this or setting a value of 0 disables this feature.</td><td>Serverless/Dedicated/Self-Hosted</td></tr>