Commit 8e201dc

metric: reinitialise sql child metrics on cluster settings change
This patch introduces two new cluster settings: `sql.metrics.application_name.enabled` and `sql.metrics.database_name.enabled`. These settings configure the `application_name` and `database` labels in SQL metrics. To apply label changes in subsequent metric exports, SQL metrics need the ability to "reset" their labels. This reset functionality needs to be added to the SQL server because cluster settings and labels are scoped to the virtual cluster. We have extended the existing metric registry functionality to reinitialise metrics when a cluster setting changes. The metric registry was the ideal place because it tracks and manages metrics.

Epic: CRDB-43153
Part of: CRDB-48253

Release note (sql change): New cluster settings `sql.metrics.application_name.enabled` and `sql.metrics.database_name.enabled`, both with a default value of `false`, can be set to `true` to display the application name and database name, respectively, on supported metrics.
1 parent 70bb4fa commit 8e201dc

19 files changed, 233 additions and 26 deletions

docs/generated/settings/settings-for-tenants.txt

Lines changed: 2 additions & 0 deletions
@@ -323,6 +323,8 @@ sql.log.slow_query.internal_queries.enabled boolean false when set to true, inte
 sql.log.slow_query.latency_threshold duration 0s when set to non-zero, log statements whose service latency exceeds the threshold to a secondary logger on each node application
 sql.log.user_audit string user/role-based audit logging configuration. An enterprise license is required for this cluster setting to take effect. application
 sql.log.user_audit.reduced_config.enabled boolean false enables logic to compute a reduced audit configuration, computing the audit configuration only once at session start instead of at each SQL event. The tradeoff with the increase in performance (~5%), is that changes to the audit configuration (user role memberships/cluster setting) are not reflected within session. Users will need to start a new session to see these changes in their auditing behaviour. application
+sql.metrics.application_name.enabled boolean false when enabled, SQL metrics would export application name as and additional label as part of child metrics. The number of unique label combinations is limited to 5000 by default. application
+sql.metrics.database_name.enabled boolean false when enabled, SQL metrics would export database name as and additional label as part of child metrics. The number of unique label combinations is limited to 5000 by default. application
 sql.metrics.index_usage_stats.enabled boolean true collect per index usage statistics application
 sql.metrics.max_mem_reported_stmt_fingerprints integer 100000 the maximum number of reported statement fingerprints stored in memory application
 sql.metrics.max_mem_reported_txn_fingerprints integer 100000 the maximum number of reported transaction fingerprints stored in memory application

docs/generated/settings/settings.html

Lines changed: 2 additions & 0 deletions
@@ -278,6 +278,8 @@
 <tr><td><div id="setting-sql-log-slow-query-latency-threshold" class="anchored"><code>sql.log.slow_query.latency_threshold</code></div></td><td>duration</td><td><code>0s</code></td><td>when set to non-zero, log statements whose service latency exceeds the threshold to a secondary logger on each node</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
 <tr><td><div id="setting-sql-log-user-audit" class="anchored"><code>sql.log.user_audit</code></div></td><td>string</td><td><code></code></td><td>user/role-based audit logging configuration. An enterprise license is required for this cluster setting to take effect.</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
 <tr><td><div id="setting-sql-log-user-audit-reduced-config-enabled" class="anchored"><code>sql.log.user_audit.reduced_config.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>enables logic to compute a reduced audit configuration, computing the audit configuration only once at session start instead of at each SQL event. The tradeoff with the increase in performance (~5%), is that changes to the audit configuration (user role memberships/cluster setting) are not reflected within session. Users will need to start a new session to see these changes in their auditing behaviour.</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
+<tr><td><div id="setting-sql-metrics-application-name-enabled" class="anchored"><code>sql.metrics.application_name.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>when enabled, SQL metrics would export application name as and additional label as part of child metrics. The number of unique label combinations is limited to 5000 by default.</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
+<tr><td><div id="setting-sql-metrics-database-name-enabled" class="anchored"><code>sql.metrics.database_name.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>when enabled, SQL metrics would export database name as and additional label as part of child metrics. The number of unique label combinations is limited to 5000 by default.</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
 <tr><td><div id="setting-sql-metrics-index-usage-stats-enabled" class="anchored"><code>sql.metrics.index_usage_stats.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>collect per index usage statistics</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
 <tr><td><div id="setting-sql-metrics-max-mem-reported-stmt-fingerprints" class="anchored"><code>sql.metrics.max_mem_reported_stmt_fingerprints</code></div></td><td>integer</td><td><code>100000</code></td><td>the maximum number of reported statement fingerprints stored in memory</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
 <tr><td><div id="setting-sql-metrics-max-mem-reported-txn-fingerprints" class="anchored"><code>sql.metrics.max_mem_reported_txn_fingerprints</code></div></td><td>integer</td><td><code>100000</code></td><td>the maximum number of reported transaction fingerprints stored in memory</td><td>Serverless/Dedicated/Self-Hosted</td></tr>

pkg/server/server_sql.go

Lines changed: 8 additions & 0 deletions
@@ -1384,6 +1384,14 @@ func newSQLServer(ctx context.Context, cfg sqlServerArgs) (*SQLServer, error) {
 	vmoduleSetting.SetOnChange(&cfg.Settings.SV, fn)
 	fn(ctx)
 
+	metric.AppNameLabelEnabled.SetOnChange(&cfg.Settings.SV, func(ctx context.Context) {
+		pgServer.SQLServer.UpdateLabelValueConfig(&cfg.Settings.SV, cfg.registry)
+	})
+
+	metric.DBNameLabelEnabled.SetOnChange(&cfg.Settings.SV, func(ctx context.Context) {
+		pgServer.SQLServer.UpdateLabelValueConfig(&cfg.Settings.SV, cfg.registry)
+	})
+
 	auditlogging.ConfigureRoleBasedAuditClusterSettings(ctx, execCfg.AuditConfig, execCfg.Settings, &execCfg.Settings.SV)
 
 	return &SQLServer{
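
The hunk above registers the same update call on both settings, so a change to either one re-reads both values. A minimal, self-contained sketch of that wiring follows. `BoolSetting` and `wireLabelSettings` are hypothetical stand-ins invented for illustration; they are not the CockroachDB settings API and only mimic the "run a callback on every change" behaviour.

```go
package main

import (
	"fmt"
	"sync"
)

// BoolSetting is a hypothetical stand-in for a boolean cluster setting.
// It is NOT the CockroachDB settings API.
type BoolSetting struct {
	mu        sync.Mutex
	value     bool
	callbacks []func()
}

// SetOnChange registers fn to run after every subsequent Set call.
func (s *BoolSetting) SetOnChange(fn func()) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.callbacks = append(s.callbacks, fn)
}

// Get returns the current value.
func (s *BoolSetting) Get() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.value
}

// Set stores v, then runs the callbacks outside the lock so that a
// callback may call Get without deadlocking.
func (s *BoolSetting) Set(v bool) {
	s.mu.Lock()
	s.value = v
	cbs := append([]func(){}, s.callbacks...)
	s.mu.Unlock()
	for _, cb := range cbs {
		cb()
	}
}

// wireLabelSettings registers one shared update function on both settings
// and returns a log of the (db, app) pairs that function observed.
func wireLabelSettings(appName, dbName *BoolSetting) *[]string {
	seen := &[]string{}
	update := func() {
		// Both values are re-read on every change, whichever setting fired.
		*seen = append(*seen, fmt.Sprintf("db=%t app=%t", dbName.Get(), appName.Get()))
	}
	appName.SetOnChange(update)
	dbName.SetOnChange(update)
	return seen
}

func main() {
	appName, dbName := &BoolSetting{}, &BoolSetting{}
	seen := wireLabelSettings(appName, dbName)
	appName.Set(true)
	dbName.Set(true)
	for _, entry := range *seen {
		fmt.Println(entry)
	}
}
```

Because the callback reads both settings rather than receiving the changed value, the two registrations can share one function body, which matches the two identical closures in the diff.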

pkg/sql/conn_executor.go

Lines changed: 5 additions & 0 deletions
@@ -850,6 +850,11 @@ func (s *Server) GetBytesMonitor() *mon.BytesMonitor {
 	return s.pool
 }
 
+func (s *Server) UpdateLabelValueConfig(sv *settings.Values, registry *metric.Registry) {
+	registry.ReinitialiseChildMetrics(metric.DBNameLabelEnabled.Get(sv),
+		metric.AppNameLabelEnabled.Get(sv))
+}
+
 // SetupConn creates a connExecutor for the client connection.
 //
 // When this method returns there are no resources allocated yet that
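
`UpdateLabelValueConfig` passes two booleans to the registry, database first and application second. The registry code that turns them into a label configuration is not part of the hunks shown here, so the following is only a sketch of how such a mapping might look. `labelConfigFor` is a hypothetical helper, and the constant order is assumed to match the `iota` block that this commit removes from `agg_metric.go`.

```go
package main

import "fmt"

// LabelConfig mirrors the four constants visible in the diff. The
// underlying type and the numeric values are assumptions.
type LabelConfig uint64

const (
	LabelConfigDisabled LabelConfig = iota
	LabelConfigApp
	LabelConfigDB
	LabelConfigAppAndDB
)

// labelConfigFor is a hypothetical helper that maps the two setting values
// to a single label configuration. The argument order (db, then app)
// follows the call in UpdateLabelValueConfig.
func labelConfigFor(dbEnabled, appEnabled bool) LabelConfig {
	switch {
	case dbEnabled && appEnabled:
		return LabelConfigAppAndDB
	case dbEnabled:
		return LabelConfigDB
	case appEnabled:
		return LabelConfigApp
	default:
		return LabelConfigDisabled
	}
}

func main() {
	cases := []struct{ db, app bool }{
		{false, false}, {false, true}, {true, false}, {true, true},
	}
	for _, c := range cases {
		fmt.Printf("db=%t app=%t -> %d\n", c.db, c.app, labelConfigFor(c.db, c.app))
	}
}
```

Two adjacent `bool` parameters are easy to swap at a call site, so a caller of the real method should double-check that the database flag comes first.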

pkg/util/metric/BUILD.bazel

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ go_library(
     importpath = "github.com/cockroachdb/cockroach/pkg/util/metric",
     visibility = ["//visibility:public"],
     deps = [
+        "//pkg/settings",
         "//pkg/util/buildutil",
         "//pkg/util/envutil",
         "//pkg/util/log",

pkg/util/metric/aggmetric/agg_metric.go

Lines changed: 17 additions & 15 deletions
@@ -28,13 +28,6 @@ const (
 	appLabel = "application_name"
 )
 
-const (
-	LabelConfigDisabled = iota
-	LabelConfigApp
-	LabelConfigDB
-	LabelConfigAppAndDB
-)
-
 // Builder is used to ease constructing metrics with the same labels.
 type Builder struct {
 	labels []string
@@ -189,9 +182,9 @@ type SQLMetric struct {
 	}
 }
 
-func NewSQLMetric(labelConfig uint64) *SQLMetric {
+func NewSQLMetric(labelConfig metric.LabelConfig) *SQLMetric {
 	sm := &SQLMetric{}
-	sm.labelConfig.Store(labelConfig)
+	sm.labelConfig.Store(uint64(labelConfig))
 	sm.mu.children = &UnorderedCacheWrapper{
 		cache: getCacheStorage(),
 	}
@@ -213,17 +206,17 @@ func (sm *SQLMetric) Each(
 		dbLabel := dbLabel
 		appLabel := appLabel
 		switch sm.labelConfig.Load() {
-		case LabelConfigDB:
+		case uint64(metric.LabelConfigDB):
 			childLabels = append(childLabels, &io_prometheus_client.LabelPair{
 				Name:  &dbLabel,
 				Value: &lvs[0],
 			})
-		case LabelConfigApp:
+		case uint64(metric.LabelConfigApp):
 			childLabels = append(childLabels, &io_prometheus_client.LabelPair{
 				Name:  &appLabel,
 				Value: &lvs[0],
 			})
-		case LabelConfigAppAndDB:
+		case uint64(metric.LabelConfigAppAndDB):
 			childLabels = append(childLabels, &io_prometheus_client.LabelPair{
 				Name:  &dbLabel,
 				Value: &lvs[0],
@@ -275,20 +268,29 @@ func (sm *SQLMetric) getChildByLabelConfig(
 ) (ChildMetric, bool) {
 	var childMetric ChildMetric
 	switch sm.labelConfig.Load() {
-	case LabelConfigDB:
+	case uint64(metric.LabelConfigDB):
 		childMetric = sm.getOrAddChild(f, db)
 		return childMetric, true
-	case LabelConfigApp:
+	case uint64(metric.LabelConfigApp):
 		childMetric = sm.getOrAddChild(f, app)
 		return childMetric, true
-	case LabelConfigAppAndDB:
+	case uint64(metric.LabelConfigAppAndDB):
 		childMetric = sm.getOrAddChild(f, db, app)
 		return childMetric, true
 	default:
 		return nil, false
 	}
 }
 
+// ReinitialiseChildMetrics clears the child metrics and
+// sets the label configuration.
+func (sm *SQLMetric) ReinitialiseChildMetrics(labelConfig metric.LabelConfig) {
+	sm.mu.Lock()
+	defer sm.mu.Unlock()
+	sm.mu.children.Clear()
+	sm.labelConfig.Store(uint64(labelConfig))
+}
+
 type MetricItem interface {
 	labelValuer
 }
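
`ReinitialiseChildMetrics` clears the children and stores the new configuration under one lock, so children keyed by the old label set cannot survive into the next export. The toy below illustrates that pattern with a plain map in place of the cache-backed child store. `labelledCounter`, its methods, and the `config*` constants are hypothetical names for this sketch and do not appear in the commit.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Assumed to mirror the order of the label configuration constants.
const (
	configDisabled uint64 = iota
	configApp
	configDB
	configAppAndDB
)

// labelledCounter is a toy stand-in for SQLMetric: an atomically stored
// label configuration plus mutex-protected children keyed by label values.
type labelledCounter struct {
	labelConfig atomic.Uint64
	mu          struct {
		sync.Mutex
		children map[string]int64
	}
}

func newLabelledCounter() *labelledCounter {
	c := &labelledCounter{}
	c.mu.children = make(map[string]int64)
	return c
}

// Inc adds delta to the child selected by the current configuration.
// With labels disabled, no child is tracked.
func (c *labelledCounter) Inc(delta int64, db, app string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	var key string
	switch c.labelConfig.Load() {
	case configDB:
		key = db
	case configApp:
		key = app
	case configAppAndDB:
		key = db + "," + app
	default:
		return
	}
	c.mu.children[key] += delta
}

// Reinitialise drops every child and switches to cfg while holding the
// lock, so no child created under the old configuration remains.
func (c *labelledCounter) Reinitialise(cfg uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.mu.children = make(map[string]int64)
	c.labelConfig.Store(cfg)
}

// Children returns a copy of the current children for inspection.
func (c *labelledCounter) Children() map[string]int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[string]int64, len(c.mu.children))
	for k, v := range c.mu.children {
		out[k] = v
	}
	return out
}

func main() {
	c := newLabelledCounter()
	c.Inc(1, "test_db", "test_app") // ignored: labels are disabled
	c.Reinitialise(configAppAndDB)
	c.Inc(1, "test_db", "test_app")
	fmt.Println(c.Children())
}
```

Without the clear step, a switch from one label to two would leave single-valued children in the store while the export path expects two label values per child.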

pkg/util/metric/aggmetric/agg_metric_test.go

Lines changed: 51 additions & 1 deletion
@@ -297,7 +297,7 @@ func TestAggMetricClear(t *testing.T) {
 		Name: "bar_counter",
 	})
 	r.AddMetric(d)
-	d.labelConfig.Store(LabelConfigAppAndDB)
+	d.labelConfig.Store(uint64(metric.LabelConfigAppAndDB))
 	tenant2 := roachpb.MustMakeTenantID(2)
 	c1 := c.AddChild(tenant2.String())

@@ -368,3 +368,53 @@ func WritePrometheusMetricsFunc(r *metric.Registry) func(t *testing.T) string {
 	}
 	return writePrometheusMetrics
 }
+
+func TestSQLMetricsReinitialise(t *testing.T) {
+	defer leaktest.AfterTest(t)()
+	r := metric.NewRegistry()
+	writePrometheusMetrics := WritePrometheusMetricsFunc(r)
+
+	counter := NewSQLCounter(metric.Metadata{Name: "test.counter"})
+	r.AddMetric(counter)
+
+	gauge := NewSQLGauge(metric.Metadata{Name: "test.gauge"})
+	r.AddMetric(gauge)
+
+	histogram := NewSQLHistogram(metric.HistogramOptions{
+		Metadata: metric.Metadata{
+			Name: "test.histogram",
+		},
+		Duration:     base.DefaultHistogramWindowInterval(),
+		MaxVal:       100,
+		SigFigs:      1,
+		BucketConfig: metric.Percent100Buckets,
+	})
+	r.AddMetric(histogram)
+
+	t.Run("before invoking reinitialise sql metrics", func(t *testing.T) {
+		counter.Inc(1, "test_db", "test_app")
+		gauge.Update(10, "test_db", "test_app")
+		histogram.RecordValue(10, "test_db", "test_app")
+
+		testFile := "sql_metric_pre_reinitialise_child_metrics.txt"
+		if metric.HdrEnabled() {
+			testFile = "sql_metric_pre_reinitialise_child_metrics_hdr.txt"
+		}
+		echotest.Require(t, writePrometheusMetrics(t), datapathutils.TestDataPath(t, testFile))
+	})
+
+	r.ReinitialiseChildMetrics(true, true)
+
+	t.Run("after invoking reinitialise sql metrics", func(t *testing.T) {
+		counter.Inc(1, "test_db", "test_app")
+		gauge.Update(10, "test_db", "test_app")
+		histogram.RecordValue(10, "test_db", "test_app")
+
+		testFile := "sql_metric_post_reinitialise_child_metrics.txt"
+		if metric.HdrEnabled() {
+			testFile = "sql_metric_post_reinitialise_child_metrics_hdr.txt"
+		}
+		echotest.Require(t, writePrometheusMetrics(t), datapathutils.TestDataPath(t, testFile))
+	})
+
+}

pkg/util/metric/aggmetric/counter.go

Lines changed: 2 additions & 2 deletions
@@ -256,15 +256,15 @@ type SQLCounter struct {
 }
 
 var _ metric.Iterable = (*SQLCounter)(nil)
-var _ metric.PrometheusIterable = (*SQLCounter)(nil)
+var _ metric.PrometheusReinitialisable = (*SQLCounter)(nil)
 var _ metric.PrometheusExportable = (*SQLCounter)(nil)
 
 // NewSQLCounter constructs a new SQLCounter.
 func NewSQLCounter(metadata metric.Metadata) *SQLCounter {
 	c := &SQLCounter{
 		g: *metric.NewCounter(metadata),
 	}
-	c.SQLMetric = NewSQLMetric(LabelConfigDisabled)
+	c.SQLMetric = NewSQLMetric(metric.LabelConfigDisabled)
 	return c
 }
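
The `var _ metric.PrometheusReinitialisable = (*SQLCounter)(nil)` line is a compile-time assertion that the type satisfies the interface. The definition of `PrometheusReinitialisable` is not among the hunks shown here, so the sketch below uses a hypothetical `Reinitialisable` interface with a single method, and `reinitialiseAll` is an assumed illustration of how a registry could pick out the metrics that support reinitialisation. Neither name comes from the commit.

```go
package main

import "fmt"

// Reinitialisable is a hypothetical interface for this sketch; the real
// metric.PrometheusReinitialisable may declare more methods.
type Reinitialisable interface {
	ReinitialiseChildMetrics(labelConfig uint64)
}

// toyCounter is a minimal type that supports reinitialisation.
type toyCounter struct {
	labelConfig uint64
}

func (c *toyCounter) ReinitialiseChildMetrics(labelConfig uint64) {
	c.labelConfig = labelConfig
}

// Compile-time check: the build fails here if *toyCounter ever stops
// satisfying Reinitialisable. The typed nil pointer allocates nothing.
var _ Reinitialisable = (*toyCounter)(nil)

// reinitialiseAll applies cfg to every value that implements
// Reinitialisable, skips the rest, and reports how many were updated.
func reinitialiseAll(metrics []any, cfg uint64) int {
	updated := 0
	for _, m := range metrics {
		if r, ok := m.(Reinitialisable); ok {
			r.ReinitialiseChildMetrics(cfg)
			updated++
		}
	}
	return updated
}

func main() {
	c := &toyCounter{}
	metrics := []any{c, "not a metric"}
	fmt.Println(reinitialiseAll(metrics, 3), c.labelConfig)
}
```

The assertion costs nothing at run time and turns a missing or renamed method into a build error instead of a metric that is silently skipped by the type switch.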

pkg/util/metric/aggmetric/counter_test.go

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ func TestAggCounter(t *testing.T) {
 	c := NewSQLCounter(metric.Metadata{
 		Name: "foo_counter",
 	})
-	c.labelConfig.Store(LabelConfigAppAndDB)
+	c.labelConfig.Store(uint64(metric.LabelConfigAppAndDB))
 	r.AddMetric(c)
 	cacheStorage := cache.NewUnorderedCache(cache.Config{
 		Policy: cache.CacheLRU,

pkg/util/metric/aggmetric/gauge.go

Lines changed: 2 additions & 2 deletions
@@ -371,14 +371,14 @@ type SQLGauge struct {
 }
 
 var _ metric.Iterable = (*SQLGauge)(nil)
-var _ metric.PrometheusIterable = (*SQLGauge)(nil)
+var _ metric.PrometheusReinitialisable = (*SQLGauge)(nil)
 var _ metric.PrometheusExportable = (*SQLGauge)(nil)
 
 func NewSQLGauge(metadata metric.Metadata) *SQLGauge {
 	g := &SQLGauge{
 		g: *metric.NewGauge(metadata),
 	}
-	g.SQLMetric = NewSQLMetric(LabelConfigDisabled)
+	g.SQLMetric = NewSQLMetric(metric.LabelConfigDisabled)
 	return g
 }
