Record wallclock lag histogram by teskje · Pull Request #32010 · MaterializeInc/materialize

teskje · 2025-03-25T17:57:27Z

This PR adds a new append-only storage-managed collection, mz_wallclock_global_lag_histogram, and makes the controllers start populating it.

The schema is:

period_start timestamp
period_end timestamp
object_id text
lag_seconds uint8
labels jsonb
count uint8

Histograms are per object and per time period, as specified by object_id and period_*, respectively. The time period is configurable through a config flag and defaults to one day.
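For illustration, here is a minimal sketch of how a measurement time could be mapped to its `period_start`/`period_end` pair. It assumes that periods are aligned to multiples of the configured interval since the Unix epoch. The PR does not say how periods are aligned, and `histogram_period` is a hypothetical name.

```rust
use std::time::Duration;

/// Hypothetical helper (not from the PR): returns the `(period_start, period_end)`
/// pair containing `now_ms`, with timestamps in milliseconds since the Unix epoch.
/// Assumes periods are aligned to multiples of `interval` since the epoch.
fn histogram_period(now_ms: u64, interval: Duration) -> (u64, u64) {
    // Period intervals are far below u64::MAX milliseconds, so the cast is lossless in practice.
    let interval_ms = interval.as_millis() as u64;
    assert!(interval_ms > 0, "period interval must be non-zero");
    // Truncate down to the most recent period boundary.
    let start = now_ms - now_ms % interval_ms;
    (start, start + interval_ms)
}
```

With the one-day default, every measurement taken during a given UTC day would land in the same period.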

The lag values are bucketed as powers-of-2 seconds. I decided to store seconds instead of milliseconds to (a) avoid the illusion of millisecond accuracy and (b) significantly reduce the number of buckets. I also omitted additional precision controls for now, both to keep this first version simple and because I suspect that powers of 2 might be sufficient. We can always iterate if it turns out we need more precision. The bucket values are upper bounds, mirroring what we also use for compute introspection histograms.
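The bucketing expression from the controller diff (`lag.as_secs().next_power_of_two()`) is wrapped below in a hypothetical `lag_bucket` function so that its edge cases are visible. Sub-second precision is truncated before rounding up. As a result, a bucket value bounds the whole-second part of the lag, and any lag under one second lands in bucket 1.

```rust
use std::time::Duration;

/// Hypothetical wrapper around the bucketing expression from the controller diff:
/// drop the sub-second part, then round up to the next power of two.
/// `0u64.next_power_of_two()` is 1, so sub-second lags fall into bucket 1.
fn lag_bucket(lag: Duration) -> u64 {
    lag.as_secs().next_power_of_two()
}
```

For example, lags of 3s and 4s both map to bucket 4, while 5s maps to bucket 8.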

labels currently includes a single optional workload_class label, reflecting the workload class of the cluster maintaining the respective object, according to mz_cluster_workload_classes. I included this label specifically because it provides a simple way to filter out most of the noise when building a metric focused on production experience.
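A rough sketch of building the label set for one measurement. It uses a `BTreeMap` with string values, which matches the review thread's description of the current label handling. It assumes that a cluster without a workload class produces no `workload_class` key at all, rather than a null value. The PR only says the label is optional, and `histogram_labels` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: build the label set for one measurement.
/// The only label is the optional `workload_class` of the maintaining cluster;
/// if the cluster has no workload class, the key is left out (an assumption).
fn histogram_labels(workload_class: Option<&str>) -> BTreeMap<&'static str, String> {
    let mut labels = BTreeMap::new();
    if let Some(class) = workload_class {
        labels.insert("workload_class", class.to_string());
    }
    labels
}
```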

Measurements are collected every second but only written once per configurable refresh interval (one minute by default). This reduces the load on persist.
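The collect-often, write-rarely scheme can be sketched as an in-memory stash. Per-second samples increment counts keyed by object, period, bucket, and labels, and the stash is drained once at least one refresh interval has passed. `HistogramStash`, `record`, and `maybe_flush` are hypothetical names, not the controller's actual types.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Label set, e.g. `{"workload_class": "production"}`.
type Labels = BTreeMap<&'static str, String>;
/// Aggregation key: (object_id, (period_start_ms, period_end_ms), bucket_secs, labels).
type Key = (String, (u64, u64), u64, Labels);

/// Hypothetical stash: counts accumulate in memory from per-second samples
/// and are drained once per refresh interval.
#[derive(Default)]
struct HistogramStash {
    counts: BTreeMap<Key, u64>,
    last_flush_ms: u64,
}

impl HistogramStash {
    /// Record one sample by bumping the count of its bucket.
    fn record(&mut self, object_id: &str, period: (u64, u64), lag: Duration, labels: &Labels) {
        let bucket = lag.as_secs().next_power_of_two();
        // The key owns its labels, so they are cloned per sample.
        let key = (object_id.to_string(), period, bucket, labels.clone());
        *self.counts.entry(key).or_insert(0) += 1;
    }

    /// Drain and return all `(key, count)` pairs if at least `refresh` has
    /// elapsed since the last flush; otherwise return nothing and keep counting.
    fn maybe_flush(&mut self, now_ms: u64, refresh: Duration) -> Vec<(Key, u64)> {
        if now_ms.saturating_sub(self.last_flush_ms) < refresh.as_millis() as u64 {
            return Vec::new();
        }
        self.last_flush_ms = now_ms;
        std::mem::take(&mut self.counts).into_iter().collect()
    }
}
```

With a one-minute refresh interval, each object contributes up to 60 samples between persist writes, but only one row per distinct bucket/label combination is written.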

There is also a config flag to disable wallclock lag histogram recording entirely. This exists only to derisk the feature rollout and the expectation is that we'll remove it soon.

Motivation

This PR adds a feature that has not yet been specified.

See proposal in Notion.

Tips for reviewer

Individual commits are meaningful.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

antiguru

I think this looks good, let me know when it's ready. I left some comments inline.

I'm wondering if we need both period_start and period_end. Ideally, the end of the previous period corresponds to the start of the current period.

src/controller-types/src/dyncfgs.rs

antiguru · 2025-03-27T09:31:36Z

src/compute-client/src/controller/instance.rs

                let bucket = lag.as_secs().next_power_of_two();

-                let key = (histogram_period, bucket);
+                let key = (histogram_period, bucket, histogram_labels.clone());


This clone isn't nice, but I can't think of an easy way to avoid it.

The whole handling of labels in wallclock_lag_histogram_stash is not great. Label values are fixed to strings right now, which will stop working once we add other types (like bools). Ideally we'd store a Datum but it's impossible to store an owned String as a Datum. I've been considering storing a DatumMap instead of the BTreeMap, but I don't know how to construct one from its contents. Should we store a Row?

Even a row would need to be cloned, so I think it's fine to leave as-is.

teskje · 2025-03-27T09:43:48Z

I'm wondering if we need both period_start and period_end. Ideally, the end of the previous period corresponds to the start of the current period.

It can be useful when we change the period length/interval. For some time, the measurements will be glitchy because periods overlap, and having both the start and the end removes the ambiguity in these cases. Though there are no specific plans to actually use the period_end value right now, it seems like a good idea to include it to future-proof things. Once the period length is locked in, we can consider removing the period_end column.
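To illustrate the ambiguity: suppose the interval changes from one day to twelve hours partway through a period. Rows for both periods then share the same `period_start`. The sketch below uses hypothetical row values and functions to compare grouping by `period_start` alone, which merges the two periods, with grouping by `(period_start, period_end)`, which keeps them apart.

```rust
use std::collections::BTreeMap;

/// A histogram row reduced to the fields that matter here:
/// (period_start_ms, period_end_ms, count).
type PeriodRow = (u64, u64, u64);

/// Sum counts per `period_start` only. Rows from overlapping periods of
/// different lengths collapse into one total.
fn totals_by_start(rows: &[PeriodRow]) -> BTreeMap<u64, u64> {
    let mut totals = BTreeMap::new();
    for &(start, _end, count) in rows {
        *totals.entry(start).or_insert(0) += count;
    }
    totals
}

/// Sum counts per `(period_start, period_end)`. Overlapping periods of
/// different lengths stay separate.
fn totals_by_period(rows: &[PeriodRow]) -> BTreeMap<(u64, u64), u64> {
    let mut totals = BTreeMap::new();
    for &(start, end, count) in rows {
        *totals.entry((start, end)).or_insert(0) += count;
    }
    totals
}
```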

This commit adds a new built-in source, `mz_wallclock_global_lag_histogram_raw`, that keeps 30 days' worth of data by default. No data is filled in yet. The `_raw` suffix indicates that the histogram counts are stored as diffs for efficiency. This makes the histogram harder to query, so this commit also adds a view `mz_wallclock_global_lag_histogram` that lifts the count into a column.
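Storing the count as the update's diff means that identical rows written in successive refreshes can presumably consolidate into a single update. The trade-off is that a reader has to sum diffs per row to recover the count. The PR does this lifting in a SQL view. The Rust sketch below only illustrates the arithmetic, using a simplified, hypothetical row type.

```rust
use std::collections::BTreeMap;

/// Simplified histogram row: (object_id, lag_seconds). Real rows also carry
/// the period and labels.
type HistRow = (String, u64);

/// Consolidate raw `(row, diff)` updates, where the diff carries the count,
/// into `(row, count)` pairs, dropping rows whose diffs cancel out.
fn lift_counts(updates: &[(HistRow, i64)]) -> Vec<(HistRow, i64)> {
    let mut acc: BTreeMap<HistRow, i64> = BTreeMap::new();
    for (row, diff) in updates {
        *acc.entry(row.clone()).or_insert(0) += diff;
    }
    acc.into_iter().filter(|(_, count)| *count != 0).collect()
}
```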

This commit refactors the existing controller code that refreshes wallclock lag introspection, in preparation for the addition of a wallclock lag histogram. Mostly it renames things from "wallclock lag" to "wallclock lag history", to avoid confusion once we also add histogram-updating logic there. It also restructures the `refresh_wallclock_lag` methods a bit, to make them more similar between the two controllers.

This commit extends the two controllers' `refresh_wallclock_lag` methods to also record `WallclockLagHistogram` introspection data. Both the histogram refresh time (i.e. cadence of persistent writes) and the period length can be configured through dyncfg flags. Additionally, the commit adds a dyncfg flag to entirely disable histogram collection, as a failsafe.
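The three knobs can be pictured as a plain config struct. This is not the `dyncfg` API. Two of the field names follow the flags mentioned later in the PR (`wallclock_lag_histogram_period_interval`, `enable_wallclock_lag_histogram_collection`). The name of the refresh-interval flag is not given, so `histogram_refresh_interval` is hypothetical. The period and refresh defaults come from the PR description, while the enabled-by-default value is an assumption.

```rust
use std::time::Duration;

/// Plain-struct stand-in for the histogram dyncfg flags (not the dyncfg API).
#[derive(Debug, Clone)]
struct HistogramConfig {
    enable_wallclock_lag_histogram_collection: bool,
    wallclock_lag_histogram_period_interval: Duration,
    /// Hypothetical name; the PR does not name this flag.
    histogram_refresh_interval: Duration,
}

impl Default for HistogramConfig {
    fn default() -> Self {
        Self {
            // Assumed default; the PR only says the flag can switch collection off.
            enable_wallclock_lag_histogram_collection: true,
            // Defaults stated in the PR description: one-day periods, one-minute writes.
            wallclock_lag_histogram_period_interval: Duration::from_secs(24 * 60 * 60),
            histogram_refresh_interval: Duration::from_secs(60),
        }
    }
}

/// Measurements are taken once per second, per the PR description.
const SAMPLE_INTERVAL: Duration = Duration::from_secs(1);

impl HistogramConfig {
    /// Failsafe check: skip histogram recording entirely when disabled.
    fn should_record(&self) -> bool {
        self.enable_wallclock_lag_histogram_collection
    }

    /// Upper bound on samples per object between two persist writes.
    fn samples_per_flush(&self) -> u64 {
        self.histogram_refresh_interval.as_secs() / SAMPLE_INTERVAL.as_secs()
    }
}
```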

This commit adds a "workload_class" label to measurements in `mz_wallclock_global_lag_histogram`. The value of the label is the workload class of the cluster maintaining the respective collection.

teskje · 2025-03-27T11:28:54Z

Should be ready to review now! Things I changed since @antiguru's last look:

Made the period_* columns the first two columns, after @bkirwi's suggestion that this would make persist order them by time.
Renamed the source to mz_wallclock_global_lag_histogram_raw and added a mz_wallclock_global_lag_histogram view that lifts the count into a column. We need this for compatibility with the catalog extractor, which can't run transformations.
Renamed the wallclock_lag_histogram_period_length dyncfg to wallclock_lag_histogram_period_interval.
Added an enable_wallclock_lag_histogram_collection dyncfg to make it possible to switch off histogram collection in a pinch.
Added tests.

teskje · 2025-03-27T16:41:19Z

TFTR!

teskje force-pushed the mz_wallclock_lag_histogram branch 2 times, most recently from 7d2926a to c15122b on March 26, 2025 13:18

teskje changed the title from "[wip] Record wallclock lag histogram" to "Record wallclock lag histogram" on Mar 27, 2025

antiguru reviewed Mar 27, 2025


teskje added 2 commits March 27, 2025 11:10

teskje force-pushed the mz_wallclock_lag_histogram branch 2 times, most recently from 6f2b59c to 308563a on March 27, 2025 10:33

teskje added 2 commits March 27, 2025 11:46

compute,storage: annotate wallclock lag histogram with workload_class

3cd5e8c


teskje force-pushed the mz_wallclock_lag_histogram branch from 308563a to 64bdcdc on March 27, 2025 11:26

teskje marked this pull request as ready for review March 27, 2025 11:28

teskje requested review from a team as code owners March 27, 2025 11:28

teskje requested review from ParkMyCar and antiguru March 27, 2025 11:28

teskje force-pushed the mz_wallclock_lag_histogram branch from 64bdcdc to b977003 on March 27, 2025 11:47

antiguru approved these changes Mar 27, 2025


test: add tests for wallclock lag histogram

f2dac5e

teskje force-pushed the mz_wallclock_lag_histogram branch from b977003 to f2dac5e on March 27, 2025 13:24

teskje merged commit 771f1d8 into MaterializeInc:main Mar 27, 2025
84 checks passed

teskje deleted the mz_wallclock_lag_histogram branch March 27, 2025 16:41

teskje merged 5 commits intoMaterializeInc:mainfrom
teskje:mz_wallclock_lag_histogram

teskje commented Mar 25, 2025 •

edited

Loading

Uh oh!

antiguru left a comment

Uh oh!

Uh oh!

antiguru Mar 27, 2025

Uh oh!

teskje Mar 27, 2025

Uh oh!

antiguru Mar 27, 2025

Uh oh!

teskje commented Mar 27, 2025

Uh oh!

teskje commented Mar 27, 2025

Uh oh!

teskje commented Mar 27, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants
