Record wallclock lag histogram#32010
Conversation
7d2926a to
c15122b
Compare
antiguru
left a comment
There was a problem hiding this comment.
I think this looks good, let me know when it's ready. I left some comments inline.
I'm wondering if we need both period_start and period_end. Ideally, the end of the previous period corresponds to the start of the current period.
| let bucket = lag.as_secs().next_power_of_two(); | ||
|
|
||
| let key = (histogram_period, bucket); | ||
| let key = (histogram_period, bucket, histogram_labels.clone()); |
There was a problem hiding this comment.
This clone isn't nice, but I can't think of an easy way to avoid it.
There was a problem hiding this comment.
The whole handling of labels in wallclock_lag_histogram_stash is not great. Label values are fixed to strings right now, which will stop working once we add other types (like bools). Ideally we'd store a Datum but it's impossible to store an owned String as a Datum. I've been considering storing a DatumMap instead of the BTreeMap, but I don't know how to construct one from its contents. Should we store a Row?
There was a problem hiding this comment.
Even a row would need to be cloned, so I think it's fine to leave as-is.
Where it can be useful is when we change the period length/interval. There will be some time where the measurements are glitchy because intervals overlap. Having both the start and end allows us to remove ambiguity in these cases. Though there are no specific plans to actually use the |
This commit adds a new built-in source, `mz_wallclock_global_lag_histogram_raw` that keeps 30 days worth of data by default. No data is filled in yet. The `_raw` suffix suggests that the histogram counts are stored as diffs for efficiency. This makes the histogram harder to query, so this commit also adds a view `mz_wallclock_lag_histogram` that lifts the count into a column.
This commit refactors the existing controller code to refresh wallclock lag introspection in preparation of the addition of a wallclock lag histogram. Mostly it renames things from "wallclock lag" to "wallclock lag history", to avoid confusion when we also add histogram updating logic there. It also changes the structure of the `refresh_wallclock_lag` methods a bit, to make them more similar between the two controllers.
6f2b59c to
308563a
Compare
This commit extends the two controllers' `refresh_wallclock_lag` methods to also record `WallclockLagHistogram` introspection data. Both the histogram refresh time (i.e. cadence of persistent writes) and the period length can be configured through dyncfg flags. Additionally, the commit adds a dyncfg flag to entirely disable histogram collection, as a failsafe.
This commit adds a "workload_class" label to measurements in `mz_wallclock_global_lag_histogram`. The value of the label is the workload class of the cluster maintaining the respective collection.
308563a to
64bdcdc
Compare
|
Should be ready to review now! Things I changed since @antiguru's last look:
|
64bdcdc to
b977003
Compare
b977003 to
f2dac5e
Compare
|
TFTR! |
This PR adds a new append-only storage-managed collection,
mz_wallclock_global_lag_histogram, and makes the controllers start populating it.The schema is:
period_starttimestampperiod_endtimestampobject_idtextlag_secondsuint8labelsjsonbcountuint8Histograms are per object and per time period, as specified by
object_idandperiod_*, respectively. The time period is configurable through a config flag and defaults to one day.The lag values are bucketed as powers-of-2 seconds. I decided to store seconds instead of milliseconds to (a) avoid the illusion of millisecond accuracy and (b) significantly reduce the number of buckets. I also omitted adding additional precision controls for now to reduce the complexity in this first version and because I suspect that powers of 2 might be sufficient. We can always iterate if it turns out we need more precision. The bucket values are upper bounds, mirroring what we also use for compute introspection histograms.
labelscurrently includes a single optionalworkload_classlabel, reflecting the workload class of the cluster maintaining the respective object, according tomz_cluster_workload_classes. I included this label specifically because it provides a simple way to filter out most of the noise when building a metric focused on production experience.Measurements are collected every second but only written once per a configurable refresh interval (one minute by default). This is to reduce the load on persist.
There is also a config flag to disable wallclock lag histogram recording entirely. This exists only to derisk the feature rollout and the expectation is that we'll remove it soon.
Motivation
See proposal in Notion.
Tips for reviewer
Individual commits are meaningful.
Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way), then it is tagged with aT-protolabel.