Commit f0afff0
156307: sql: introduce canary stats settings r=ZhouXing19 a=ZhouXing19
Informs: #150015
This PR introduces the key configuration settings for the Canary Statistics Rollout feature: a table storage parameter, a cluster setting, and a session variable. Note that this PR only introduces the settings; the core implementation of canary stats rollout will be in #156385.
### Table Storage parameter `sql_stats_canary_window` (duration)
```sql
CREATE TABLE t (x int) WITH (sql_stats_canary_window = '20s')
```
This duration specifies how long newly collected statistics remain eligible for selection by the optimizer alongside the most recent full statistics. It is required for the canary statistics rollout feature: only tables with a non-zero canary window have canary statistics rollout enabled.
Release note (sql change): A new table storage parameter, `sql_stats_canary_window`, has been introduced to enable gradual rollout of newly collected table statistics. It takes a duration string as its value. When set to a non-zero duration, new statistics remain in a "canary" state for the specified duration before being promoted to stable. This allows controlled exposure, and an opportunity to intervene, before new statistics are used by all queries.
----
### Cluster setting `sql.stats.canary_fraction` (float in [0 - 1])
```sql
SET CLUSTER SETTING sql.stats.canary_fraction = 0.2
```
This setting (`canaryFraction` in the code) controls the probabilistic sampling rate for queries participating in the canary statistics rollout feature.
It determines what fraction of queries will use "canary statistics" (newly collected stats within their canary window) versus "stable statistics" (previously proven stats).
For example, a value of 0.2 means 20% of queries will test canary stats while 80% use stable stats.
The selection is atomic per query: if a query is chosen for canary evaluation, it will use canary statistics for ALL tables it references (where available). A query never uses a mix of canary and stable statistics.
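The per-query atomicity can be illustrated with a small Go sketch: one boolean decision is made per query and then applied to every referenced table. The `tableStats` type and `statsForQuery` function are hypothetical, and the assumption that a table with nothing inside its canary window simply has one candidate (its stable stats) that serves both paths is our reading of "where available".

```go
package main

import "fmt"

// tableStats is a hypothetical per-table view: the stable statistics and,
// if a collection is still inside its canary window, the canary statistics.
type tableStats struct {
	stable string
	canary string // empty when no collection is inside the canary window
}

// statsForQuery applies a single per-query decision to every referenced
// table, so one query never mixes canary and stable statistics. A table
// without canary stats has only one candidate, which serves both paths.
func statsForQuery(useCanary bool, tables map[string]tableStats) map[string]string {
	chosen := make(map[string]string, len(tables))
	for name, ts := range tables {
		if useCanary && ts.canary != "" {
			chosen[name] = ts.canary
		} else {
			chosen[name] = ts.stable
		}
	}
	return chosen
}

func main() {
	tables := map[string]tableStats{
		"t": {stable: "t: stats from yesterday", canary: "t: stats from 10s ago"},
		"u": {stable: "u: stats from last week"},
	}
	// The same boolean is used for every table in the query.
	for _, useCanary := range []bool{false, true} {
		chosen := statsForQuery(useCanary, tables)
		fmt.Printf("canary=%v t=%q u=%q\n", useCanary, chosen["t"], chosen["u"])
	}
}
```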
Because this "dice roll" happens for every non-internal query, the memo would otherwise flip frequently between canary and stable statistics, negating the benefits of the query plan cache and causing performance regressions. To mitigate this, queries selected for the canary path bypass the query plan cache entirely: they neither look up existing cached memos nor invalidate them. Instead, a one-time memo is created and used only for that single query execution.
This approach assumes `sql.stats.canary_fraction` is set to a small value, so that canary queries remain a small fraction of all queries and the cost of recomputing their plans stays low.
One exception: the dice are not rolled when preparing a statement, so `UseCanaryStats` is always false during statement preparation and the memo cache remains enabled. The rule of thumb: cached memos, whether in the query cache or in a prepared statement, are always built with stable stats.
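The caching rules above can be modeled with a toy plan cache in Go. This is a sketch under stated assumptions, not the optimizer's real cache: `memo`, `planCache`, `planExecution`, and `prepare` are hypothetical names, and the memo contents are reduced to a flag recording which statistics were used.

```go
package main

import "fmt"

// memo is a hypothetical stand-in for an optimizer memo; useCanaryStats
// records which statistics it was built with.
type memo struct {
	sql            string
	useCanaryStats bool
}

// planCache is a hypothetical query plan cache. By construction it only
// ever stores memos built with stable statistics.
type planCache struct {
	entries map[string]*memo
	builds  int // number of memos built, to make recomputation visible
}

func (c *planCache) build(sql string, useCanary bool) *memo {
	c.builds++
	return &memo{sql: sql, useCanaryStats: useCanary}
}

// planExecution plans one executed query. A canary query bypasses the cache
// entirely (no lookup, no insert, no invalidation); its memo is used once.
func (c *planCache) planExecution(sql string, useCanary bool) *memo {
	if useCanary {
		return c.build(sql, true)
	}
	if m, ok := c.entries[sql]; ok {
		return m
	}
	m := c.build(sql, false)
	c.entries[sql] = m
	return m
}

// prepare never rolls the dice, so a prepared statement's memo is always
// built with stable statistics.
func (c *planCache) prepare(sql string) *memo {
	return c.build(sql, false)
}

func main() {
	c := &planCache{entries: map[string]*memo{}}
	const q = "SELECT * FROM t WHERE x = 1"
	c.planExecution(q, false) // miss: build and cache a stable memo
	c.planExecution(q, true)  // canary: one-time memo, cache untouched
	c.planExecution(q, false) // hit: reuse the cached stable memo
	fmt.Println("memos built:", c.builds)                                      // memos built: 2
	fmt.Println("cached memo uses canary stats:", c.entries[q].useCanaryStats) // ... false
}
```

The extra build on the canary path is the recomputation cost that a small `sql.stats.canary_fraction` is meant to keep bounded.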
### Session Variable `canary_stats_mode` (enum: {auto, off, on})
- `on`: All queries in the session use canary stats for planning
- `off`: All queries in the session use stable stats for planning
- `auto`: The system decides based on `sql.stats.canary_fraction` for each query execution
Release note (sql change): We introduce two new settings to control the use of canary statistics in query planning:
1. Cluster setting `sql.stats.canary_fraction` (float, range [0, 1]): Controls what fraction of queries use "canary statistics" (newly collected stats within their canary window) versus "stable statistics" (previously proven stats). For example, a value of 0.2 means 20% of queries will use canary stats while 80% use stable stats. The selection is atomic per query: if a query is chosen for canary evaluation, it uses canary statistics for ALL tables it references (where available) and bypasses the query plan cache. A query never uses a mix of canary and stable statistics.
2. Session variable `canary_stats_mode` (enum: {auto, off, on}, default: auto):
- `on`: All queries in the session use canary stats for planning
- `off`: All queries in the session use stable stats for planning
- `auto`: The system decides based on `sql.stats.canary_fraction` for each query execution
157146: db-console: add metrics workspace to debug page r=xinhaoz a=xinhaoz
This debug page is similar to `Custom Time Series` but also supports exporting and loading custom time series dashboards.
Epic: none
Release note: None
157862: decommission: retry on errors for AllocatorCheckRange r=wenyihu6 a=wenyihu6
Fixes: #156849
Release note: Previously, the decommission pre-check could fail on transient
errors. This is now fixed with a retry loop.
---
**decommission: retry on errors for AllocatorCheckRange**
Previously, the decommission pre-check would fail for a range if
evalStore.AllocatorCheckRange returned an error. However, transient errors, such
as those from throttled stores, are only expected to last about 5 seconds
(FailedReservationsTimeout) but can still cause the pre-check to fail. This
commit adds a retry loop around AllocatorCheckRange that retries on any error.
Alternatively, we could check for throttling errors specifically and retry only
on throttled stores, but that would require string or error comparisons, which
complicate the code. Since this only affects the decommission pre-check, we
simply retry on all errors.
---
**kv: add TestDecommissionPreCheckRetryThrottledStores**
Previously, we made decommission prechecks retry on errors, since some transient
issues resolve quickly and shouldn’t cause the precheck to fail. This commit
adds a test that verifies the precheck retries when it encounters transient
store-throttling errors.
157927: roachtest: link on `large` pool r=rail a=rickystewart
Release note: none
Epic: none
Co-authored-by: ZhouXing19
Co-authored-by: Xin Hao Zhang
Co-authored-by: wenyihu6
Co-authored-by: Ricky Stewart
53 files changed (+1870, -177 lines):
- docs/generated/settings
- pkg
  - ccl/logictestccl/tests
    - 3node-tenant
    - local-read-committed
    - local-repeatable-read
  - cli/testdata/doctor
  - cmd/roachtest
  - kv/kvserver
  - server
    - storage_api
  - sql
    - catalog
      - bootstrap/testdata
      - catpb
      - descpb
      - tabledesc
    - logictest
      - testdata/logic_test
      - tests
        - fakedist-disk
        - fakedist-vec-off
        - fakedist
        - local-legacy-schema-changer
        - local-mixed-25.2
        - local-mixed-25.3
        - local-mixed-25.4
        - local-vec-off
        - local
    - opt/exec/execbuilder/testdata
    - sem/eval
    - sessiondatapb
    - sessionmutator
    - storageparam/tablestorageparam
  - ui/workspaces/db-console/src
    - redux
    - views/reports/containers
      - customChart
      - metricsWorkspace
        - components