105477: sql: add plan gist matching to stmt diagnostics feature r=yuzefovich a=yuzefovich
This commit extends the stmt diagnostics feature to add optional
plan-gist-based matching. Previously, we filtered statements based only
on the fingerprint, but now we can optionally ask for a particular plan
(by specifying the target plan gist). All other aspects of the feature
(minimum execution latency, sampling probability) are unaffected.
The caveat to the implementation is that the plan gist of the running
statement is only available after the optimizer has done its part, so
whenever plan-gist-based matching is desired, the trace will not
include the optimizer's work, and the plan string won't be available
either.
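A minimal Go sketch of why this caveat arises, assuming a simplified request made of a fingerprint, an optional plan gist, and an anti-match flag. `diagRequest`, `decide`, and the gist strings are hypothetical illustrations, not CockroachDB's actual code. The fingerprint can be checked before optimization, but the plan gist can only be checked afterwards, so a gist-based request cannot start tracing early enough to capture the optimizer.

```go
package main

import "fmt"

// diagRequest is a hypothetical, simplified stand-in for a statement
// diagnostics request; field names are illustrative only.
type diagRequest struct {
	fingerprint  string
	planGist     string // empty means any plan is acceptable
	antiPlanGist bool   // if set, match any plan other than planGist
}

// needsPlanGist reports whether matching must wait until the optimizer has
// produced a plan (and therefore a plan gist).
func (r diagRequest) needsPlanGist() bool {
	return r.planGist != ""
}

// matchesPlanGist can only run after optimization. The != implements the
// anti-match: the result of the gist comparison is inverted when
// antiPlanGist is set.
func (r diagRequest) matchesPlanGist(gist string) bool {
	if !r.needsPlanGist() {
		return true
	}
	return (gist == r.planGist) != r.antiPlanGist
}

// decide reports whether tracing could start before the optimizer runs and
// whether this execution matches the request (latency and sampling checks
// are ignored here).
func decide(r diagRequest, fingerprint, gist string) (traceFromStart, matches bool) {
	// Phase 1, before optimization: only the fingerprint is known.
	if r.fingerprint != fingerprint {
		return false, false
	}
	if !r.needsPlanGist() {
		// The fingerprint alone decides, so the optimizer can be traced too.
		return true, true
	}
	// Phase 2, after optimization: the gist is now known, but the
	// optimizer's work has already happened outside of any trace.
	return false, r.matchesPlanGist(gist)
}

func main() {
	const fp = "SELECT * FROM t WHERE k = _"
	byFingerprint := diagRequest{fingerprint: fp}
	byGist := diagRequest{fingerprint: fp, planGist: "gistA"}
	antiGist := diagRequest{fingerprint: fp, planGist: "gistA", antiPlanGist: true}

	// Suppose the running statement was planned with "gistB".
	fmt.Println(decide(byFingerprint, fp, "gistB")) // true true
	fmt.Println(decide(byGist, fp, "gistB"))        // false false
	fmt.Println(decide(antiGist, fp, "gistB"))      // false true
}
```

In this model, only a fingerprint-only request can trace from the very start; every gist-based request, whether matching or anti-matching, is decided after the plan exists.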
This commit also makes a minor change to always store the memo and the
opt planning catalog in `planTop`. Previously, these were stored only
when bundle collection was enabled, but now collection can be enabled
after the optimizer has run, at which point the memo and the catalog
might be lost. The optimizer now stores them unconditionally, and if we
choose not to collect the bundle once the plan gist is available, we
release them. This allows us to still get `opt` files in the bundle.
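The retain-then-release pattern can be sketched as follows. This is a hypothetical simplification: `memo`, `optCatalog`, `afterOptimize`, `decideBundle`, and the fields of this `planTop` stand-in are placeholders, not the real CockroachDB types or functions.

```go
package main

import "fmt"

// memo and optCatalog are hypothetical placeholders for the optimizer's memo
// and the opt planning catalog; only whether they are retained matters here.
type memo struct{ groups int }
type optCatalog struct{ tables []string }

// planTop is a simplified, hypothetical stand-in for the planning result.
type planTop struct {
	mem     *memo
	catalog *optCatalog
}

// afterOptimize models the optimizer always recording the memo and catalog,
// because whether a bundle is needed may not be known yet.
func afterOptimize(p *planTop) {
	p.mem = &memo{groups: 3}
	p.catalog = &optCatalog{tables: []string{"t"}}
}

// decideBundle runs once the plan gist is known. If no bundle will be
// collected, the retained objects are dropped so they can be garbage-collected.
func decideBundle(p *planTop, collect bool) {
	if !collect {
		p.mem = nil
		p.catalog = nil
	}
}

func main() {
	var kept, dropped planTop

	afterOptimize(&kept)
	decideBundle(&kept, true)
	fmt.Println(kept.mem != nil, kept.catalog != nil) // true true

	afterOptimize(&dropped)
	decideBundle(&dropped, false)
	fmt.Println(dropped.mem == nil, dropped.catalog == nil) // true true
}
```

The trade-off is holding these objects slightly longer for every execution in exchange for having them available when a late bundle decision is made.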
Epic: None
Addresses: cockroachdb#96765.
Addresses: cockroachdb#103018.
Release note (sql change): The statement diagnostics feature has been
extended to support collecting a bundle for a particular plan. Namely,
the existing fingerprint-based matching has been extended to also
include plan-gist-based matching. Such a bundle will be missing a
couple of things: the `plan.txt` file and the tracing of the optimizer.
At the moment, the feature is only exposed via an overload of the
`crdb_internal.request_statement_bundle` builtin function. We now also
support "anti-match", i.e., collecting a bundle for any plan other than
the provided plan gist.
108139: sql: fix logic to collect stats on system.jobs r=rytaft a=rytaft
This commit fixes an oversight in cockroachdb#102637, which was intended to enable stats collection on the jobs table but did not succeed.
I've manually confirmed that stats are now collected on the jobs table in a local cluster:
```
888074673664065537 | AUTO CREATE STATS | Table statistics refresh for system.public.jobs | CREATE STATISTICS __auto__ FROM [15] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | NULL | 2023-08-03 19:01:22.343
```
Informs cockroachdb#107405
Release note (performance improvement): We now automatically collect table statistics on the `system.jobs` table, which will enable the optimizer to produce better query plans for internal queries that access the `system.jobs` table. This may result in better performance of the system. Note: a previous attempt to enable stats on `system.jobs` starting in 23.1.0 was unsuccessful, but we have now fixed the oversight.
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
docs/generated/http/full.md
Lines changed: 2 additions & 0 deletions
@@ -4388,6 +4388,8 @@ Support status: [reserved](#support-status)
 | min_execution_latency |[google.protobuf.Duration](#cockroach.server.serverpb.CreateStatementDiagnosticsReportRequest-google.protobuf.Duration)|| MinExecutionLatency, when non-zero, indicates the minimum execution latency of a query for which to collect the diagnostics report. In other words, if a query executes faster than this threshold, then the diagnostics report is not collected on it, and we will try to get a bundle the next time we see the query fingerprint.<br><br>NB: if MinExecutionLatency is non-zero, then all queries that match the fingerprint will be traced until a slow enough query comes along. This tracing might have some performance overhead. |[reserved](#support-status)|
 | expires_after |[google.protobuf.Duration](#cockroach.server.serverpb.CreateStatementDiagnosticsReportRequest-google.protobuf.Duration)|| ExpiresAfter, when non-zero, sets the expiration interval of this request. |[reserved](#support-status)|
 | sampling_probability | [double](#cockroach.server.serverpb.CreateStatementDiagnosticsReportRequest-double) | | SamplingProbability controls how likely we are to try and collect a diagnostics report for a given execution. The semantics with MinExecutionLatency are worth noting (and perhaps simplifying?): - If SamplingProbability is zero, we're always sampling. This is for compatibility with pre-22.2 versions where this parameter was not available. - If SamplingProbability is non-zero, MinExecutionLatency must be non-zero. We'll sample stmt executions with the given probability until: (a) we capture one that exceeds MinExecutionLatency, or (b) we hit the ExpiresAfter point.<br><br>SamplingProbability lets users control at a per-stmt granularity how much collection overhead is acceptable to try an capture an outlier execution for further analysis (are high p99.9s due to latch waits? racing with split transfers?). A high sampling rate can capture a trace sooner, but the added overhead may also cause the trace to be non-representative if the tracing overhead across all requests is causing resource saturation (network, memory) and resulting in slowdown.<br><br>TODO(irfansharif): Wire this up to the UI code. When selecting the latency threshold, we should want to force specifying a sampling probability.<br><br>TODO(irfansharif): We could do better than a hard-coded default value for probability (100% could be too high-overhead so probably not the right one). Strawman: could consider the recent request rate for the fingerprint (say averaged over the last 10m? 30m?), consider what %-ile the latency target we're looking to capture is under, and suggest a sampling probability that gets you at least one trace in the next T seconds with 95% likelihood? Or provide a hint for how long T is for the currently chosen sampling probability. | [reserved](#support-status) |
+| plan_gist |[string](#cockroach.server.serverpb.CreateStatementDiagnosticsReportRequest-string)|| PlanGist, when set, indicates a particular plan that we want collect diagnostics for. This can be useful when a single fingerprint can result in multiple plans.<br><br>There is a caveat to using this filtering: since the plan gist for a running query is only available after the optimizer has done its part, the trace will only include things after the optimizer is done. |[reserved](#support-status)|
+| anti_plan_gist |[bool](#cockroach.server.serverpb.CreateStatementDiagnosticsReportRequest-bool)|| AntiPlanGist, when set, indicates that any plan not matching PlanGist will do. |[reserved](#support-status)|
docs/generated/settings/settings-for-tenants.txt
Lines changed: 1 addition & 1 deletion
@@ -304,4 +304,4 @@ trace.opentelemetry.collector string address of an OpenTelemetry trace collecto
 trace.snapshot.rate duration 0s if non-zero, interval at which background trace snapshots are captured tenant-rw
 trace.span_registry.enabled boolean true if set, ongoing traces can be seen at https://<ui>/#/debug/tracez tenant-rw
 trace.zipkin.collector string the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used. tenant-rw
-version version 1000023.1-16 set the active cluster version in the format '<major>.<minor>' tenant-rw
+version version 1000023.1-18 set the active cluster version in the format '<major>.<minor>' tenant-rw
docs/generated/settings/settings.html
Lines changed: 1 addition & 1 deletion
@@ -260,6 +260,6 @@
 <tr><td><div id="setting-trace-span-registry-enabled" class="anchored"><code>trace.span_registry.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>if set, ongoing traces can be seen at https://<ui>/#/debug/tracez</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
 <tr><td><div id="setting-trace-zipkin-collector" class="anchored"><code>trace.zipkin.collector</code></div></td><td>string</td><td><code></code></td><td>the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
 <tr><td><div id="setting-ui-display-timezone" class="anchored"><code>ui.display_timezone</code></div></td><td>enumeration</td><td><code>etc/utc</code></td><td>the timezone used to format timestamps in the ui [etc/utc = 0, america/new_york = 1]</td><td>Dedicated/Self-Hosted</td></tr>
-<tr><td><div id="setting-version" class="anchored"><code>version</code></div></td><td>version</td><td><code>1000023.1-16</code></td><td>set the active cluster version in the format '<major>.<minor>'</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
+<tr><td><div id="setting-version" class="anchored"><code>version</code></div></td><td>version</td><td><code>1000023.1-18</code></td><td>set the active cluster version in the format '<major>.<minor>'</td><td>Serverless/Dedicated/Self-Hosted</td></tr>