Commit 09152d2

[explain] new syntax (LIR-based, Postgres like) (#32262)
Introduces new default syntax for `EXPLAIN`, such that now (1) `EXPLAIN` by default explains LIR plans, which have unambiguous interpretations (unlike MIR plans), and (2) `EXPLAIN` shows information in a Postgres-like syntax, and significantly less information than it used to for `EXPLAIN PHYSICAL PLAN FOR` (i.e., for LIR).

You can still explain MIR plans with the old syntax using `EXPLAIN OPTIMIZED PLAN FOR`.

You can still explain LIR plans with the old, very verbose syntax using `EXPLAIN PHYSICAL PLAN AS VERBOSE TEXT FOR`.

Remaining TODOs:

- [x] Update docs for `mz_lir_mapping` to describe new operator names.
- [x] Write docs for new default syntax.
- [x] Write test for new default syntax.
- [x] Write changelog post. MaterializeInc/www#1457

Some remaining questions should be resolved by follow-up PRs, per conversation with @ggevay.

MaterializeInc/database-issues#9375
MaterializeInc/database-issues#9376
MaterializeInc/database-issues#9377

### Motivation

* This PR adds a known-desirable feature. #31643 MaterializeInc/database-issues#8889
1 parent 25eb775 commit 09152d2
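
As a rough illustration of the three forms named in the commit message, the statements below use the table `t` from the example in `doc/user/content/sql/explain-plan.md`. The query itself is only an assumption chosen for illustration; any `<explainee>` should work the same way.

```mzsql
-- Table definition taken from the explain-plan.md example in this commit.
CREATE TABLE t (x INT NOT NULL, y INT NOT NULL);

-- New default: the LIR (physical) plan in the concise, Postgres-like syntax.
EXPLAIN SELECT x, y FROM t WHERE x > y;

-- MIR (optimized) plan, shown in the old syntax.
EXPLAIN OPTIMIZED PLAN FOR SELECT x, y FROM t WHERE x > y;

-- LIR plan in the old, very verbose syntax.
EXPLAIN PHYSICAL PLAN AS VERBOSE TEXT FOR SELECT x, y FROM t WHERE x > y;
```

Per the updated docs, the first statement should be equivalent to `EXPLAIN PHYSICAL PLAN FOR ...` and `EXPLAIN PHYSICAL PLAN AS TEXT FOR ...`.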

54 files changed (+3692, -988 lines changed)

doc/user/content/sql/explain-analyze.md

Lines changed: 81 additions & 74 deletions
Large diffs are not rendered by default.

doc/user/content/sql/explain-plan.md

Lines changed: 52 additions & 20 deletions
@@ -96,12 +96,13 @@ FOR ] -- The FOR keyword is required if the PLAN keyword is specified
 {{</tab>}}
 {{</tabs>}}
 
-Note that the `FOR` keyword is required if the `PLAN` keyword is present. In other words, the following three statements are equivalent:
+Note that the `FOR` keyword is required if the `PLAN` keyword is present. The following four statements are equivalent:
 
 ```mzsql
 EXPLAIN <explainee>;
 EXPLAIN PLAN FOR <explainee>;
-EXPLAIN OPTIMIZED PLAN FOR <explainee>;
+EXPLAIN PHYSICAL PLAN FOR <explainee>;
+EXPLAIN PHYSICAL PLAN AS TEXT FOR <explainee>;
 ```
 
 ### Explained object
@@ -138,8 +139,8 @@ Plan Stage | Description
 **RAW PLAN** | Display the raw plan; this is closest to the original SQL.
 **DECORRELATED PLAN** | Display the decorrelated but not-yet-optimized plan.
 **LOCALLY OPTIMIZED** | Display the locally optimized plan (before view inlining and access path selection). This is the final stage for regular `CREATE VIEW` optimization.
-**OPTIMIZED PLAN** | _(Default)_ Display the optimized plan.
-**PHYSICAL PLAN** | Display the physical plan; this is close but not identical to the operators shown in [`mz_introspection.mz_lir_mapping`](../../sql/system-catalog/mz_introspection/#mz_lir_mapping).
+**OPTIMIZED PLAN** | Display the optimized plan.
+**PHYSICAL PLAN** | _(Default)_ Display the physical plan; this corresponds to the operators shown in [`mz_introspection.mz_lir_mapping`](../../sql/system-catalog/mz_introspection/#mz_lir_mapping).
 
 ### Output modifiers
 
@@ -246,19 +247,38 @@ Materialize plans are directed, potentially cyclic, graphs of operators. Each op
 receives inputs from zero or more other operators and produces a single output.
 Sub-graphs where each output is consumed only once are rendered as tree-shaped fragments.
 Sub-graphs consumed more than once are represented as common table expressions (CTEs).
-In the example below, the CTE `l0` represents a linear sub-plan (a chain of `Get`,
-`Filter`, and `Project` operators) which is used in both inputs of a self-join.
+In the example below, the CTE `l0` represents a linear sub-plan (a chain of `Read` from the table `t`)
+which is used in both inputs of a self-join (`Differential Join`).
 
 ```text
-With
-cte l0 =
-Project (#0, #1)
-Filter (#0 > #2)
-ReadStorage materialize.public.t
-Return
-Join on=(#1 = #2)
-Get l0
-Get l0
+> CREATE TABLE t(x INT NOT NULL, y INT NOT NULL);
+CREATE TABLE
+> EXPLAIN SELECT t1.x, t1.y
+FROM (SELECT * FROM t WHERE x > y) AS t1,
+(SELECT * FROM t where x > y) AS t2
+WHERE t1.y = t2.y;
+Physical Plan
+--------------------------------------------------------
+Explained Query: +
+→With +
+cte l0 = +
+→Read materialize.public.t +
+→Return +
+→Differential Join %0 » %1 +
+Join stage %0: Lookup key #0{y} in %1 +
+→Arrange +
+Keys: 1 arrangement available, plus raw stream+
+Arrangement 0: #1{y} +
+→Stream l0 +
+→Arrange +
+Keys: 1 arrangement available, plus raw stream+
+Arrangement 0: #0{y} +
+→Read l0 +
++
+Source materialize.public.t +
+filter=((#0{x} > #1{y})) +
++
+Target cluster: quickstart +
 ```
 
 Note that CTEs in optimized plans do not directly correspond to CTEs in your original SQL query: For example, CTEs might disappear due to inlining (i.e., when a CTE is used only once, its definition is copied to that usage site); new CTEs can appear due to the optimizer recognizing that a part of the query appears more than once (aka common subexpression elimination). Also, certain SQL-level concepts, such as outer joins or subqueries, do not have an explicit representation in optimized plans, and are instead expressed as a pattern of operators involving CTEs. CTE names are always `l0`, `l1`, `l2`, ..., and do not correspond to SQL-level CTE names.
@@ -270,7 +290,7 @@ Many operators need to refer to columns in their input. These are displayed like
 columns assigned to `Map` operators, it might be useful to request [the `arity` output modifier](#output-modifiers).
 
 Each operator can also be annotated with additional metadata. Details are shown
-by default in the `EXPLAIN PHYSICAL PLAN` output, but are hidden elsewhere. <a
+by default in the `EXPLAIN PHYSICAL PLAN` output (the default), but are hidden elsewhere. <a
 name="explain-with-join-implementations"></a>In `EXPLAIN OPTIMIZED
 PLAN`, details about the implementation in the `Join` operator can be requested
 with [the `join implementations` output modifier](#output-modifiers) (that is,
@@ -336,20 +356,28 @@ actually run).
 
 {{< tabs >}}
 
-{{< tab "In decorrelated and optimized plans (default EXPLAIN)" >}}
-{{< explain-plans/operator-table data="explain_plan_operators" planType="optimized" >}}
-{{< /tab >}}
-
 {{< tab "In fully optimized physical (LIR) plans" >}}
 {{< explain-plans/operator-table data="explain_plan_operators" planType="LIR" >}}
 {{< /tab >}}
 
+{{< tab "In decorrelated and optimized plans (default EXPLAIN)" >}}
+{{< explain-plans/operator-table data="explain_plan_operators" planType="optimized" >}}
+{{< /tab >}}
+
 {{< tab "In raw plans" >}}
 {{< explain-plans/operator-table data="explain_plan_operators" planType="raw" >}}
 {{< /tab >}}
 
 {{< /tabs >}}
 
+Operators are sometimes marked as `Fused ...`. We write this to mean that the operator is fused with its input. That is, if you see a `Fused X` operator above a `Y` operator:
+
+```
+→Fused X
+→Y
+```
+
+Then the `X` and `Y` operators will be combined into a single, more efficient operator.
 
 ## Examples
 
@@ -489,6 +517,10 @@ EXPLAIN PHYSICAL PLAN FOR
 MATERIALIZED VIEW my_mat_view;
 ```
 
+## Debugging running dataflows
+
+The [`EXPLAIN ANALYZE`](/sql/explain-analyze/) statement will let you debug memory and cpu usage (optionally with information about worker skew) for existing indexes and materialized views in terms of their physical plan operators. It can also attribute [TopK hints](/transform-data/idiomatic-materialize-sql/top-k/#query-hints-1) to individual operators.
+
 ## Privileges
 
 The privileges required to execute this statement are:
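
The "Debugging running dataflows" section added in the hunk above describes `EXPLAIN ANALYZE` only in prose, and the `explain-analyze.md` diff is not rendered on this page. The sketch below shows how those uses might look. The index name `my_idx` is hypothetical, and the exact keyword forms are an assumption drawn from the `EXPLAIN ANALYZE` documentation rather than from this diff.

```mzsql
-- Hypothetical objects: my_idx is an existing index, my_mat_view an existing
-- materialized view. Keyword forms are assumed, not confirmed by this diff.

-- Memory usage attributed to physical plan operators.
EXPLAIN ANALYZE MEMORY FOR INDEX my_idx;

-- CPU usage, with per-worker information to expose skew.
EXPLAIN ANALYZE CPU WITH SKEW FOR MATERIALIZED VIEW my_mat_view;

-- TopK hints attributed to individual operators.
EXPLAIN ANALYZE HINTS FOR INDEX my_idx;
```

The operator names in the output should line up with the physical plan operators that plain `EXPLAIN` prints for the same object.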

doc/user/content/transform-data/troubleshooting.md

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -279,10 +279,8 @@ A larger size cluster will provision more memory and CPU resources.
279279

280280
## Which part of my query runs slowly or uses a lot of memory?
281281

282-
{{< public-preview />}}
283-
284282
You can [`EXPLAIN`](/sql/explain-plan/) a query to see how it will be run as a
285-
dataflow. In particular, `EXPLAIN PHYSICAL PLAN` will show the concrete, fully
283+
dataflow. In particular, `EXPLAIN PHYSICAL PLAN` (the default) will show the concrete, fully
286284
optimized plan that Materialize will run. That plan is written in our "low-level
287285
intermediate representation" (LIR).
288286
