Commit 932b3f1 (parent 2fa7a37): design doc for postgres syntax

# Postgres-style syntax for `EXPLAIN`

- Associated: https://github.com/MaterializeInc/database-issues/issues/8889

## The Problem

`EXPLAIN` is meant to help users understand how Materialize actually
runs their queries. In the name of streamlining the education process,
we should make our output as much like Postgres's as is practicable.

Changing `EXPLAIN` is tricky, though: we rely heavily on `EXPLAIN`'s
completionist output to test our optimizer and debug queries. We must
be careful to keep these tests while enabling the new behavior.

https://github.com/MaterializeInc/materialize/pull/31185

## Success Criteria

Our default `EXPLAIN` output should be concise and in a format
reminiscent of Postgres's. Ideally, `EXPLAIN` output should match the
output in `mz_lir_mapping`.

## Out of Scope

We are not going to build new `EXPLAIN` infrastructure, diagrams,
etc. For example, we are not going to attempt to differentiate between
the different meanings of `ArrangeBy` in MIR.

We are not going to invent fundamentally new ways of explaining
how Materialize works.

We are not going to do a user study in advance of any changes.

## Solution Proposal

Postgres `EXPLAIN` plans have the format:

```
Operator
  Detail
  ->  Child Operator #1
        Detail
        ...
  ->  Child Operator #2
        Detail
        ...
```
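
The layout above can be produced by a small recursive printer. The following is a minimal Python sketch, not Materialize code: `PlanNode` and `render` are hypothetical names, and the spacing (details two columns right of the operator name, children introduced by `->  `) is modeled on how psql prints Postgres plans.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical plan node: operator name, detail lines, and child nodes."""

    operator: str
    details: list[str] = field(default_factory=list)
    children: list[PlanNode] = field(default_factory=list)


def render(node: PlanNode, indent: int = 0, is_root: bool = True) -> list[str]:
    """Render `node` and its subtree as Postgres-style EXPLAIN lines."""
    pad = " " * indent
    if is_root:
        lines = [pad + node.operator]
        name_col = indent
    else:
        # Children are introduced by an arrow; the name starts 4 columns later.
        lines = [pad + "->  " + node.operator]
        name_col = indent + 4
    # Details and child arrows both sit 2 columns right of the operator name.
    inner = name_col + 2
    for detail in node.details:
        lines.append(" " * inner + detail)
    for child in node.children:
        lines.extend(render(child, inner, is_root=False))
    return lines


if __name__ == "__main__":
    plan = PlanNode(
        "GroupAggregate",
        ["Group Key: l_returnflag, l_linestatus"],
        [
            PlanNode(
                "Sort",
                ["Sort Key: l_returnflag, l_linestatus"],
                [PlanNode("Seq Scan on lineitem", ["Filter: (l_shipdate <= ...)"])],
            )
        ],
    )
    print("\n".join(render(plan)))
```

Because each child is rendered relative to its parent's name column, nesting depth never has to be tracked explicitly.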

To avoid bikeshedding, we should simply copy Postgres names whenever
possible. When we have new concepts, we should aim to follow Postgres's
norms: operator names are spelled out with spaces, and properties are
clearly elucidated in human-readable formats.
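
One way to encode the naming rule is a lookup from today's operator names to the spelled-out names used in the prototype below. This is a hypothetical Python sketch: `display_name` and `OPERATOR_NAMES` are illustrative, only the `ArrangeBy`, `ReadIndex`, and delta-join renamings come from this document, and the fallback for other join types is an assumption.

```python
from __future__ import annotations

# Renamings taken from the prototype output in this doc; operators not listed
# keep their current name (e.g. Filter, Map, Project, Reduce, Finish).
OPERATOR_NAMES = {
    "ArrangeBy": "Arrangement",
}


def display_name(
    operator: str,
    *,
    join_type: str | None = None,
    relation: str | None = None,
    indexes: tuple[str, ...] = (),
) -> str:
    """Return a Postgres-style, space-separated name for an operator."""
    if operator == "ReadIndex":
        # Postgres phrasing: "Index Scan using <index> on <relation>".
        return f"Index Scan using {', '.join(indexes)} on {relation}"
    if operator == "Join":
        if join_type is None:
            raise ValueError("Join needs a join_type")
        # Assumption: every join type follows the "Delta Join" pattern.
        return f"{join_type.capitalize()} Join"
    return OPERATOR_NAMES.get(operator, operator)
```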

Postgres displays some parts of the plan differently from us, namely:

- Column names:
  + When a column name is available, it gives just the name (no number).
  + When a column name is unavailable, it gives the number, e.g. `$2`.
- `Map` and `Project` do not appear as operators.
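
In Materialize's current output a column reference looks like `#6{count}` when a name is known and `#8` when it is not. A minimal Python sketch of the column-name rule, assuming that textual form: named references lose their number, and unnamed ones stay as `#n` (the prototype output keeps `#n` rather than switching to Postgres's `$n`). `strip_column_numbers` is a hypothetical helper; rewriting rendered text with a regex is only for illustration, and a real change would presumably happen where expressions are printed.

```python
import re

# `#<number>{<name>}`: a column reference that carries a name.
NAMED_COLUMN = re.compile(r"#\d+\{([^}]+)\}")


def strip_column_numbers(expr: str) -> str:
    """Replace `#n{name}` with `name`; leave unnamed `#n` references alone."""
    return NAMED_COLUMN.sub(r"\1", expr)


if __name__ == "__main__":
    # prints: l_returnflag..=sum, #9..=#11, count
    print(strip_column_numbers("#0{l_returnflag}..=#5{sum}, #9..=#11, #6{count}"))
```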

## Minimal Viable Prototype

### TPC-H Query 1

The query:

```sql
SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity) AS sum_qty,
    sum(l_extendedprice) AS sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
    avg(l_quantity) AS avg_qty,
    avg(l_extendedprice) AS avg_price,
    avg(l_discount) AS avg_disc,
    count(*) AS count_order
FROM
    lineitem
WHERE
    l_shipdate <= DATE '1998-12-01' - INTERVAL '60' day
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;
```

Postgres `EXPLAIN`:

```
GroupAggregate  (cost=14.53..18.89 rows=67 width=248)
  Group Key: l_returnflag, l_linestatus
  ->  Sort  (cost=14.53..14.70 rows=67 width=88)
        Sort Key: l_returnflag, l_linestatus
        ->  Seq Scan on lineitem  (cost=0.00..12.50 rows=67 width=88)
              Filter: (l_shipdate <= '1998-10-02 00:00:00'::timestamp without time zone)
(6 rows)
```

Materialize `EXPLAIN`:

```
Finish order_by=[#0{l_returnflag} asc nulls_last, #1{l_linestatus} asc nulls_last] output=[#0..=#9]
  Project (#0{l_returnflag}..=#5{sum}, #9..=#11, #6{count}) // { arity: 10 }
    Map (bigint_to_numeric(case when (#6{count} = 0) then null else #6{count} end), (#2{sum_l_quantity} / #8), (#3{sum_l_extendedprice} / #8), (#7{sum_l_discount} / #8)) // { arity: 12 }
      Reduce group_by=[#4{l_returnflag}, #5{l_linestatus}] aggregates=[sum(#0{l_quantity}), sum(#1{l_extendedprice}), sum((#1{l_extendedprice} * (1 - #2{l_discount}))), sum(((#1{l_extendedprice} * (1 - #2{l_discount})) * (1 + #3{l_tax}))), count(*), sum(#2{l_discount})] // { arity: 8 }
        Project (#4{l_quantity}..=#9{l_linestatus}) // { arity: 6 }
          Filter (date_to_timestamp(#10{l_shipdate}) <= 1998-10-02 00:00:00) // { arity: 16 }
            ReadIndex on=lineitem pk_lineitem_orderkey_linenumber=[*** full scan ***] // { arity: 16 }

Used Indexes:
  - materialize.public.pk_lineitem_orderkey_linenumber (*** full scan ***)

Target cluster: quickstart
```

New Materialize `EXPLAIN`:

```
Finish
  Order by: l_returnflag, l_linestatus
  ->  Project // { arity: 10 }
        Columns: l_returnflag..=sum, #9..=#11, count
        ->  Map // { arity: 12 }
              (bigint_to_numeric(case when (count = 0) then null else count end), (sum_l_quantity / #8), (sum_l_extendedprice / #8), (sum_l_discount / #8))
              ->  Reduce // { arity: 8 }
                    Group Key: l_returnflag, l_linestatus
                    Aggregates: sum(l_quantity), sum(l_extendedprice), sum((l_extendedprice * (1 - l_discount))), sum(((l_extendedprice * (1 - l_discount)) * (1 + l_tax))), count(*), sum(l_discount)
                    ->  Project // { arity: 6 }
                          Columns: l_quantity..=l_linestatus
                          ->  Filter // { arity: 16 }
                                Predicates: date_to_timestamp(l_shipdate) <= 1998-10-02 00:00:00
                                ->  Index Scan using pk_lineitem_orderkey_linenumber on lineitem // { arity: 16 }

Used Indexes:
  - materialize.public.pk_lineitem_orderkey_linenumber (*** full scan ***)
```
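
The `Order by:` line can be derived from `Finish`'s `order_by=[...]`: named columns lose their number, and the `asc nulls_last` suffix (Postgres's default for ascending keys) is dropped. A minimal Python sketch over the rendered text; `render_order_by` is a hypothetical helper, and it assumes no sort key itself contains `, `.

```python
import re

NAMED_COLUMN = re.compile(r"#\d+\{([^}]+)\}")
DEFAULT_ORDER = " asc nulls_last"  # Postgres's default for ascending keys


def render_order_by(order_by: str) -> str:
    """Turn `order_by=[...]` from Finish into a prototype-style `Order by:` line."""
    inner = order_by.strip()
    if inner.startswith("order_by=[") and inner.endswith("]"):
        inner = inner[len("order_by=[") : -1]
    keys = []
    # Assumption: keys are separated by ", " and contain no ", " themselves.
    for key in inner.split(", "):
        key = NAMED_COLUMN.sub(r"\1", key)
        if key.endswith(DEFAULT_ORDER):
            key = key[: -len(DEFAULT_ORDER)]
        keys.append(key)
    return "Order by: " + ", ".join(keys)
```

The Query 3 prototype keeps `desc nulls_first` on `sum` even though that is also Postgres's default for descending keys; this sketch mirrors the prototype rather than Postgres on that point.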

### TPC-H Query 3

The query:

```sql
SELECT
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) AS revenue,
    o_orderdate,
    o_shippriority
FROM
    customer,
    orders,
    lineitem
WHERE
    c_mktsegment = 'BUILDING'
    AND c_custkey = o_custkey
    AND l_orderkey = o_orderkey
    AND o_orderdate < DATE '1995-03-15'
    AND l_shipdate > DATE '1995-03-15'
GROUP BY
    l_orderkey,
    o_orderdate,
    o_shippriority
ORDER BY
    revenue DESC,
    o_orderdate;
```

Postgres `EXPLAIN`:

```
Sort  (cost=20.78..20.79 rows=1 width=44)
  Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric - lineitem.l_discount)))) DESC, orders.o_orderdate
  ->  GroupAggregate  (cost=20.74..20.77 rows=1 width=44)
        Group Key: lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
        ->  Sort  (cost=20.74..20.74 rows=1 width=48)
              Sort Key: lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
              ->  Nested Loop  (cost=0.29..20.73 rows=1 width=48)
                    ->  Nested Loop  (cost=0.14..19.93 rows=1 width=12)
                          ->  Seq Scan on customer  (cost=0.00..11.75 rows=1 width=4)
                                Filter: (c_mktsegment = 'BUILDING'::bpchar)
                          ->  Index Scan using fk_orders_custkey on orders  (cost=0.14..8.16 rows=1 width=16)
                                Index Cond: (o_custkey = customer.c_custkey)
                                Filter: (o_orderdate < '1995-03-15'::date)
                    ->  Index Scan using fk_lineitem_orderkey on lineitem  (cost=0.14..0.79 rows=1 width=40)
                          Index Cond: (l_orderkey = orders.o_orderkey)
                          Filter: (l_shipdate > '1995-03-15'::date)
(16 rows)
```

Materialize `EXPLAIN`:

```
Finish order_by=[#1{sum} desc nulls_first, #2{o_orderdate} asc nulls_last] output=[#0..=#3]
  Project (#0{o_orderkey}, #3{sum}, #1{o_orderdate}, #2{o_shippriority}) // { arity: 4 }
    Reduce group_by=[#0{o_orderkey}..=#2{o_shippriority}] aggregates=[sum((#3{l_extendedprice} * (1 - #4{l_discount})))] // { arity: 4 }
      Project (#8{o_orderkey}, #12{o_orderdate}, #15{o_shippriority}, #22{l_extendedprice}, #23{l_discount}) // { arity: 5 }
        Filter (#6{c_mktsegment} = "BUILDING") AND (#12{o_orderdate} < 1995-03-15) AND (#27{l_shipdate} > 1995-03-15) // { arity: 33 }
          Join on=(#0{c_custkey} = #9{o_custkey} AND #8{o_orderkey} = #17{l_orderkey}) type=delta // { arity: 33 }
            implementation
              %0:customer » %1:orders[#1]KAif » %2:lineitem[#0]KAif
              %1:orders » %0:customer[#0]KAef » %2:lineitem[#0]KAif
              %2:lineitem » %1:orders[#0]KAif » %0:customer[#0]KAef
            ArrangeBy keys=[[#0{c_custkey}]] // { arity: 8 }
              ReadIndex on=customer pk_customer_custkey=[delta join 1st input (full scan)] // { arity: 8 }
            ArrangeBy keys=[[#0{o_orderkey}], [#1{o_custkey}]] // { arity: 9 }
              ReadIndex on=orders pk_orders_orderkey=[delta join lookup] fk_orders_custkey=[delta join lookup] // { arity: 9 }
            ArrangeBy keys=[[#0{l_orderkey}]] // { arity: 16 }
              ReadIndex on=lineitem fk_lineitem_orderkey=[delta join lookup] // { arity: 16 }

Used Indexes:
  - materialize.public.pk_customer_custkey (delta join 1st input (full scan))
  - materialize.public.pk_orders_orderkey (delta join lookup)
  - materialize.public.fk_orders_custkey (delta join lookup)
  - materialize.public.fk_lineitem_orderkey (delta join lookup)

Target cluster: quickstart
```

New Materialize `EXPLAIN`:

```
Finish
  Order by: sum desc nulls_first, o_orderdate
  ->  Project // { arity: 4 }
        Columns: o_orderkey, sum, o_orderdate, o_shippriority
        ->  Reduce // { arity: 4 }
              Group Key: o_orderkey..=o_shippriority
              Aggregates: sum((l_extendedprice * (1 - l_discount)))
              ->  Project // { arity: 5 }
                    Columns: o_orderkey, o_orderdate, o_shippriority, l_extendedprice, l_discount
                    ->  Filter // { arity: 33 }
                          Predicates: (c_mktsegment = "BUILDING") AND (o_orderdate < 1995-03-15) AND (l_shipdate > 1995-03-15)
                          ->  Delta Join // { arity: 33 }
                                Conditions: c_custkey = o_custkey AND o_orderkey = l_orderkey
                                Pipelines:
                                  %0:customer » %1:orders[#1]KAif » %2:lineitem[#0]KAif
                                  %1:orders » %0:customer[#0]KAef » %2:lineitem[#0]KAif
                                  %2:lineitem » %1:orders[#0]KAif » %0:customer[#0]KAef
                                ->  Arrangement // { arity: 8 }
                                      Keys: [c_custkey]
                                      ->  Index Scan using pk_customer_custkey on customer // { arity: 8 }
                                            Delta join first input (full scan): pk_customer_custkey
                                ->  Arrangement // { arity: 9 }
                                      Keys: [o_orderkey], [o_custkey]
                                      ->  Index Scan using pk_orders_orderkey, fk_orders_custkey on orders // { arity: 9 }
                                            Delta join lookup: pk_orders_orderkey, fk_orders_custkey
                                ->  Arrangement // { arity: 16 }
                                      Keys: [l_orderkey]
                                      ->  Index Scan using fk_lineitem_orderkey on lineitem // { arity: 16 }
                                            Delta join lookup: fk_lineitem_orderkey

Used Indexes:
  - materialize.public.pk_customer_custkey (delta join 1st input (full scan))
  - materialize.public.pk_orders_orderkey (delta join lookup)
  - materialize.public.fk_orders_custkey (delta join lookup)
  - materialize.public.fk_lineitem_orderkey (delta join lookup)

Target cluster: quickstart
```
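
The `Index Scan` names and the detail lines under them can be derived from the current `ReadIndex` line: indexes are listed in source order, and indexes sharing a usage are grouped into one detail line. A minimal Python sketch over the rendered text; `rewrite_read_index` is hypothetical, it ignores the `// { arity: ... }` annotation, and the `1st` to `first` rewrite simply mirrors the prototype.

```python
from __future__ import annotations

import re

RELATION = re.compile(r"\bon=(\S+)")
# `<index>=[<usage>]`, e.g. `fk_orders_custkey=[delta join lookup]`.
INDEX_USAGE = re.compile(r"([\w.]+)=\[([^\]]*)\]")


def rewrite_read_index(line: str) -> list[str]:
    """Turn a `ReadIndex ...` line into an `Index Scan` line plus detail lines."""
    match = RELATION.search(line)
    if match is None:
        raise ValueError(f"no `on=<relation>` in {line!r}")
    usages = INDEX_USAGE.findall(line)  # [(index, usage), ...] in source order
    indexes = [index for index, _ in usages]

    # Group index names by usage, keeping first-seen order (dicts are ordered).
    by_usage: dict[str, list[str]] = {}
    for index, usage in usages:
        by_usage.setdefault(usage, []).append(index)

    lines = [f"Index Scan using {', '.join(indexes)} on {match.group(1)}"]
    for usage, names in by_usage.items():
        label = usage.replace("1st", "first")
        label = label[:1].upper() + label[1:]
        lines.append(f"{label}: {', '.join(names)}")
    return lines
```

The Query 1 prototype shows no detail line for its `*** full scan ***` read, whereas this sketch would print `*** full scan ***: pk_lineitem_orderkey_linenumber`, so the phrasing for full scans remains a choice to make.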

## Alternatives

Should we more radically reduce the AST?

Should we abandon static `EXPLAIN` and encourage `mz_lir_mapping` use?

## Open Questions

Should we show `Project`?

Should we show _all_ expressions for `Map` and `Filter`?

How much of this data should `mz_lir_mapping` show?

There is a bug/infelicity in how `mz_lir_mapping` renders `Let`s and
`LetRec`s. We'll need to fix `mz_lir_mapping` and do something analogous
here: what is the best format to present this?