Commit e075d8f

first draft fully merge group

1 parent 194ae5e

File tree

8 files changed: +650 −121 lines

optd-mvp/DESIGN.md

168 additions & 23 deletions
@@ -1,9 +1,12 @@
 # Duplicate Elimination Memo Table
 
+_Connor Tsui, December 2024_
+
 Note that most of the details are in `src/memo/persistent/implementation.rs`.
 
-For this document, we are assuming that the memo table is backed by a database / ORM. A lot of these
-problems would likely not be an issue if everything was in memory.
+For this document, we are assuming that the memo table is backed by a database / ORM. Both the
+problems and the features detailed in this document are unique to this design, and likely do not
+apply to an in-memory memo table.
 
 ## Group Merging
 
@@ -12,41 +15,44 @@ for this is to immediately merge two groups together when the engine determines
 expression would result in a duplicate expression from another group.
 
 However, if we want to support parallel exploration, this could be prone to high contention. By
-definition, merging group G1 into group G2 would mean that _every expression_ that has a child of
-group G1 with would need to be rewritten to point to group G2 instead.
+definition, merging group 1 into group 2 would mean that _every expression_ that has a child of
+group 1 would need to be rewritten to point to group 2 instead.
 
-This is unacceptable in a parallel setting, as that would mean every single task that gets affected
-would need to either wait for the rewrites to happen before resuming work, or need to abort their
-work because data has changed underneath them.
+This is prohibitive in a parallel setting, as that would mean every single task that gets affected
+would need to either wait for the rewrites to happen before resuming work, or potentially need to
+abort their work because data has changed underneath them.
 
-So immediate / eager group merging is not a great idea for parallel exploration. However, if we do
-not ever merge two groups that are identical, we are subject to doing duplicate work for every
+So immediate / eager group merging is not a great idea for parallel exploration. However, if we
+don't merge two groups that are equivalent, we are subject to doing duplicate work for every
 duplicate expression in the memo table during physical optimization.
 
 Instead of merging groups together immediately, we can instead maintain an auxiliary data structure
 that records the groups that _eventually_ need to get merged, and "lazily" merge those groups
-together once every group has finished exploration.
+together once every group has finished exploration. We will refer to merging groups as the act of
+recording that the groups should eventually be merged together after exploration is finished.
 
 ## Union-Find Group Sets
 
 We use the well-known Union-Find algorithm and corresponding data structure as the auxiliary data
 structure that tracks the to-be-merged groups.
 
 Union-Find supports `Union` and `Find` operations, where `Union` merges sets and `Find` searches for
-a "canonical" or "root" element that is shared between all elements in a given set.
+a "canonical" or "root" element that is shared between all elements in a given set. Note that we
+will also support an iteration operation that iterates over all elements in a given set. We will
+need this for [duplicate detection](#fingerprinting--group-merge), which is explained below.
 
 For more information about Union-Find, see these
-[15-451 lecture notes](https://www.cs.cmu.edu/~15451-f24/lectures/lecture08-union-find.pdf).
+[15-451 lecture notes](https://www.cs.cmu.edu/~15451-f24/lectures/lecture08-union-find.pdf). We will
+use the exact same data structure, but add an additional `next` pointer for each node that embeds
+a circular linked list for each set.
 
-Here, we make the elements the groups themselves (really the Group IDs), which allows us to merge
+Here, we make the elements the groups themselves (really the group IDs), which allows us to merge
 group sets together and also determine a "root group" that all groups in a set can agree on.
 
 When every group in a group set has finished exploration, we can safely begin to merge them
 together by moving all expressions from every group in the group set into a single large group.
 Other than making sure that any reference to an old group in the group set points to this new large
-group, exploration of all groups are done and physical optimization can start.
-
-RFC: Do we need to support incremental search?
+group, exploration of all groups is done and physical optimization can start.
 
 Note that since we are now waiting for exploration of all groups to finish, this algorithm is much
 closer to the Volcano framework than the Cascades' incremental search. However, since we eventually
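The union-find variant described above (a `parent` pointer plus a `next` pointer that embeds a circular linked list per set) can be sketched in plain Rust. This is an illustrative in-memory sketch only, not the optd-mvp implementation, which stores groups in the database:

```rust
use std::collections::HashMap;

// A minimal in-memory sketch of the union-find structure described above, with
// an extra `next` pointer per node that embeds a circular linked list through
// each set. Illustrative only; not the actual optd-mvp implementation.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct GroupId(pub usize);

struct Node {
    parent: GroupId,
    next: GroupId, // circular linked list through this node's set
}

pub struct UnionFind {
    nodes: HashMap<GroupId, Node>,
}

impl UnionFind {
    pub fn new() -> Self {
        Self { nodes: HashMap::new() }
    }

    pub fn add(&mut self, g: GroupId) {
        // A new group is its own root and forms a singleton circular list.
        self.nodes.insert(g, Node { parent: g, next: g });
    }

    /// Find the root group of `g`'s set, compressing paths along the way.
    pub fn find(&mut self, g: GroupId) -> GroupId {
        let parent = self.nodes[&g].parent;
        if parent == g {
            return g;
        }
        let root = self.find(parent);
        self.nodes.get_mut(&g).unwrap().parent = root;
        root
    }

    /// Merge the set containing `a` into the set containing `b`.
    pub fn union(&mut self, a: GroupId, b: GroupId) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return;
        }
        self.nodes.get_mut(&ra).unwrap().parent = rb;
        // Swapping the roots' `next` pointers splices the two circular lists.
        let (na, nb) = (self.nodes[&ra].next, self.nodes[&rb].next);
        self.nodes.get_mut(&ra).unwrap().next = nb;
        self.nodes.get_mut(&rb).unwrap().next = na;
    }

    /// Iterate over every group in `g`'s set in time linear in the set size.
    pub fn set_members(&self, g: GroupId) -> Vec<GroupId> {
        let mut members = vec![g];
        let mut curr = self.nodes[&g].next;
        while curr != g {
            members.push(curr);
            curr = self.nodes[&curr].next;
        }
        members
    }
}
```

The key detail is in `union`: swapping the two roots' `next` pointers is enough to splice two circular lists into one, so merging stays constant time while iteration stays linear.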
@@ -56,14 +62,153 @@ of a problem.
 
 ## Duplicate Detection
 
-TODO explain the fingerprinting algorithm and how it relates to group merging
+Deciding that we will merge groups lazily does not solve all of our problems. We have to know _when_
+we want to merge these groups.
 
-Union find data structure with a circular linked list for linear iteration
+A naive approach is to simply loop over every expression in the memo table and check if we are about
+to insert a duplicate. This, of course, is bad for performance.
 
-When taking the fingerprint of an expression, the child groups of an expression need to be root groups. If they are not, we need to try again.
-Assuming that all children are root groups, the fingerprint we make for any expression that fulfills that is valid and can be looked up for duplicates.
-In order to maintain that correctness, on a merge of two sets, the smaller one requires that a new fingerprint be generated for every expression that has a group in that smaller set.
-For example, let's say we need to merge { 1, 2 } (root group 1) with { 3, 4, 5, 6, 7, 8 } (root group 3). We need to find every single expression that has a child group of 1 or 2 and we need to generate a new fingerprint for each where the child groups have been "rewritten" to 3
+We will use a fingerprinting / hashing method to detect when a duplicate expression might be
+inserted into the memo table (returning an error instead of inserting), and we will use that to
+trigger group merges.
 
-TODO this is incredibly expensive, but is potentially easily parallelizable?
+For every logical expression we insert into the memo table, we will create a fingerprint that
+contains both the kind of expression / relation (Scan, Filter, Join) and a hash of all
+information that makes that expression unique. For example:
 
+- The fingerprint of a Scan should probably contain a hash of the table name and the pushdown
+  predicate.
+- The fingerprint of a Filter should probably contain a hash of its child group ID and predicate.
+- The fingerprint of a Join should probably contain a hash of the left group ID and the right group
+  ID, as well as the join predicate.
+
+Note that the above descriptions are slightly inaccurate, and we'll explain why in a later
+[section](#fingerprinting--group-merge).
+
+Also, if we have duplicate detection for logical expressions, and we do not start physical
+optimization until after full plan enumeration, then we do not actually need to do duplicate
+detection of physical expressions, since they are derivative of the deduplicated logical
+expressions.
+
+### Fingerprint Matching Algorithm
+
+When an expression is added to the memo table, it will first calculate the fingerprint of the
+expression. The memo table will compare this fingerprint with every fingerprint in the memo table to
+check if we have seen this expression before (in any group). While this is effectively a scan
+through every expression, supporting the fingerprint table with a B+ tree index will speed up this
+operation dramatically (since these fingerprints can be sorted by expression / relation kind).
+
+If there are no identical fingerprints, then there is no duplicate expression, and we can safely
+add the expression into the memo table. However, if there are matching fingerprints, we need to
+further check for false positives due to hash collisions.
+
+We do full exact-match equality checks with every expression that had a fingerprint match. If there
+are no exact matches, then we can safely add the expression into the memo table. However, if we find
+an exact match (note that there can be at most one exact match, since we have an invariant that there
+cannot be duplicate expressions), then we know that the expression we are trying to add already
+exists in the memo table.
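The matching flow above can be sketched as follows. This is a simplified in-memory stand-in for the database-backed fingerprint table; the names and the toy hash are hypothetical, not the optd-mvp API:

```rust
use std::collections::HashMap;

// A simplified, in-memory sketch of the duplicate-detection flow described
// above. The real memo table is database-backed; these names are hypothetical.
type GroupId = usize;
type Fingerprint = i64;

#[derive(Clone, PartialEq)]
struct Expr {
    kind: i16,
    data: String, // stand-in for the expression's unique information
}

// Toy hash; the real implementation mixes the kind into the low 16 bits.
fn fingerprint(expr: &Expr) -> Fingerprint {
    let mut h: i64 = expr.kind as i64;
    for b in expr.data.bytes() {
        h = h.wrapping_mul(31).wrapping_add(b as i64);
    }
    h
}

struct MemoTable {
    // Fingerprint -> expressions that produced it (collisions are possible).
    fingerprints: HashMap<Fingerprint, Vec<(GroupId, Expr)>>,
}

enum AddResult {
    Added,
    Duplicate(GroupId), // the expression already exists in this group
}

impl MemoTable {
    fn add_expression(&mut self, group: GroupId, expr: Expr) -> AddResult {
        let fp = fingerprint(&expr);
        let candidates = self.fingerprints.entry(fp).or_default();
        // A fingerprint match may be a false positive, so confirm with a full
        // exact-match equality check before reporting a duplicate.
        for (existing_group, existing) in candidates.iter() {
            if existing == &expr {
                return AddResult::Duplicate(*existing_group);
            }
        }
        candidates.push((group, expr));
        AddResult::Added
    }
}
```

Returning `Duplicate` instead of inserting mirrors the "returning an error instead of inserting" behavior described above, and gives the caller the group that already owns the expression.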
+
+### Fingerprinting + Group Merge
+
+There is a slight problem with the algorithm described above. It does not account for when a child
+group has merged into another group.
+
+For example, let's say we have groups 1, 2, and 3. We insert an expression Join(1, 2) into the
+memo table with its fingerprint calculated with groups 1 and 2. It is possible that we find out that
+groups 2 and 3 need to be merged. This means that Join(1, 2) and Join(1, 3) are actually identical
+expressions, and the fingerprinting strategies for expressions described above do not handle this.
+
+We will solve this problem by allowing multiple fingerprints to reference the same logical
+expression, and we will generate a new fingerprint for every expression that is affected by a group
+merge, i.e. every expression whose child groups now have a new root group.
+
+In the above scenario, we will find every expression in the memo table that has group 2 as a child.
+For each expression, we will generate another fingerprint with group 2 "rewritten" as group 3 in the
+hash. Note that we _do not_ modify the original expression, we are simply adding another fingerprint
+into the memo table.
+
+Finally, we need to handle when multiple groups in a group set are merged into another group set.
+For example, if a left group set { 1, 2, 3, 4, 5 } with root 1 needs to be merged into a right group
+set { 6, 7, 8, 9, 10 } with root 6, then we need to generate a new fingerprint for every expression
+in groups 1, 2, 3, 4, and 5 with group 1 "rewritten" as group 6.
+
+More formally, we are maintaining this invariant:
+**For every expression, there exists a fingerprint that maps back to the expression and that uses**
+**the root groups of the expression's children to calculate the hash.**
+
+For example, if we have a group set { 1, 3, 5 } with root group 1 and group set { 2, 4, 6 } with
+root group 2, the fingerprint of Join(5, 4) should really be a fingerprint of Join(1, 2).
+
+This invariant means that when we are checking if some expression already exists, we should use the
+root groups of the child groups in our expression to calculate the fingerprint, and we can guarantee
+that the absence of a fingerprint match implies the absence of duplicates.
+
+A further implication of this invariant is that new fingerprints need to be generated every time
+we merge groups. If we have a left group set { 1, 3, 5 } with root group 1 and right group set
+{ 2, 4, 6 } with root group 2, and we merge the first group set into the second, then every
+expression that has a child group of 1, 3, or 5 now has a stale fingerprint that uses root group 1
+instead of root group 2.
+
+Thus, when we merge the left group set into the right group set, we need to do the following:
+
+1. Gather the group set, i.e. every single group that has root group 1 (iterate)
+2. Retrieve every single expression that has a child group in the group set (via junction table)
+3. Generate a new fingerprint for each expression and add it into the memo table
+
+The speed of steps 2 and 3 above is largely dependent on the backing DBMS. However, we can support
+step 1 directly in the union-find data structure by maintaining a circular linked list for every set.
+Each group now tracks both a `parent` pointer and a `next` pointer. When merging / unioning a set
+into another set, we swap the `next` pointers of the two roots to maintain the circular linked list.
+This allows us to do step 1 in linear time relative to the size of the group set.
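The three merge steps above might be sketched as follows. All names here are hypothetical; in optd-mvp, steps 2 and 3 would be queries against the backing DBMS and its junction table, and `fingerprint_with_rewrite` is a stand-in for the real hashing logic:

```rust
use std::collections::{HashMap, HashSet};

// A hypothetical sketch of the three merge steps described above. In optd-mvp,
// steps 2 and 3 would run against the backing DBMS, not in-memory maps.
type GroupId = usize;
type ExprId = usize;
type Fingerprint = u64;

struct MergeContext {
    // Step 1 input: root group -> members of its set (from union-find).
    set_members: HashMap<GroupId, Vec<GroupId>>,
    // Step 2 input: junction table mapping a child group to parent expressions.
    junction: HashMap<GroupId, Vec<ExprId>>,
    // Step 3 output: fingerprints registered per expression in the memo table.
    fingerprints: HashMap<ExprId, Vec<Fingerprint>>,
}

// Stand-in for recomputing an expression's fingerprint with the old root group
// "rewritten" as the new root group.
fn fingerprint_with_rewrite(expr: ExprId, old_root: GroupId, new_root: GroupId) -> Fingerprint {
    (expr as u64) ^ ((old_root as u64) << 16) ^ ((new_root as u64) << 32)
}

impl MergeContext {
    /// Merge the set rooted at `old_root` into the set rooted at `new_root`.
    fn merge(&mut self, old_root: GroupId, new_root: GroupId) {
        // Step 1: gather every group in the old root's set (linear iteration).
        let groups = self.set_members[&old_root].clone();

        // Step 2: retrieve every expression with a child group in that set.
        let mut affected: HashSet<ExprId> = HashSet::new();
        for g in &groups {
            for expr in self.junction.get(g).into_iter().flatten() {
                affected.insert(*expr);
            }
        }

        // Step 3: add a new fingerprint per affected expression, with the old
        // root rewritten as the new root. The old fingerprints are kept.
        for expr in affected {
            let fp = fingerprint_with_rewrite(expr, old_root, new_root);
            self.fingerprints.entry(expr).or_default().push(fp);
        }
    }
}
```

Note that expressions reachable through several groups of the set are collected into a `HashSet` first, so each affected expression gets exactly one new fingerprint per merge.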
+
+### Discovered Duplicates
+
+The above algorithm has one more problem: merging groups can cause the memo table to "discover" that
+there are duplicate expressions in the memo table.
+
+Here is an example: let's say we have the following groups, each with one expression (note that the
+example will work even with multiple expressions):
+
+1. `Scan(1)`
+2. `Scan(2)`
+3. `Filter(1)`
+4. `Filter(2)`
+5. `Filter(4)`
+6. `Join(3, 4)`
+7. `Join(3, 5)`
+8. `Sort(6)`
+9. `Sort(7)`
+
+Note how group 5 is just a second filter on top of group 2. Suppose that we find out that
+`(Filter(4) = Filter(Filter(2))) == Filter(2)`. In that case, we need to merge groups 4 and 5. The
+problem here is that groups 6 and 7 are considered separate groups, but we have now discovered that
+they are actually the same. The same is true for groups 8 and 9. In this scenario, the merging of
+groups has "generated" a duplicate expression.
+
+However, this is not as big of a problem as it might seem. The issue we want to avoid is lots of
+duplicate work or even an infinite loop of rule application. Observe that if we apply a rule to both
+the expression in group 6 and the expression in group 7, we will get the same exact expression.
+
+For example, if we apply join commutativity to the expression in group 6 (`Join(3, 4)`), we would
+add `Join(4, 3)` into group 6. When we apply join commutativity to the expression in group 7
+(`Join(3, 5)`), we would get back `Join(5, 3)`. However, the memo table will detect this as a
+duplicate because it will use the root group of 4 and 5 to generate the fingerprint and see that
+`Join(4, 3)` already exists. Again, similar logic applies for groups 8 and 9.
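To make the `Join(5, 3)` case concrete, here is a toy join fingerprint in the spirit of the one in `logical_expression.rs`, with std's `DefaultHasher` standing in for `fxhash` and the predicate hash omitted for brevity. The `+ 1` / `+ 2` offsets keep `Join(A, B)` distinct from `Join(B, A)`, and rewriting child group 5 to its root group 4 makes the new fingerprint collide with the existing `Join(4, 3)`:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy fingerprint in the spirit of `logical_expression.rs`, using std's
// DefaultHasher in place of fxhash and omitting the join predicate.
fn h(x: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    x.hash(&mut hasher);
    hasher.finish()
}

// `rewrites` maps a child group to its (new) root group.
fn join_fingerprint(left: u64, right: u64, rewrites: &[(u64, u64)]) -> u64 {
    let rewrite = |g: u64| {
        rewrites
            .iter()
            .find(|(old, _)| *old == g)
            .map(|(_, new)| *new)
            .unwrap_or(g)
    };
    // The offsets keep `Join(A, B)` distinct from `Join(B, A)`.
    h(rewrite(left) + 1) ^ h(rewrite(right) + 2)
}
```

With the rewrite `(5, 4)` applied, `join_fingerprint(5, 3, ...)` hashes exactly the same inputs as `join_fingerprint(4, 3, &[])`, which is how the memo table spots the discovered duplicate without rewriting the stored expression.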
+
+At a high level, almost all of our operations are lazy. Work does not need to be done unless it is
+absolutely necessary for correctness. By allowing some amount of duplicates, we get some nice
+properties with respect to parallelizing memo table access.
+
+## Efficiency and Parallelism
+
+Fingerprinting by itself is very efficient, as both creating a fingerprint and looking one up can
+be made fast with indexes. The real concern here is that merging two groups is very, very
+expensive. Depending on the workload, it is possible that the amortized cost is low, or that group
+merging takes up the majority of the work.
+
+However, we must remember that we want to parallelize access to the memo table. The above algorithms
+are notably **read and append only**. There is never a point where we need to update an expression
+to maintain invariants. This is important, as it means that we can add and look up expressions and
+groups _without having to take any locks_. If we enforce a serializable isolation level, then every
+method on the memo table can be done in parallel with relatively low contention due to there being
+zero write-write conflicts.

optd-mvp/entities.md

1 addition & 1 deletion

@@ -8,7 +8,7 @@ This assumes that you already have the `sqlite3` binary installed. First, make s
 $ cargo install sea-orm-cli
 ```
 
-Make sure your working directory is in the crate root:
+Make sure your working directory is in the crate root (not the workspace root):
 
 ```sh
 $ cd optd-mvp

optd-mvp/src/expression/logical_expression.rs

56 additions & 20 deletions

@@ -2,24 +2,20 @@
 //!
 //! FIXME: All fields are placeholders.
 //!
-//! TODO Remove dead code.
 //! TODO Figure out if each relation should be in a different submodule.
 //! TODO This entire file is a WIP.
 
-#![allow(dead_code)]
-
 use crate::{entities::*, memo::GroupId};
 use fxhash::hash;
 use serde::{Deserialize, Serialize};
 
-#[derive(Clone, Debug, PartialEq, Eq)]
+#[derive(Clone, Debug)]
 pub enum LogicalExpression {
     Scan(Scan),
     Filter(Filter),
     Join(Join),
 }
 
-/// FIXME: Figure out how to make everything unsigned instead of signed.
 impl LogicalExpression {
     pub fn kind(&self) -> i16 {
         match self {
@@ -29,11 +25,6 @@ impl LogicalExpression {
         }
     }
 
-    /// Definitions of custom fingerprinting strategies for each kind of logical expression.
-    pub fn fingerprint(&self) -> i64 {
-        self.fingerprint_with_rewrite(&[])
-    }
-
     /// Calculates the fingerprint of a given expression, but replaces all of the children group IDs
     /// with a new group ID if it is listed in the input `rewrites` list.
     ///
@@ -55,41 +46,84 @@ impl LogicalExpression {
 
         let kind = self.kind() as u16 as usize;
         let hash = match self {
-            LogicalExpression::Scan(scan) => hash(scan.table_schema.as_str()),
+            LogicalExpression::Scan(scan) => hash(scan.table.as_str()),
             LogicalExpression::Filter(filter) => {
                 hash(&rewrite(filter.child).0) ^ hash(filter.expression.as_str())
             }
             LogicalExpression::Join(join) => {
-                hash(&rewrite(join.left).0)
-                    ^ hash(&rewrite(join.right).0)
+                // Make sure that there is a difference between `Join(A, B)` and `Join(B, A)`.
+                hash(&(rewrite(join.left).0 + 1))
+                    ^ hash(&(rewrite(join.right).0 + 2))
                     ^ hash(join.expression.as_str())
             }
         };
 
        // Mask out the bottom 16 bits of `hash` and replace them with `kind`.
        ((hash & !0xFFFF) | kind) as i64
    }
+
+    /// Checks equality between two expressions, with both expressions rewriting their child group
+    /// IDs according to the input `rewrites` list.
+    pub fn eq_with_rewrite(&self, other: &Self, rewrites: &[(GroupId, GroupId)]) -> bool {
+        // Closure that rewrites a group ID if needed.
+        let rewrite = |x: GroupId| {
+            if rewrites.is_empty() {
+                return x;
+            }
+
+            if let Some(i) = rewrites.iter().position(|(curr, _new)| &x == curr) {
+                assert_eq!(rewrites[i].0, x);
+                rewrites[i].1
+            } else {
+                x
+            }
+        };
+
+        match (self, other) {
+            (LogicalExpression::Scan(scan_left), LogicalExpression::Scan(scan_right)) => {
+                scan_left.table == scan_right.table
+            }
+            (LogicalExpression::Filter(filter_left), LogicalExpression::Filter(filter_right)) => {
+                rewrite(filter_left.child) == rewrite(filter_right.child)
+                    && filter_left.expression == filter_right.expression
+            }
+            (LogicalExpression::Join(join_left), LogicalExpression::Join(join_right)) => {
+                rewrite(join_left.left) == rewrite(join_right.left)
+                    && rewrite(join_left.right) == rewrite(join_right.right)
+                    && join_left.expression == join_right.expression
+            }
+            _ => false,
+        }
+    }
+
+    pub fn children(&self) -> Vec<GroupId> {
+        match self {
+            LogicalExpression::Scan(_) => vec![],
+            LogicalExpression::Filter(filter) => vec![filter.child],
+            LogicalExpression::Join(join) => vec![join.left, join.right],
+        }
+    }
 }
 
-#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq)]
+#[derive(Serialize, Deserialize, Clone, Debug)]
 pub struct Scan {
-    table_schema: String,
+    table: String,
 }
 
-#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq)]
+#[derive(Serialize, Deserialize, Clone, Debug)]
 pub struct Filter {
     child: GroupId,
     expression: String,
 }
 
-#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq)]
+#[derive(Serialize, Deserialize, Clone, Debug)]
 pub struct Join {
     left: GroupId,
     right: GroupId,
     expression: String,
 }
 
-/// TODO Use a macro instead.
+/// TODO Use a macro.
 impl From<logical_expression::Model> for LogicalExpression {
     fn from(value: logical_expression::Model) -> Self {
         match value.kind {
@@ -110,7 +144,7 @@ impl From<logical_expression::Model> for LogicalExpression {
         }
     }
 }
 
-/// TODO Use a macro instead.
+/// TODO Use a macro.
 impl From<LogicalExpression> for logical_expression::Model {
     fn from(value: LogicalExpression) -> logical_expression::Model {
         fn create_logical_expression(
@@ -152,7 +186,9 @@ mod build {
     use crate::expression::LogicalExpression;
 
     pub fn scan(table_schema: String) -> LogicalExpression {
-        LogicalExpression::Scan(Scan { table_schema })
+        LogicalExpression::Scan(Scan {
+            table: table_schema,
+        })
     }
 
     pub fn filter(child_group: GroupId, expression: String) -> LogicalExpression {
