Commit ee45576

Merge pull request #12 from cmu-db/detailed-entity-docs
Detailed Entity Docs
2 parents d2ab67f + 6df7ecd commit ee45576

9 files changed: 211 additions & 0 deletions

optd-persistent/src/migrator/memo/m20241029_000001_cascades_group.rs

Lines changed: 67 additions & 0 deletions
```diff
@@ -1,3 +1,70 @@
+//! An entity representing a group / equivalence class in the Cascades framework.
+//!
+//! Quoted from the Microsoft article _Extensible query optimizers in practice_:
+//!
+//! > In the memo, each class of equivalent expressions is called an equivalent class or a group,
+//! > and all equivalent expressions within the class are called group expressions or simply
+//! > expressions.
+//!
+//! A Cascades group is defined as a class of equivalent logical or physical expressions. The
+//! Cascades framework uses these groups as a way of storing the best query sub-plans for use in the
+//! dynamic programming search algorithm.
+//!
+//! For example, a Cascades group could be the set of expressions containing the logical expressions
+//! `Join(A, B)` and `Join(B, A)`, as well as the physical expressions `HashJoin(A, B)` and
+//! `NestedLoopJoin(B, A)`.
+//!
+//! # Columns
+//!
+//! Each group is assigned a monotonically-increasing (unique) ID. This ID is important because
+//! many other tables hold foreign key references to `cascades_group`.
+//!
+//! We additionally store a `latest_winner` foreign key reference to a physical expression. See
+//! the [section](#best-physical-plan-winner) below for more details.
+//!
+//! Finally, we store `in_progress` and `is_optimized` flags, which are used to quickly determine
+//! the optimization state of this group during the dynamic programming search.
+//!
+//! # Entity Relationships
+//!
+//! ## Child Expressions (Logical and Physical)
+//!
+//! To retrieve all of a `cascades_group`'s equivalent expressions, you must query the
+//! [`logical_expression`] or the [`physical_expression`] entities via their foreign keys to
+//! `cascades_group`. The relationship between [`logical_expression`] and `cascades_group` is
+//! many-to-one, and the same many-to-one relationship holds from [`physical_expression`] to
+//! `cascades_group`.
+//!
+//! ## Parent Expressions (Logical and Physical)
+//!
+//! Additionally, each logical or physical expression can have any number of `cascades_group`s as
+//! children, and a group can be a child of any number of expressions. Thus, `cascades_group`
+//! also has a many-to-many relationship with [`logical_expression`] and [`physical_expression`]
+//! via the [`logical_children`] and [`physical_children`] entities.
+//!
+//! To reiterate, `cascades_group` has **both** a one-to-many **and** a many-to-many relationship
+//! with both [`logical_expression`] and [`physical_expression`]. This is because groups are both
+//! parents and children of expressions.
+//!
+//! ## Best Physical Plan (Winner)
+//!
+//! The `cascades_group` entity also stores a `latest_winner` _nullable_ foreign key reference to
+//! a physical expression. This represents the most recent best query plan we have computed. It
+//! is nullable because we may not have found a best query plan for this group yet.
+//!
+//! ## Logical Properties
+//!
+//! Lastly, each `cascades_group` record has a set of logical properties stored in the
+//! [`logical_property`] entity, with a many-to-one relationship from
+//! [`logical_property`] to `cascades_group`. Note that we do not store physical properties directly
+//! on the `cascades_group`; instead, we store them for each [`physical_expression`] record.
+//!
+//! [`logical_expression`]: super::logical_expression
+//! [`physical_expression`]: super::physical_expression
+//! [`logical_children`]: super::logical_children
+//! [`physical_children`]: super::physical_children
+//! [`logical_property`]: super::logical_property
+
 use crate::migrator::memo::physical_expression::PhysicalExpression;
 use sea_orm_migration::{prelude::*, schema::*};

```

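The relationships described in the `cascades_group` documentation can be illustrated with a small in-memory sketch. The row types below (`CascadesGroup`, `LogicalExpression`, `PhysicalExpression`, `Memo`) are hypothetical stand-ins rather than the crate's SeaORM entities, and a `label` string takes the place of the variant tag and JSON data columns. The sketch shows the one-to-many direction (finding a group's expressions by filtering on their `group_id` foreign key) and how the nullable `latest_winner` reference resolves to `None` until a winner exists.

```rust
/// Hypothetical row types mirroring the columns described above. The real SeaORM
/// models may use different names and types.
pub struct CascadesGroup {
    pub id: i32,
    /// Nullable foreign key to a physical expression: `None` until a winner is found.
    pub latest_winner: Option<i32>,
    pub in_progress: bool,
    pub is_optimized: bool,
}

pub struct LogicalExpression {
    pub id: i32,
    /// Many-to-one foreign key: many logical expressions belong to one group.
    pub group_id: i32,
    /// Illustrative stand-in for the variant tag and JSON data columns.
    pub label: String,
}

pub struct PhysicalExpression {
    pub id: i32,
    /// Many-to-one foreign key: many physical expressions belong to one group.
    pub group_id: i32,
    pub label: String,
}

/// A toy memo holding the three tables in memory.
pub struct Memo {
    pub groups: Vec<CascadesGroup>,
    pub logical: Vec<LogicalExpression>,
    pub physical: Vec<PhysicalExpression>,
}

impl Memo {
    /// Returns the labels of every logical and physical expression in `group_id`,
    /// analogous to filtering each expression table on its `group_id` foreign key.
    pub fn members(&self, group_id: i32) -> (Vec<&str>, Vec<&str>) {
        let logical: Vec<&str> = self
            .logical
            .iter()
            .filter(|e| e.group_id == group_id)
            .map(|e| e.label.as_str())
            .collect();
        let physical: Vec<&str> = self
            .physical
            .iter()
            .filter(|e| e.group_id == group_id)
            .map(|e| e.label.as_str())
            .collect();
        (logical, physical)
    }

    /// Follows the nullable `latest_winner` reference; returns `None` if the group
    /// does not exist or has no winner yet.
    pub fn winner(&self, group_id: i32) -> Option<&PhysicalExpression> {
        let group = self.groups.iter().find(|g| g.id == group_id)?;
        let winner_id = group.latest_winner?;
        self.physical.iter().find(|p| p.id == winner_id)
    }
}
```

Populating `Memo` with the `Join(A, B)` / `Join(B, A)` / `HashJoin(A, B)` / `NestedLoopJoin(B, A)` example from the documentation, `members(group_id)` returns both logical expressions and both physical expressions, while `winner(group_id)` stays `None` until `latest_winner` is set.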
optd-persistent/src/migrator/memo/m20241029_000001_group_winner.rs

Lines changed: 32 additions & 0 deletions
```diff
@@ -1,3 +1,35 @@
+//! An entity representing the best physical plan (or "winner") of a Cascades group.
+//!
+//! In the Cascades framework, query optimization is done through dynamic programming that is based
+//! on the assumption that the cost model satisfies the _principle of optimality_. Quoted from the
+//! Microsoft article _Extensible query optimizers in practice_:
+//!
+//! > ... in the search space of linear sequence of joins, the optimal plan for a join of n
+//! > relations can be found by extending the optimal plan of a sub-expression of n - 1 joins with
+//! > an additional join.
+//!
+//! By storing the best sub-plans / [`physical_expression`]s of smaller Cascades groups, we can
+//! build up an optimal query plan.
+//!
+//! This entity represents the best plan sub-tree for a specific group. However, we store multiple
+//! winners over different epochs, as changes to the database may require us to re-evaluate what the
+//! optimal sub-plan is.
+//!
+//! # Columns
+//!
+//! Other than the primary key, all of the columns in this relation are foreign keys to other
+//! tables.
+//!
+//! A group winner is defined by the [`cascades_group`] it belongs to (`group_id`), the unique ID of
+//! the [`physical_expression`] (`physical_expression_id`), the ID of the cost record in the
+//! [`plan_cost`] table (`cost_id`), and the monotonically-increasing epoch ID in the [`event`]
+//! table (`epoch_id`).
+//!
+//! [`cascades_group`]: super::cascades_group
+//! [`physical_expression`]: super::physical_expression
+//! [`plan_cost`]: super::super::cost_model::plan_cost
+//! [`event`]: super::super::cost_model::event
+
 use crate::migrator::cost_model::{event::Event, plan_cost::PlanCost};
 use crate::migrator::memo::{
     cascades_group::CascadesGroup, physical_expression::PhysicalExpression,
```

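Because epoch IDs increase monotonically, one way to read the winner history is to treat the row with the largest `epoch_id` for a group as its newest winner. The sketch below assumes a hypothetical in-memory `GroupWinner` row holding the primary key plus the four foreign-key columns listed above; `current_winner` and `winner_as_of` are illustrative helper names, not part of the crate.

```rust
/// Hypothetical in-memory row mirroring the `group_winner` columns described above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GroupWinner {
    pub id: i32,
    pub group_id: i32,
    pub physical_expression_id: i32,
    pub cost_id: i32,
    pub epoch_id: i32,
}

/// Returns the winner recorded in the most recent epoch for `group_id`.
/// Since epoch IDs increase monotonically, the largest `epoch_id` is the newest.
pub fn current_winner(winners: &[GroupWinner], group_id: i32) -> Option<&GroupWinner> {
    winners
        .iter()
        .filter(|w| w.group_id == group_id)
        .max_by_key(|w| w.epoch_id)
}

/// Returns the winner that was current as of `epoch_id`, ignoring rows from
/// later epochs (for example, winners recomputed after a database change).
pub fn winner_as_of(winners: &[GroupWinner], group_id: i32, epoch_id: i32) -> Option<&GroupWinner> {
    winners
        .iter()
        .filter(|w| w.group_id == group_id && w.epoch_id <= epoch_id)
        .max_by_key(|w| w.epoch_id)
}
```

Keeping older rows instead of overwriting them is what makes a query like `winner_as_of` possible at all; a single mutable "best plan" column could only answer the first question.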
optd-persistent/src/migrator/memo/m20241029_000001_logical_children.rs

Lines changed: 12 additions & 0 deletions
```diff
@@ -1,3 +1,15 @@
+//! An entity representing the [`cascades_group`] children of every [`logical_expression`].
+//!
+//! Formally, this entity is a junction which allows us to represent a many-to-many relationship
+//! between [`logical_expression`] and [`cascades_group`]. Expressions can have any number of child
+//! groups, and every group can be a child of many different expressions, hence the many-to-many
+//! relationship.
+//!
+//! See [`cascades_group`] for more details.
+//!
+//! [`cascades_group`]: super::cascades_group
+//! [`logical_expression`]: super::logical_expression
+
 use crate::migrator::memo::{cascades_group::CascadesGroup, logical_expression::LogicalExpression};
 use sea_orm_migration::{prelude::*, schema::*};

```

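A junction table stores one row per (expression, child group) edge, so both directions of the many-to-many relationship are answered by filtering on one column and projecting the other. The sketch below assumes a hypothetical `LogicalChild` row with `logical_expression_id` and `group_id` columns; the real column names may differ. The same shape would apply to `physical_children`.

```rust
/// Hypothetical row of the `logical_children` junction table: one row per
/// (parent expression, child group) edge.
#[derive(Debug, Clone, Copy)]
pub struct LogicalChild {
    pub logical_expression_id: i32,
    pub group_id: i32,
}

/// Expression -> groups: the child groups of one logical expression.
pub fn child_groups(edges: &[LogicalChild], expr_id: i32) -> Vec<i32> {
    edges
        .iter()
        .filter(|e| e.logical_expression_id == expr_id)
        .map(|e| e.group_id)
        .collect()
}

/// Group -> expressions: every logical expression that uses `group_id` as a child.
pub fn parent_expressions(edges: &[LogicalChild], group_id: i32) -> Vec<i32> {
    edges
        .iter()
        .filter(|e| e.group_id == group_id)
        .map(|e| e.logical_expression_id)
        .collect()
}
```

For the `Join(A, B)` / `Join(B, A)` example, both joins point at the same two child groups (the scans of `A` and `B`), so each scan group appears as a child of two different expressions, which is exactly the case a plain foreign key on `cascades_group` cannot represent.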
optd-persistent/src/migrator/memo/m20241029_000001_logical_expression.rs

Lines changed: 37 additions & 0 deletions
```diff
@@ -1,3 +1,40 @@
+//! An entity representing a logical plan expression in the Cascades framework.
+//!
+//! Quoted from the Microsoft article _Extensible query optimizers in practice_:
+//!
+//! > A logical expression is defined as a tree of logical operators, and corresponds to a
+//! > relational algebraic expression.
+//!
+//! In the Cascades query optimization framework, the memo table stores equivalence classes of
+//! expressions (see [`cascades_group`]). These equivalence classes, or "groups", store both
+//! `logical_expression`s and [`physical_expression`]s.
+//!
+//! Optimization starts by "exploring" equivalent logical expressions within a group. For example,
+//! the logical expressions `Join(A, B)` and `Join(B, A)` are contained in the same group. Each
+//! of these is a `Join` operator whose children are the groups representing a scan of table `A`
+//! and a scan of table `B`.
+//!
+//! # Columns
+//!
+//! Each `logical_expression` has a unique primary key ID, but it holds little importance other than
+//! distinguishing one expression from another.
+//!
+//! The more interesting column is the `fingerprint` column, in which we store a hashed fingerprint
+//! value that can be used to efficiently check equality between two potentially equivalent logical
+//! expressions (hash-consing). See ???TODO??? for more information on expression fingerprints.
+//!
+//! Finally, since there are many different types of operators, we store a variant tag and a JSON
+//! data column to represent the semi-structured data fields of logical operators.
+//!
+//! # Entity Relationships
+//!
+//! The only relationship that `logical_expression` has is to [`cascades_group`]. It has **both** a
+//! many-to-one **and** a many-to-many relationship with [`cascades_group`], and you can see more
+//! details about this in the module-level documentation for [`cascades_group`].
+//!
+//! [`cascades_group`]: super::cascades_group
+//! [`physical_expression`]: super::physical_expression
+
 use crate::migrator::memo::cascades_group::CascadesGroup;
 use sea_orm_migration::{prelude::*, schema::*};

```

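The `fingerprint` column supports hash-consing: before inserting an expression, look up existing expressions with the same fingerprint and reuse one that is structurally equal. The sketch below is only an illustration and not the crate's fingerprint scheme, which the documentation leaves as a TODO. `LogicalExpr` and `ExprTable` are hypothetical, the fingerprint is assumed to cover the variant tag, the data, and the child group IDs, and std's `DefaultHasher` is used purely for convenience. Its output is not guaranteed to be stable across Rust releases, so a fingerprint persisted to a database would need a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical logical expression: operator tag, semi-structured data (a plain
/// string standing in for JSON), and the IDs of its child groups.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct LogicalExpr {
    pub variant_tag: String,
    pub data: String,
    pub children: Vec<i32>,
}

/// Computes a 64-bit fingerprint over every field of the expression.
pub fn fingerprint(expr: &LogicalExpr) -> u64 {
    let mut hasher = DefaultHasher::new();
    expr.hash(&mut hasher);
    hasher.finish()
}

/// A toy expression table that deduplicates structurally equal expressions.
#[derive(Default)]
pub struct ExprTable {
    exprs: Vec<LogicalExpr>,
    by_fingerprint: HashMap<u64, Vec<usize>>,
}

impl ExprTable {
    /// Inserts `expr` unless an equal expression already exists, returning the ID
    /// of the stored expression (new or pre-existing).
    pub fn insert(&mut self, expr: LogicalExpr) -> usize {
        let fp = fingerprint(&expr);
        // The fingerprint narrows the candidates; the equality check guards
        // against two different expressions sharing a hash.
        if let Some(ids) = self.by_fingerprint.get(&fp) {
            if let Some(&id) = ids.iter().find(|&&id| self.exprs[id] == expr) {
                return id;
            }
        }
        let id = self.exprs.len();
        self.exprs.push(expr);
        self.by_fingerprint.entry(fp).or_default().push(id);
        id
    }

    pub fn len(&self) -> usize {
        self.exprs.len()
    }

    pub fn is_empty(&self) -> bool {
        self.exprs.is_empty()
    }
}
```

Hash-consing only collapses identical expressions: `Join(A, B)` and `Join(B, A)` list their child groups in a different order, so they get different fingerprints and remain two distinct expressions in the same group.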
optd-persistent/src/migrator/memo/m20241029_000001_logical_property.rs

Lines changed: 5 additions & 0 deletions
```diff
@@ -1,3 +1,8 @@
+//! An entity representing a logical property of a Cascades group.
+//!
+//! TODO what exactly are we storing in here?
+//! TODO why is it linked to only cascades groups and not logical expressions?
+
 use crate::migrator::memo::cascades_group::CascadesGroup;
 use sea_orm_migration::{prelude::*, schema::*};

```

optd-persistent/src/migrator/memo/m20241029_000001_physical_children.rs

Lines changed: 12 additions & 0 deletions
```diff
@@ -1,3 +1,15 @@
+//! An entity representing the [`cascades_group`] children of every [`physical_expression`].
+//!
+//! Formally, this entity is a junction which allows us to represent a many-to-many relationship
+//! between [`physical_expression`] and [`cascades_group`]. Expressions can have any number of child
+//! groups, and every group can be a child of many different expressions, hence the many-to-many
+//! relationship.
+//!
+//! See [`cascades_group`] for more details.
+//!
+//! [`cascades_group`]: super::cascades_group
+//! [`physical_expression`]: super::physical_expression
+
 use crate::migrator::memo::{
     cascades_group::CascadesGroup, physical_expression::PhysicalExpression,
 };
```

optd-persistent/src/migrator/memo/m20241029_000001_physical_expression.rs

Lines changed: 38 additions & 0 deletions
```diff
@@ -1,3 +1,41 @@
+//! An entity representing a physical plan expression in the Cascades framework.
+//!
+//! Quoted from the Microsoft article _Extensible query optimizers in practice_:
+//!
+//! > A physical expression is a tree of physical operators, which is also referred to as the
+//! > _physical plan_ or simply _plan_.
+//!
+//! In the Cascades query optimization framework, the memo table stores equivalence classes of
+//! expressions (see [`cascades_group`]). These equivalence classes, or "groups", store both
+//! [`logical_expression`]s and `physical_expression`s.
+//!
+//! Optimization starts by exploring equivalent logical expressions within a group, and then it
+//! proceeds to implement / optimize those logical operators into physical operators. For example,
+//! the logical expression `Join(A, B)` could be implemented into a `HashJoin(A, B)` or a
+//! `NestedLoopJoin(A, B)`, and both of these new physical expressions would be contained in the
+//! same group.
+//!
+//! # Columns
+//!
+//! Each `physical_expression` has a unique primary key ID, and other tables store foreign key
+//! references to specific `physical_expression`s.
+//!
+//! The more interesting column is the `fingerprint` column, in which we store a hashed fingerprint
+//! value that can be used to efficiently check equality between two potentially equivalent physical
+//! expressions (hash-consing). See ???TODO??? for more information on expression fingerprints.
+//!
+//! Finally, since there are many different types of operators, we store a variant tag and a JSON
+//! data column to represent the semi-structured data fields of physical operators.
+//!
+//! # Entity Relationships
+//!
+//! The only relationship that `physical_expression` has is to [`cascades_group`]. It has **both** a
+//! many-to-one **and** a many-to-many relationship with [`cascades_group`], and you can see more
+//! details about this in the module-level documentation for [`cascades_group`].
+//!
+//! [`cascades_group`]: super::cascades_group
+//! [`logical_expression`]: super::logical_expression
+
 use crate::migrator::memo::cascades_group::CascadesGroup;
 use sea_orm_migration::{prelude::*, schema::*};

```

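The implementation step described above, turning `Join(A, B)` into `HashJoin(A, B)` and `NestedLoopJoin(A, B)` within the same group, can be sketched as a function from a logical operator to its physical alternatives. The `LogicalOp` and `PhysicalOp` enums, the `TableScan` operator, and `implement_into_group` are all hypothetical; the crate itself stores operators as a variant tag plus JSON data rather than as Rust enums. Child groups are represented by their IDs, so every physical alternative keeps the same children as the logical expression it implements.

```rust
/// Hypothetical logical operators; children are referenced by group ID.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalOp {
    Join { left: i32, right: i32 },
    Scan { table: String },
}

/// Hypothetical physical operators produced by implementation.
#[derive(Debug, Clone, PartialEq)]
pub enum PhysicalOp {
    HashJoin { left: i32, right: i32 },
    NestedLoopJoin { left: i32, right: i32 },
    TableScan { table: String },
}

/// Returns every physical alternative for a logical operator, reusing its child groups.
pub fn implement(op: &LogicalOp) -> Vec<PhysicalOp> {
    match op {
        LogicalOp::Join { left, right } => vec![
            PhysicalOp::HashJoin { left: *left, right: *right },
            PhysicalOp::NestedLoopJoin { left: *left, right: *right },
        ],
        LogicalOp::Scan { table } => vec![PhysicalOp::TableScan { table: table.clone() }],
    }
}

/// Hypothetical `physical_expression` row: the owning group plus its operator.
#[derive(Debug, Clone, PartialEq)]
pub struct PhysicalExpression {
    pub group_id: i32,
    pub op: PhysicalOp,
}

/// Implements a logical expression belonging to `group_id`; every resulting physical
/// expression is placed in that same group.
pub fn implement_into_group(group_id: i32, op: &LogicalOp) -> Vec<PhysicalExpression> {
    implement(op)
        .into_iter()
        .map(|op| PhysicalExpression { group_id, op })
        .collect()
}
```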
optd-persistent/src/migrator/memo/m20241029_000001_physical_property.rs

Lines changed: 5 additions & 0 deletions
```diff
@@ -1,3 +1,8 @@
+//! An entity representing a physical property of a physical expression in the Cascades framework.
+//!
+//! TODO what exactly are we storing in here?
+//! TODO why is it linked to only physical expressions and not cascades groups?
+
 use crate::migrator::memo::physical_expression::PhysicalExpression;
 use sea_orm_migration::{prelude::*, schema::*};

```

optd-persistent/src/migrator/memo/mod.rs

Lines changed: 3 additions & 0 deletions
```diff
@@ -1,3 +1,6 @@
+//! Entities related to the memo table used for dynamic programming in the Cascades query
+//! optimization framework.
+
 pub(crate) mod m20241029_000001_cascades_group;
 pub(crate) mod m20241029_000001_group_winner;
 pub(crate) mod m20241029_000001_logical_children;
```
