refactor: remove HepGraph, and directly modify LogicalPlan to reduce transformation overhead during optimization #298

KKould · 2026-01-10T21:27:30Z

What problem does this PR solve?

In a recent test, a simple primary key query performance test revealed that HepOptimizer incurred significant overhead in the process of converting LogicalPlan to HepGraph and then back to LogicalPlan. Therefore, this PR modifies HepOptimizer directly to LogicalPlan to avoid the intermediate conversion overhead.

Code changes

Has Rust code change
Has CI related scripts change

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Note for reviewer

…transformation overhead during optimization

Copilot

Pull request overview

This PR refactors the HepOptimizer to operate directly on LogicalPlan instead of converting between LogicalPlan and HepGraph representations. This eliminates intermediate conversion overhead during query optimization, improving performance for query planning.

Changes:

Removed HepGraph abstraction and its conversion logic
Modified all normalization rules to work directly with mutable LogicalPlan references
Introduced helper utilities in plan_utils.rs for common plan tree manipulations
Updated pattern matching to work with LogicalPlan instead of graph nodes
Refactored Memo implementation to use NodePath instead of HepNodeId

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
src/optimizer/heuristic/graph.rs	Removed entire HepGraph implementation (419 lines deleted)
src/optimizer/heuristic/optimizer.rs	Refactored to apply rules directly on LogicalPlan without graph conversion
src/optimizer/heuristic/matcher.rs	Changed from HepMatcher to PlanMatcher for pattern matching on plans
src/optimizer/core/rule.rs	Updated NormalizationRule trait to return bool indicating if changes were made
src/optimizer/core/memo.rs	Replaced HepNodeId with NodePath for tracking positions in plan tree
src/optimizer/plan_utils.rs	Added new utility functions for plan tree navigation and manipulation
src/planner/mod.rs	Added schema cache reset methods for invalidation after modifications
src/optimizer/rule/normalization/*.rs	Refactored all normalization rules to work with LogicalPlan instead of HepGraph
Cargo.toml / Cargo.lock	Removed petgraph dependency, bumped version to 0.1.5
examples/*.rs	Updated example structure to support WASM compilation
src/db.rs	Made execute method public

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/optimizer/heuristic/optimizer.rs

Copilot · 2026-01-10T21:33:29Z

src/optimizer/core/memo.rs

    use std::sync::Arc;
    use tempfile::TempDir;

+    // Tips: This test may occasionally encounter errors; you can repeat the test multiple times.


A helpful comment was added explaining that this test may occasionally encounter errors and suggests repeating it multiple times. This is valuable information for developers running the test suite, as it sets expectations about potential flakiness. However, flaky tests are generally undesirable. Consider investigating and fixing the root cause of the intermittent failures rather than just documenting them.

Suggested change

// Tips: This test may occasionally encounter errors; you can repeat the test multiple times.

// Integration test for building a memo structure over a larger dataset.

src/optimizer/plan_utils.rs

src/optimizer/rule/normalization/combine_operators.rs

src/optimizer/rule/normalization/pushdown_limit.rs

src/optimizer/rule/normalization/combine_operators.rs

src/db.rs

Copilot · 2026-01-10T21:33:31Z

src/optimizer/rule/normalization/combine_operators.rs

    predicate: |op| matches!(op, Operator::Filter(_)),
    children: PatternChildrenPredicate::Predicate(vec![Pattern {
-        predicate: |op| matches!(op, Operator::Filter(_)),
+        predicate: |op| matches!(op, Operator::Filter(_)) || is_passthrough_project_operator(op),


The pattern for COMBINE_FILTERS_RULE was modified to accept either a Filter or a passthrough Project as a child. This is a more permissive pattern that allows combining filters even when there's a passthrough projection between them. However, the implementation in the apply method uses a loop that removes passthrough projects until it finds a filter. If the pattern matches but no filter is found after removing passthrough projects (which shouldn't happen if the pattern is correct), the function returns Ok(false). The pattern and implementation should be verified to ensure they're consistent.

Suggested change

predicate: |op| matches!(op, Operator::Filter(_)) || is_passthrough_project_operator(op),

predicate: |op| matches!(op, Operator::Filter(_)),

Copilot · 2026-01-10T21:33:31Z

src/optimizer/rule/normalization/mod.rs

+            if matches!(lhs, ScalarExpression::ColumnRef { .. }) {
+                return lhs_stripped == strip_all_alias(rhs);
+            }
+            false


The is_subset_exprs function was enhanced to handle alias expressions. The new implementation strips aliases before comparison and has special handling for ColumnRef expressions. At line 140, it checks if the left-hand side is a ColumnRef, and if so, it compares lhs_stripped with strip_all_alias(rhs). However, this logic is asymmetric - it only does this special handling when lhs is a ColumnRef. This means that if lhs is "column_name" and rhs is "alias1(alias2(column_name))", it will match, but the reverse wouldn't. Verify this asymmetry is intentional and document the rationale.

Suggested change

if matches!(lhs, ScalarExpression::ColumnRef { .. }) {

return lhs_stripped == strip_all_alias(rhs);

}

false

// When either side is a ColumnRef, allow matching it against the other

// side with all aliases stripped. This handles nested aliasing

// symmetrically, so that `column` vs `alias1(alias2(column))` matches

// regardless of which slice it appears in.

match (lhs, rhs) {

(ScalarExpression::ColumnRef { .. }, _) => {

lhs_stripped == strip_all_alias(rhs)

}

(_, ScalarExpression::ColumnRef { .. }) => {

strip_all_alias(lhs) == rhs_stripped

}

_ => false,

}

refactor: remove HepGraph, and directly modify LogicalPlan to reduce …

d03a9a3

…transformation overhead during optimization

KKould self-assigned this Jan 10, 2026

KKould requested a review from Copilot January 10, 2026 21:27

KKould added the enhancement New feature or request label Jan 10, 2026

Copilot started reviewing on behalf of KKould January 10, 2026 21:28 View session

Copilot AI reviewed Jan 10, 2026

View reviewed changes

KKould merged commit 55f367e into main Jan 10, 2026
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor: remove HepGraph, and directly modify LogicalPlan to reduce transformation overhead during optimization #298

refactor: remove HepGraph, and directly modify LogicalPlan to reduce transformation overhead during optimization #298

Uh oh!

KKould commented Jan 10, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI Jan 10, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Jan 10, 2026

Uh oh!

Copilot AI Jan 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	// Tips: This test may occasionally encounter errors; you can repeat the test multiple times.
	// Integration test for building a memo structure over a larger dataset.

	predicate: \|op\| matches!(op, Operator::Filter(_)) \|\| is_passthrough_project_operator(op),
	predicate: \|op\| matches!(op, Operator::Filter(_)),

-            if matches!(lhs, ScalarExpression::ColumnRef { .. }) {
-                return lhs_stripped == strip_all_alias(rhs);
-            }
-            false
+            // When either side is a ColumnRef, allow matching it against the other
+            // side with all aliases stripped. This handles nested aliasing
+            // symmetrically, so that `column` vs `alias1(alias2(column))` matches
+            // regardless of which slice it appears in.
+            match (lhs, rhs) {
+                (ScalarExpression::ColumnRef { .. }, _) => {
+                    lhs_stripped == strip_all_alias(rhs)
+                }
+                (_, ScalarExpression::ColumnRef { .. }) => {
+                    strip_all_alias(lhs) == rhs_stripped
+                }
+                _ => false,
+            }

refactor: remove HepGraph, and directly modify LogicalPlan to reduce transformation overhead during optimization #298

refactor: remove HepGraph, and directly modify LogicalPlan to reduce transformation overhead during optimization #298

Uh oh!

Conversation

KKould commented Jan 10, 2026

What problem does this PR solve?

Code changes

Check List

Note for reviewer

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants