-
Notifications
You must be signed in to change notification settings - Fork 54
refactor: remove HepGraph, and directly modify LogicalPlan to reduce transformation overhead during optimization #298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…transformation overhead during optimization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the HepOptimizer to operate directly on LogicalPlan instead of converting between LogicalPlan and HepGraph representations. This eliminates intermediate conversion overhead during query optimization, improving performance for query planning.
Changes:
- Removed HepGraph abstraction and its conversion logic
- Modified all normalization rules to work directly with mutable LogicalPlan references
- Introduced helper utilities in
plan_utils.rsfor common plan tree manipulations - Updated pattern matching to work with LogicalPlan instead of graph nodes
- Refactored Memo implementation to use NodePath instead of HepNodeId
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/optimizer/heuristic/graph.rs | Removed entire HepGraph implementation (419 lines deleted) |
| src/optimizer/heuristic/optimizer.rs | Refactored to apply rules directly on LogicalPlan without graph conversion |
| src/optimizer/heuristic/matcher.rs | Changed from HepMatcher to PlanMatcher for pattern matching on plans |
| src/optimizer/core/rule.rs | Updated NormalizationRule trait to return bool indicating if changes were made |
| src/optimizer/core/memo.rs | Replaced HepNodeId with NodePath for tracking positions in plan tree |
| src/optimizer/plan_utils.rs | Added new utility functions for plan tree navigation and manipulation |
| src/planner/mod.rs | Added schema cache reset methods for invalidation after modifications |
| src/optimizer/rule/normalization/*.rs | Refactored all normalization rules to work with LogicalPlan instead of HepGraph |
| Cargo.toml / Cargo.lock | Removed petgraph dependency, bumped version to 0.1.5 |
| examples/*.rs | Updated example structure to support WASM compilation |
| src/db.rs | Made execute method public |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| use std::sync::Arc; | ||
| use tempfile::TempDir; | ||
|
|
||
| // Tips: This test may occasionally encounter errors; you can repeat the test multiple times. |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A helpful comment was added explaining that this test may occasionally encounter errors and suggests repeating it multiple times. This is valuable information for developers running the test suite, as it sets expectations about potential flakiness. However, flaky tests are generally undesirable. Consider investigating and fixing the root cause of the intermittent failures rather than just documenting them.
| // Tips: This test may occasionally encounter errors; you can repeat the test multiple times. | |
| // Integration test for building a memo structure over a larger dataset. |
| predicate: |op| matches!(op, Operator::Filter(_)), | ||
| children: PatternChildrenPredicate::Predicate(vec![Pattern { | ||
| predicate: |op| matches!(op, Operator::Filter(_)), | ||
| predicate: |op| matches!(op, Operator::Filter(_)) || is_passthrough_project_operator(op), |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern for COMBINE_FILTERS_RULE was modified to accept either a Filter or a passthrough Project as a child. This is a more permissive pattern that allows combining filters even when there's a passthrough projection between them. However, the implementation in the apply method uses a loop that removes passthrough projects until it finds a filter. If the pattern matches but no filter is found after removing passthrough projects (which shouldn't happen if the pattern is correct), the function returns Ok(false). The pattern and implementation should be verified to ensure they're consistent.
| predicate: |op| matches!(op, Operator::Filter(_)) || is_passthrough_project_operator(op), | |
| predicate: |op| matches!(op, Operator::Filter(_)), |
| if matches!(lhs, ScalarExpression::ColumnRef { .. }) { | ||
| return lhs_stripped == strip_all_alias(rhs); | ||
| } | ||
| false |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_subset_exprs function was enhanced to handle alias expressions. The new implementation strips aliases before comparison and has special handling for ColumnRef expressions. At line 140, it checks if the left-hand side is a ColumnRef, and if so, it compares lhs_stripped with strip_all_alias(rhs). However, this logic is asymmetric - it only does this special handling when lhs is a ColumnRef. This means that if lhs is "column_name" and rhs is "alias1(alias2(column_name))", it will match, but the reverse wouldn't. Verify this asymmetry is intentional and document the rationale.
| if matches!(lhs, ScalarExpression::ColumnRef { .. }) { | |
| return lhs_stripped == strip_all_alias(rhs); | |
| } | |
| false | |
| // When either side is a ColumnRef, allow matching it against the other | |
| // side with all aliases stripped. This handles nested aliasing | |
| // symmetrically, so that `column` vs `alias1(alias2(column))` matches | |
| // regardless of which slice it appears in. | |
| match (lhs, rhs) { | |
| (ScalarExpression::ColumnRef { .. }, _) => { | |
| lhs_stripped == strip_all_alias(rhs) | |
| } | |
| (_, ScalarExpression::ColumnRef { .. }) => { | |
| strip_all_alias(lhs) == rhs_stripped | |
| } | |
| _ => false, | |
| } |
What problem does this PR solve?
In a recent test, a simple primary key query performance test revealed that HepOptimizer incurred significant overhead in the process of converting LogicalPlan to HepGraph and then back to LogicalPlan. Therefore, this PR modifies HepOptimizer directly to LogicalPlan to avoid the intermediate conversion overhead.
Code changes
Check List
Tests
Side effects
Note for reviewer