Skip to content

Conversation

@KKould
Copy link
Member

@KKould KKould commented Jan 10, 2026

What problem does this PR solve?

In a recent test, a simple primary key query performance test revealed that HepOptimizer incurred significant overhead in the process of converting LogicalPlan to HepGraph and then back to LogicalPlan. Therefore, this PR modifies HepOptimizer directly to LogicalPlan to avoid the intermediate conversion overhead.

Code changes

  • Has Rust code change
  • Has CI related scripts change

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Note for reviewer

@KKould KKould self-assigned this Jan 10, 2026
@KKould KKould requested a review from Copilot January 10, 2026 21:27
@KKould KKould added the enhancement New feature or request label Jan 10, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the HepOptimizer to operate directly on LogicalPlan instead of converting between LogicalPlan and HepGraph representations. This eliminates intermediate conversion overhead during query optimization, improving performance for query planning.

Changes:

  • Removed HepGraph abstraction and its conversion logic
  • Modified all normalization rules to work directly with mutable LogicalPlan references
  • Introduced helper utilities in plan_utils.rs for common plan tree manipulations
  • Updated pattern matching to work with LogicalPlan instead of graph nodes
  • Refactored Memo implementation to use NodePath instead of HepNodeId

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
src/optimizer/heuristic/graph.rs Removed entire HepGraph implementation (419 lines deleted)
src/optimizer/heuristic/optimizer.rs Refactored to apply rules directly on LogicalPlan without graph conversion
src/optimizer/heuristic/matcher.rs Changed from HepMatcher to PlanMatcher for pattern matching on plans
src/optimizer/core/rule.rs Updated NormalizationRule trait to return bool indicating if changes were made
src/optimizer/core/memo.rs Replaced HepNodeId with NodePath for tracking positions in plan tree
src/optimizer/plan_utils.rs Added new utility functions for plan tree navigation and manipulation
src/planner/mod.rs Added schema cache reset methods for invalidation after modifications
src/optimizer/rule/normalization/*.rs Refactored all normalization rules to work with LogicalPlan instead of HepGraph
Cargo.toml / Cargo.lock Removed petgraph dependency, bumped version to 0.1.5
examples/*.rs Updated example structure to support WASM compilation
src/db.rs Made execute method public

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

use std::sync::Arc;
use tempfile::TempDir;

// Tips: This test may occasionally encounter errors; you can repeat the test multiple times.
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A helpful comment was added explaining that this test may occasionally encounter errors and suggests repeating it multiple times. This is valuable information for developers running the test suite, as it sets expectations about potential flakiness. However, flaky tests are generally undesirable. Consider investigating and fixing the root cause of the intermittent failures rather than just documenting them.

Suggested change
// Tips: This test may occasionally encounter errors; you can repeat the test multiple times.
// Integration test for building a memo structure over a larger dataset.

Copilot uses AI. Check for mistakes.
predicate: |op| matches!(op, Operator::Filter(_)),
children: PatternChildrenPredicate::Predicate(vec![Pattern {
predicate: |op| matches!(op, Operator::Filter(_)),
predicate: |op| matches!(op, Operator::Filter(_)) || is_passthrough_project_operator(op),
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern for COMBINE_FILTERS_RULE was modified to accept either a Filter or a passthrough Project as a child. This is a more permissive pattern that allows combining filters even when there's a passthrough projection between them. However, the implementation in the apply method uses a loop that removes passthrough projects until it finds a filter. If the pattern matches but no filter is found after removing passthrough projects (which shouldn't happen if the pattern is correct), the function returns Ok(false). The pattern and implementation should be verified to ensure they're consistent.

Suggested change
predicate: |op| matches!(op, Operator::Filter(_)) || is_passthrough_project_operator(op),
predicate: |op| matches!(op, Operator::Filter(_)),

Copilot uses AI. Check for mistakes.
Comment on lines +140 to +143
if matches!(lhs, ScalarExpression::ColumnRef { .. }) {
return lhs_stripped == strip_all_alias(rhs);
}
false
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_subset_exprs function was enhanced to handle alias expressions. The new implementation strips aliases before comparison and has special handling for ColumnRef expressions. At line 140, it checks if the left-hand side is a ColumnRef, and if so, it compares lhs_stripped with strip_all_alias(rhs). However, this logic is asymmetric - it only does this special handling when lhs is a ColumnRef. This means that if lhs is "column_name" and rhs is "alias1(alias2(column_name))", it will match, but the reverse wouldn't. Verify this asymmetry is intentional and document the rationale.

Suggested change
if matches!(lhs, ScalarExpression::ColumnRef { .. }) {
return lhs_stripped == strip_all_alias(rhs);
}
false
// When either side is a ColumnRef, allow matching it against the other
// side with all aliases stripped. This handles nested aliasing
// symmetrically, so that `column` vs `alias1(alias2(column))` matches
// regardless of which slice it appears in.
match (lhs, rhs) {
(ScalarExpression::ColumnRef { .. }, _) => {
lhs_stripped == strip_all_alias(rhs)
}
(_, ScalarExpression::ColumnRef { .. }) => {
strip_all_alias(lhs) == rhs_stripped
}
_ => false,
}

Copilot uses AI. Check for mistakes.
@KKould KKould merged commit 55f367e into main Jan 10, 2026
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants