-
Notifications
You must be signed in to change notification settings - Fork 0
Change execution order and add better sorting (NOT needed for SIGMOD) #133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a new split-up junction rule alongside improved cost estimation for percentile predicates, refactors filtering limits to account for multiple workers, and updates executors to propagate a num_workers parameter.
- Optimizer: adds
split_up_junctionsflag and rule; extendsCostSorterwith a regression model forpercentile_op. - Executors: injects
num_workersinto threaded and prefiltering executors; centralizes filtering thresholds viaget_filtering_stop_point. - Tests: updated
test_optimizer.pyand test assets to exercise the new junction splitting behavior.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| backend/backend/engine/optimizer.py | Added split_up_junctions flag/rule and percentile regression in CostSorter |
| backend/backend/engine/constants.py | Refactored FILTERING_STOP_POINTS into nested mapping; added get_filtering_stop_point |
| backend/backend/engine/execution/common.py | Switched exceeds_filtering_limit to use get_filtering_stop_point and ids.size |
| backend/backend/engine/execution/threaded_prefiltering_executor.py | Introduced num_workers parameter and threaded prefiltering logic updates |
| backend/backend/engine/execution/threaded_executor.py | Simplified thread‐pool task submission and removed unused _thread_results |
| backend/backend/engine/execution/simple_executor.py | Switched from Transformer to Transformer_NonRecursive |
| backend/backend/engine/execution/prefiltering_executor.py | Added num_workers support; updated to non‐recursive transformer |
| backend/tests/test_optimizer.py | Updated optimizer instantiations to pass split_up_junctions |
| backend/tests/assets/test_cases_optimizer.py | Adjusted nested‐junction test case to reflect split‐up behavior |
Comments suppressed due to low confidence (2)
backend/backend/engine/execution/threaded_prefiltering_executor.py:42
- Adding a required
num_workersparameter in the constructor without a default breaks backward compatibility. Consider providing a default value (e.g., 1) or makingnum_workerskeyword-only to avoid API breakage.
def __init__(self, write_group: int, fainder_mode: FainderMode, num_workers: int,
backend/tests/test_optimizer.py:14
- There aren't any standalone tests that verify the split-up junction behavior in isolation. Consider adding a focused test that uses
split_up_junctions=Trueon a simple multi-term junction to validate correct binary splitting.
optimizer = Optimizer(cost_sorting=True, keyword_merging=False, split_up_junctions=False)
| if comparison in {"gt", "ge"}: | ||
| percentile = 1 - percentile # Invert percentile for gt/ge comparisons | ||
|
|
||
| # Formular for the regression model for le |
Copilot
AI
Jun 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in the comment: "Formular" should be "Formula".
| # Formular for the regression model for le | |
| # Formula for the regression model for le |
| if len(tree.children) > 2: # noqa: PLR2004 | ||
| # Split the disjunction into multiple rules |
Copilot
AI
Jun 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The split-up logic only runs once and may still leave more than two children, but the visitor won’t revisit those new nodes. Consider looping or recursively applying the split until each junction has at most two terms.
| if len(tree.children) > 2: # noqa: PLR2004 | |
| # Split the disjunction into multiple rules | |
| while any(isinstance(child, Tree) and child.data == "disjunction" and len(child.children) > 2 for child in tree.children): # noqa: PLR2004 |
| optimizer = Optimizer(cost_sorting=True, keyword_merging=True, split_up_junctions=True) | ||
| plan = deepcopy(test_case["input_tree"]) | ||
|
|
||
| assert test_case["all_rules"] == optimizer.optimize(plan) |
Copilot
AI
Jun 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] For consistency with the other tests, you might capture the result of optimizer.optimize(plan) in a variable (e.g., optimized_plan) before asserting, which improves readability.
| assert test_case["all_rules"] == optimizer.optimize(plan) | |
| optimized_plan = optimizer.optimize(plan) | |
| assert test_case["all_rules"] == optimized_plan |
Adds split-up junctions and sorting for percentile predicates. And changes the transformer type.
I will add the regression tests in the coming days.
It is not needed for SIGMOD because the performance gains are only noticeable with very complex queries in exact mode.
However, I wanted to add this PR because it has been ready for a while, waiting for the other PR to be merged.