Change execution order and add better sorting (NOT needed for SIGMOD) #133

leonardgeissler · 2025-06-10T14:52:30Z

Adds split-up junctions and sorting for percentile predicates, and changes the transformer type.

I will add the regression tests in the coming days.
It is not needed for SIGMOD because the performance gains are only noticeable with very complex queries in exact mode.

However, I wanted to add this PR because it has been ready for a while, waiting for the other PR to be merged.

fix add pp result size estimation add num_workers constants change costs change transformer type fix: change stop points fix add pp result size estimation change threading add num_workers constants update constants change costs add optimizer change transformer type

Copilot

Pull Request Overview

This PR introduces a new split-up junction rule alongside improved cost estimation for percentile predicates, refactors filtering limits to account for multiple workers, and updates executors to propagate a num_workers parameter.

Optimizer: adds split_up_junctions flag and rule; extends CostSorter with a regression model for percentile_op.
Executors: injects num_workers into threaded and prefiltering executors; centralizes filtering thresholds via get_filtering_stop_point.
Tests: updated test_optimizer.py and test assets to exercise the new junction splitting behavior.
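As a rough illustration of the centralized threshold lookup mentioned above, here is a minimal sketch. Only the names `FILTERING_STOP_POINTS`, `get_filtering_stop_point`, `exceeds_filtering_limit`, and `num_workers` come from this PR; the nesting order (mode, then worker count), the string keys, the numeric values, and the fallback policy are assumptions made for the example. It also takes a plain integer count where the PR uses `ids.size`.

```python
# Hypothetical nested mapping: mode -> number of workers -> stop point.
# Keys and values are placeholders, not the constants from the PR.
FILTERING_STOP_POINTS: dict[str, dict[int, int]] = {
    "exact": {1: 100_000, 4: 400_000},
    "low_memory": {1: 50_000, 4: 200_000},
}


def get_filtering_stop_point(mode: str, num_workers: int) -> int:
    """Return the stop point for a mode and worker count.

    Assumed policy: use the largest configured worker count that does not
    exceed num_workers.
    """
    per_mode = FILTERING_STOP_POINTS[mode]
    eligible = [workers for workers in per_mode if workers <= num_workers]
    if not eligible:
        raise ValueError(f"No stop point configured for {num_workers} worker(s)")
    return per_mode[max(eligible)]


def exceeds_filtering_limit(num_ids: int, mode: str, num_workers: int) -> bool:
    """Check whether an intermediate result is too large to prefilter with."""
    return num_ids > get_filtering_stop_point(mode, num_workers)
```

Keeping the lookup in one function means the executors only pass `num_workers` through and never index the mapping themselves.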

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| backend/backend/engine/optimizer.py | Added `split_up_junctions` flag/rule and percentile regression in `CostSorter` |
| backend/backend/engine/constants.py | Refactored `FILTERING_STOP_POINTS` into nested mapping; added `get_filtering_stop_point` |
| backend/backend/engine/execution/common.py | Switched `exceeds_filtering_limit` to use `get_filtering_stop_point` and `ids.size` |
| backend/backend/engine/execution/threaded_prefiltering_executor.py | Introduced `num_workers` parameter and threaded prefiltering logic updates |
| backend/backend/engine/execution/threaded_executor.py | Simplified thread-pool task submission and removed unused `_thread_results` |
| backend/backend/engine/execution/simple_executor.py | Switched from `Transformer` to `Transformer_NonRecursive` |
| backend/backend/engine/execution/prefiltering_executor.py | Added `num_workers` support; updated to non-recursive transformer |
| backend/tests/test_optimizer.py | Updated optimizer instantiations to pass `split_up_junctions` |
| backend/tests/assets/test_cases_optimizer.py | Adjusted nested-junction test case to reflect split-up behavior |

Comments suppressed due to low confidence (2)

backend/backend/engine/execution/threaded_prefiltering_executor.py:42

Adding a required num_workers parameter in the constructor without a default breaks backward compatibility. Consider providing a default value (e.g., 1) or making num_workers keyword-only to avoid API breakage.

def __init__(self, write_group: int, fainder_mode: FainderMode, num_workers: int,
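One way to follow this suggestion is sketched below: `num_workers` becomes keyword-only with a default of 1, so existing call sites keep working. The class name, the `FainderMode` stand-in with its single member, and the validation are assumptions for illustration; the real constructor takes further parameters that are not shown in the snippet above.

```python
from enum import Enum


class FainderMode(Enum):
    """Stand-in for the project's enum; the member is hypothetical."""

    EXACT = "exact"


class ThreadedPrefilteringExecutor:
    """Hypothetical, trimmed-down executor showing only the constructor."""

    def __init__(
        self,
        write_group: int,
        fainder_mode: FainderMode,
        *,  # everything after this marker must be passed by keyword
        num_workers: int = 1,
    ) -> None:
        if num_workers < 1:
            raise ValueError("num_workers must be at least 1")
        self.write_group = write_group
        self.fainder_mode = fainder_mode
        self.num_workers = num_workers


# Old call sites without num_workers still work and get a single worker.
legacy = ThreadedPrefilteringExecutor(0, FainderMode.EXACT)
# New call sites opt in explicitly.
parallel = ThreadedPrefilteringExecutor(0, FainderMode.EXACT, num_workers=4)
```

Making the parameter keyword-only also prevents a positional integer from being silently interpreted as a worker count.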

backend/tests/test_optimizer.py:14

There aren't any standalone tests that verify the split-up junction behavior in isolation. Consider adding a focused test that uses split_up_junctions=True on a simple multi-term junction to validate correct binary splitting.

optimizer = Optimizer(cost_sorting=True, keyword_merging=False, split_up_junctions=False)

Copilot · 2025-06-10T14:58:44Z

backend/backend/engine/optimizer.py

+        if comparison in {"gt", "ge"}:
+            percentile = 1 - percentile  # Invert percentile for gt/ge comparisons
+
+        # Formular for the regression model for le


Typo in the comment: "Formular" should be "Formula".

Suggested change

-        # Formular for the regression model for le
+        # Formula for the regression model for le
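For context on the inversion in the diff above: the share of values above a threshold is the complement of the share at or below it, so a model fitted for `le` can presumably be reused for `gt`/`ge` after mapping the percentile to `1 - percentile`. A minimal sketch of that normalization step follows; the function name, the accepted `lt` operator, and the input validation are assumptions, and the regression model itself is not reproduced here.

```python
def normalize_percentile(percentile: float, comparison: str) -> float:
    """Map a percentile predicate onto the scale used by an le-based model.

    The name and the accepted operators are hypothetical; only the
    inversion for gt/ge is taken from the PR diff.
    """
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be in [0, 1]")
    if comparison in {"gt", "ge"}:
        # Invert the percentile for gt/ge comparisons.
        return 1.0 - percentile
    if comparison in {"lt", "le"}:
        return percentile
    raise ValueError(f"Unsupported comparison: {comparison}")
```

With this in place, the cost estimate only needs one set of regression coefficients, and the comparison direction is handled before the model is evaluated.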

Copilot · 2025-06-10T14:58:44Z

backend/backend/engine/optimizer.py

+        if len(tree.children) > 2:  # noqa: PLR2004
+            # Split the disjunction into multiple rules


The split-up logic only runs once and may still leave more than two children, but the visitor won’t revisit those new nodes. Consider looping or recursively applying the split until each junction has at most two terms.

Suggested change

-        if len(tree.children) > 2:  # noqa: PLR2004
-            # Split the disjunction into multiple rules
+        while any(isinstance(child, Tree) and child.data == "disjunction" and len(child.children) > 2 for child in tree.children):  # noqa: PLR2004
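An alternative to looping inside the visitor is to rebuild the tree bottom-up so that every junction ends up with at most two children in a single pass. The sketch below is not the PR's implementation: it uses a small stand-in `Tree` dataclass instead of the parser's tree type, assumes the junction node names `conjunction` and `disjunction`, and picks a left fold as the splitting order.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tree:
    """Minimal stand-in for the parse tree type used by the optimizer."""

    data: str
    children: list[Tree | str] = field(default_factory=list)


# Assumed node names for junctions.
JUNCTIONS = {"conjunction", "disjunction"}


def split_junctions(node: Tree | str) -> Tree | str:
    """Return a copy of the tree in which every junction is binary."""
    if not isinstance(node, Tree):
        return node  # leaves (tokens) are returned unchanged
    # Process children first so nested junctions are already binary.
    children = [split_junctions(child) for child in node.children]
    if node.data in JUNCTIONS and len(children) > 2:
        # Left fold: (((a op b) op c) op d)
        folded = Tree(node.data, children[:2])
        for child in children[2:]:
            folded = Tree(node.data, [folded, child])
        return folded
    return Tree(node.data, children)


def max_junction_arity(node: Tree | str) -> int:
    """Largest number of children found on any junction node."""
    if not isinstance(node, Tree):
        return 0
    own = len(node.children) if node.data in JUNCTIONS else 0
    return max([own, *(max_junction_arity(c) for c in node.children)])
```

A focused test can then build a four-term disjunction, run `split_junctions`, and assert that `max_junction_arity` of the result is 2. Whether a left fold or a balanced split is preferable depends on how the executor schedules the resulting subtrees.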

Copilot · 2025-06-10T14:58:45Z

backend/tests/test_optimizer.py

+    optimizer = Optimizer(cost_sorting=True, keyword_merging=True, split_up_junctions=True)
    plan = deepcopy(test_case["input_tree"])

    assert test_case["all_rules"] == optimizer.optimize(plan)


[nitpick] For consistency with the other tests, you might capture the result of optimizer.optimize(plan) in a variable (e.g., optimized_plan) before asserting, which improves readability.

Suggested change

-    assert test_case["all_rules"] == optimizer.optimize(plan)
+    optimized_plan = optimizer.optimize(plan)
+    assert test_case["all_rules"] == optimized_plan

leonardgeissler added 3 commits June 10, 2025 16:21

squashed

2a9dc69

fix add pp result size estimation add num_workers constants change costs change transformer type fix: change stop points fix add pp result size estimation change threading add num_workers constants update constants change costs add optimizer change transformer type

revert

0fdf618

update costs

7ce7feb

leonardgeissler requested review from Copilot and lbhm June 10, 2025 14:52

Copilot AI reviewed Jun 10, 2025

leonardgeissler added 3 commits June 28, 2025 11:15

Merge branch 'main' into change-execution-order

c247356

fix

2812a46

add timeout

abd7800
