Draft
134 commits
9ac45d2
moving error classes/utilities to common module
knassre-bodo Jul 8, 2025
8728487
Changed more errors to be PyDough exceptions
knassre-bodo Jul 8, 2025
675421c
Added error builder class and integrated for term not found errors
knassre-bodo Jul 8, 2025
097b8a0
Minor refactor to term_not_found error usage [RUN CI]
knassre-bodo Jul 8, 2025
18ba964
WIP
knassre-bodo Jul 9, 2025
684b5d7
WIP improvements on projection pullup
knassre-bodo Jul 9, 2025
cc004ec
Fixing filter/join cases
knassre-bodo Jul 9, 2025
de7d4e6
Bugfixes, testing for correctness [RUN CI]
knassre-bodo Jul 9, 2025
e6dea43
Merge branch 'main' into kian/projection_pullup
knassre-bodo Jul 11, 2025
3827276
Finished dealing with JOIN pull-up
knassre-bodo Jul 12, 2025
1fff8ea
Fixed pullup bugs
knassre-bodo Jul 13, 2025
07136d2
Pullup with LIMIT [RUN CI]
knassre-bodo Jul 13, 2025
922d356
Resolving conflicts
knassre-bodo Jul 13, 2025
a678974
Resolving conflicts
knassre-bodo Jul 13, 2025
602f292
Merge branch 'main' into kian/projection_pullup
knassre-bodo Jul 13, 2025
d7ec696
Adding extra round of bubbling
knassre-bodo Jul 13, 2025
d88cdb5
Compressing limit into root
knassre-bodo Jul 13, 2025
665a9dd
Restoring filter modifications
knassre-bodo Jul 13, 2025
d1fe25b
Started aggregation project pullup
knassre-bodo Jul 13, 2025
0892581
Added SUM(1)->COUNT() optimization
knassre-bodo Jul 13, 2025
a69d764
Cleanup merge projects
knassre-bodo Jul 14, 2025
13d9844
Added some adjacent aggregation merging
knassre-bodo Jul 14, 2025
c497127
Added min/min, max/max, anything/anything cases
knassre-bodo Jul 14, 2025
f74fc5c
Adding more aggregation simplification and comments
knassre-bodo Jul 15, 2025
d260428
Added more aggregation simplification + tests
knassre-bodo Jul 15, 2025
856e1a9
Adjusting parameters of optimization
knassre-bodo Jul 15, 2025
c4298cb
Pulled out common logic from filter/join/limit and added comments
knassre-bodo Jul 16, 2025
5e9f09d
Added remaining comments
knassre-bodo Jul 16, 2025
c45b4f2
[RUN CI]
knassre-bodo Jul 16, 2025
6a7bc49
Resolving conflicts [RUN CI]
knassre-bodo Jul 16, 2025
416fbad
Added PageRank tests and fixed bugs found along the way
knassre-bodo Jul 16, 2025
b121c31
Started adding comments
knassre-bodo Jul 16, 2025
05f7147
Started adding comments
knassre-bodo Jul 16, 2025
2697b10
Added comments
knassre-bodo Jul 16, 2025
94726ff
Fixing c4 test and refactoring the PageRank impl to be simpler & more…
knassre-bodo Jul 16, 2025
e21cf57
Merge branch 'main' into kian/projection_pullup
knassre-bodo Jul 16, 2025
44fdd33
[RUN CI]
knassre-bodo Jul 16, 2025
a53500d
Merge branch 'main' into kian/pagerank
knassre-bodo Jul 16, 2025
b5d90f2
[RUN CI]
knassre-bodo Jul 16, 2025
0128758
Added tests e/f, deleted relational/sql tests for graphs other than a/c
knassre-bodo Jul 17, 2025
c9a5fe1
Adjusted how the skips are handled
knassre-bodo Jul 17, 2025
d6ed6c6
Adding larger graph & dense graph tests [RUN CI]
knassre-bodo Jul 17, 2025
ebf5339
Changing test h to be a higher number of iterations [RUN CI]
knassre-bodo Jul 17, 2025
19d8fe5
Merge branch 'main' into kian/error_handler
knassre-bodo Jul 17, 2025
80a9ca0
Moved around more errors, got rid of redundant ones, and fixed a cros…
knassre-bodo Jul 18, 2025
e812143
Overhaul test_qualify_error
knassre-bodo Jul 18, 2025
993553f
Moving more errors
knassre-bodo Jul 18, 2025
85e7c8f
Minor adjustment to how pagerank was written
knassre-bodo Jul 18, 2025
0508822
Initial revisions
knassre-bodo Jul 18, 2025
f59f457
Added set up for simplification
knassre-bodo Jul 18, 2025
b34f6ca
Added first simplification rules
knassre-bodo Jul 19, 2025
7884341
Improved null handling for aggregations
knassre-bodo Jul 19, 2025
b281c8a
Added more simplification rules
knassre-bodo Jul 19, 2025
78f6e78
More >0 filter improvements
knassre-bodo Jul 19, 2025
e8a54b8
Added IFF and KEEP_IF rules
knassre-bodo Jul 19, 2025
40805f7
Resolving conflicts with base branch [RUN CI]
knassre-bodo Jul 19, 2025
6121806
resolving conflicts with base branch
knassre-bodo Jul 19, 2025
b8a7f77
Added more simplification patterns and tests
knassre-bodo Jul 21, 2025
bc5f383
Minor refactoring
knassre-bodo Jul 21, 2025
a828aa9
Fixing double-TPCH error handling
knassre-bodo Jul 21, 2025
2914f9b
overhauling some of the function call creation and error handling
knassre-bodo Jul 21, 2025
9f0961d
Moved function mismatch errors to use min edit distance
knassre-bodo Jul 21, 2025
2e30300
Adjusting tuning of min edit distance errors
knassre-bodo Jul 21, 2025
75c3b7c
Messing with function handling, VARIANCE name, error tuning
knassre-bodo Jul 21, 2025
b94e76d
WIP
knassre-bodo Jul 21, 2025
58cb511
Conflict WIP
knassre-bodo Jul 21, 2025
7347536
Merge branch 'kian/simplify' into kian/error_handler
knassre-bodo Jul 21, 2025
19a0fb8
Resolving conflicts and fixing UDF tests
knassre-bodo Jul 21, 2025
bf69fe8
Moved window errors
knassre-bodo Jul 21, 2025
c5fdfef
Updating helper [RUN CI]
knassre-bodo Jul 21, 2025
716782f
Merge branch 'kian/projection_pullup' into kian/pagerank
knassre-bodo Jul 21, 2025
60be207
[RUN CI]
knassre-bodo Jul 21, 2025
689a4b0
[RUN CI]
knassre-bodo Jul 21, 2025
7dced8e
Merge branch 'kian/simplify' into kian/error_handler
knassre-bodo Jul 21, 2025
ade8f35
Adding more simplification patterns and tests
knassre-bodo Jul 21, 2025
3d9167e
[RUN CI]
knassre-bodo Jul 21, 2025
3d17cad
Update pydough/conversion/projection_pullup.py
knassre-bodo Jul 22, 2025
df2e401
Final revisions/documentation [RUN CI]
knassre-bodo Jul 22, 2025
e1ae265
Resolving conflicts
knassre-bodo Jul 22, 2025
6f27a23
Resolving conflicts [RUN CI]
knassre-bodo Jul 22, 2025
2bbf925
Resolving conflicts before parent merged into main
knassre-bodo Jul 22, 2025
1005fde
Resolving conflicts [RUN cI]
knassre-bodo Jul 22, 2025
d8b2fa8
Merge branch 'kian/simplify' into kian/error_handler
knassre-bodo Jul 22, 2025
ce7e035
Completed refactor of how simplification predicates work to use a Pre…
knassre-bodo Jul 23, 2025
e6c9fbe
Refactoring to use shuttles & visitors for simplification
knassre-bodo Jul 23, 2025
94375ce
Fixing comments
knassre-bodo Jul 23, 2025
4914784
[RUN CI]
knassre-bodo Jul 23, 2025
8f0fbd3
Merge branch 'main' into kian/simplify
knassre-bodo Jul 24, 2025
f150cd5
Adding docstrings
knassre-bodo Jul 24, 2025
8d0fc6b
Revisions
knassre-bodo Jul 24, 2025
22a94ab
Stack cleanup
knassre-bodo Jul 24, 2025
a971676
Adding additional shuttle framework
knassre-bodo Jul 24, 2025
9c6caa2
[RUN CI]
knassre-bodo Jul 24, 2025
ffb58b3
Merge branch 'kian/simplify' into kian/error_handler
knassre-bodo Jul 24, 2025
95e59d1
[RUN CI]
knassre-bodo Jul 24, 2025
604a7a6
Merge branch 'main' into kian/simplify
knassre-bodo Jul 25, 2025
b63c5d4
Added more simplification patterns to tests
knassre-bodo Jul 31, 2025
02c24bd
Revisions
knassre-bodo Jul 31, 2025
2c65773
Apply suggestions from code review
knassre-bodo Jul 31, 2025
cc12363
edit
knassre-bodo Jul 31, 2025
5f19dcd
Merge remote-tracking branch 'origin/kian/simplify' into kian/simplify
knassre-bodo Jul 31, 2025
6ec13f1
[RUN CI]
knassre-bodo Aug 1, 2025
9344c9a
Fixing SQL test [RUN CI]
knassre-bodo Aug 1, 2025
0e7f48e
Resolving conflicts
knassre-bodo Aug 1, 2025
75ee111
Merge branch 'main' into kian/error_handler
knassre-bodo Aug 1, 2025
59850bd
[RUN CI]
knassre-bodo Aug 1, 2025
5ab03d7
Merge branch 'main' into kian/error_handler
knassre-bodo Aug 1, 2025
f460cda
Revision
knassre-bodo Aug 1, 2025
315993d
Merge branch 'main' into kian/error_handler
knassre-bodo Aug 7, 2025
0fa39c7
Removing dead comment
knassre-bodo Aug 7, 2025
4247700
Resolving conflicts
knassre-bodo Aug 13, 2025
b27b96f
Resolving merge conflicts [RUN CI]
knassre-bodo Aug 14, 2025
4d899b8
Adding more comments/docstrings
knassre-bodo Aug 18, 2025
e8fd112
Initial implementation buggy WIP
knassre-bodo Aug 18, 2025
7587597
resolving conflicts [RUN CI]
knassre-bodo Aug 20, 2025
484decf
Merge branch 'kian/error_handler' into kian/join_aggregate_transpose
knassre-bodo Aug 20, 2025
7bf3268
WIP fixing column handling triple_partition + other bugs
knassre-bodo Aug 21, 2025
cb96cdb
Resolving conflicts [RUN CI]
knassre-bodo Aug 21, 2025
63682a4
WIP
knassre-bodo Aug 21, 2025
a34ec87
Adding reverse cardinality support
knassre-bodo Aug 25, 2025
65697e8
Added reverse cardinality based column pruning [RUN CI] [RUN MYSQL]
knassre-bodo Aug 25, 2025
8f7fbbe
Fixing bug [RUN CI]
knassre-bodo Aug 25, 2025
d797c43
Adjusting aggregation splitting to account for reverse cardinality [R…
knassre-bodo Aug 26, 2025
3041ac9
Stop printing cardinalities in plan files for semi/anti joins
knassre-bodo Aug 26, 2025
efa1335
Revisions and documentation [RUN CI] [RUN MYSQL]
knassre-bodo Aug 26, 2025
48e5d3d
Resolving conflicts
knassre-bodo Sep 8, 2025
b4e3318
Minor revisions
knassre-bodo Sep 8, 2025
091a353
Adjusting edge case for correlation extraction affecting cardinality
knassre-bodo Sep 10, 2025
29dd27d
Resolving conflicts
knassre-bodo Sep 17, 2025
6d4e268
Resolving conflicts
knassre-bodo Sep 17, 2025
d903c9d
temporary reversion as setup is being adjusted
knassre-bodo Sep 17, 2025
f130bc3
WIP
knassre-bodo Sep 18, 2025
57bb56e
Resolving conflicts
knassre-bodo Oct 6, 2025
401c1bc
Resolving conflicts
knassre-bodo Oct 6, 2025
187 changes: 187 additions & 0 deletions pydough/conversion/join_agg_transpose.py
@@ -0,0 +1,187 @@
""" """

__all__ = ["pull_joins_after_aggregates"]


from collections.abc import Iterable

import pydough.pydough_operators as pydop
from pydough.relational import (
    Aggregate,
    CallExpression,
    ColumnReference,
    ColumnReferenceFinder,
    Join,
    JoinCardinality,
    JoinType,
    Project,
    RelationalExpression,
    RelationalNode,
    RelationalRoot,
    RelationalShuttle,
)


class JoinAggregateTransposeShuttle(RelationalShuttle):
    """
    Shuttle that attempts to transpose each Join with an Aggregate input.
    """

    def __init__(self):
        self.finder: ColumnReferenceFinder = ColumnReferenceFinder()

    def reset(self):
        self.finder.reset()

    def visit_join(self, node: Join) -> RelationalNode:
        result: RelationalNode | None = None

        # Attempt the transpose where the left input is an Aggregate. If it
        # succeeded, use that as the result and recursively transform its
        # inputs.
        if isinstance(node.inputs[0], Aggregate):
            result = self.join_aggregate_transpose(node, node.inputs[0], True)
            if result is not None:
                return self.generic_visit_inputs(result)

        # If the attempt failed, then attempt the transpose where the right
        # input is an Aggregate. If this attempt succeeded, use that as the
        # result and recursively transform its inputs.
        if isinstance(node.inputs[1], Aggregate):
            result = self.join_aggregate_transpose(node, node.inputs[1], False)
            if result is not None:
                return self.generic_visit_inputs(result)

        # If this attempt failed, fall back to the regular implementation.
        return super().visit_join(node)

    def generate_name(self, base: str, used_names: Iterable[str]) -> str:
        """
        Generates a new name for a column based on the base name and the existing
        columns in the join. This is used to ensure that the new column names are
        unique and do not conflict with existing names.
        """
        if base not in used_names:
            return base
        i = 0
        while True:
            name = f"{base}_{i}"
            if name not in used_names:
                return name
            i += 1

    def join_aggregate_transpose(
        self, join: Join, aggregate: Aggregate, is_left: bool
    ) -> RelationalNode | None:
        """
        Transposes a Join above an Aggregate into an Aggregate above a Join,
        when possible and when it would be better for performance to use the
        join first to filter some of the rows before aggregating.

        Args:
            `join`: the Join node above the Aggregate.
            `aggregate`: the Aggregate node that is an input to the Join.
            `is_left`: whether the Aggregate is the left input to the Join
            (True) or the right input (False).

        Returns:
            The new RelationalNode tree with the Join and Aggregate transposed,
            or None if the transpose is not possible.
        """
        # Determine whether every aggregation is unaffected by a change in
        # cardinality (MIN, MAX, ANYTHING, NDISTINCT), in which case the join
        # cardinality does not need to be singular.
        aggs_allow_plural: bool = all(
            call.op in (pydop.MIN, pydop.MAX, pydop.ANYTHING, pydop.NDISTINCT)
            for call in aggregate.aggregations.values()
        )

        # The cardinality with regard to the input being considered must be
        # singular (unless the aggregations allow plural), and must be
        # filtering (since the point of joining before aggregation is to reduce
        # the number of rows to aggregate).
        cardinality: JoinCardinality = (
            join.cardinality if is_left else join.reverse_cardinality
        )

        # Verify the cardinality meets the specified criteria, and that the join
        # type is INNER/SEMI (since LEFT would not be filtering), where SEMI is
        # only allowed if the aggregation is on the left.
        if not (
            (
                (join.join_type == JoinType.INNER)
                or (join.join_type == JoinType.SEMI and is_left)
            )
            and cardinality.filters
            and (cardinality.singular or aggs_allow_plural)
        ):
            return None

        # The alias of the input to the join that corresponds to the
        # aggregate.
        desired_alias: str | None = (
            join.default_input_aliases[0] if is_left else join.default_input_aliases[1]
        )

        # Find all of the columns used in the join condition that come from the
        # aggregate side of the join
        self.finder.reset()
        join.condition.accept(self.finder)
        agg_condition_columns: set[ColumnReference] = {
            col
            for col in self.finder.get_column_references()
            if col.input_name == desired_alias
        }

        # Verify ALL of the condition columns from that side of the join are
        # in the aggregate keys.
        if len(agg_condition_columns) == 0 or any(
            col.name not in aggregate.keys for col in agg_condition_columns
        ):
            return None

        # A mapping that will be used to map every expression with regards to
        # the original join looking at its input expressions to what the
        # expression will be in the output columns of the new aggregate

        new_join_columns: dict[str, RelationalExpression] = {}
        new_aggregate_aggs: dict[str, CallExpression] = {}
        new_aggregate_keys: dict[str, RelationalExpression] = {}

        new_condition: RelationalExpression = join.condition
        agg_input: RelationalNode = aggregate.inputs[0]
        non_agg_input: RelationalNode = join.inputs[1] if is_left else join.inputs[0]
        new_join_inputs: list[RelationalNode] = (
            [agg_input, non_agg_input] if is_left else [non_agg_input, agg_input]
        )

        project_columns: dict[str, RelationalExpression] = {}

        # TODO: FINISH THIS
        return None

        assert False

        new_join: Join = Join(
            new_join_inputs,
            new_condition,
            join.join_type,
            new_join_columns,
            join.cardinality,
            join.reverse_cardinality,
            join.correl_name,
        )

        new_aggregate: Aggregate = Aggregate(
            new_join, new_aggregate_keys, new_aggregate_aggs
        )

        return Project(new_aggregate, project_columns)


def pull_joins_after_aggregates(node: RelationalRoot) -> RelationalNode:
    """
    Attempts to transpose Joins and Aggregates in the relational tree under `node`.
    """
    shuttle: JoinAggregateTransposeShuttle = JoinAggregateTransposeShuttle()
    return node.accept_shuttle(shuttle)
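The conditions checked in `join_aggregate_transpose` can be illustrated with a small sketch that does not use PyDough at all. It models rows as plain `(key, value)` tuples; `orders`, `vip_singular`, `vip_plural`, and every function name below are hypothetical and exist only for this illustration. Under those assumptions, it shows that when the join is filtering and singular and the join key is the aggregation key, aggregating before or after the join gives the same result, and that when a key matches more than once, `max` keeps the same value while `sum` is inflated.

```python
from collections import defaultdict

# Hypothetical data: (customer_id, amount) rows, plus key lists standing in
# for the other side of the join.
orders = [(1, 10), (1, 5), (2, 7), (3, 2), (3, 9)]
vip_singular = [1, 3]    # each key appears at most once; key 2 is filtered out
vip_plural = [1, 1, 3]   # key 1 appears twice, so the join duplicates its rows


def aggregate(rows, agg):
    """Group (key, value) rows by key and apply `agg` to each group's values."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: agg(values) for key, values in groups.items()}


def inner_join(rows, keys):
    """Inner join rows against a list of keys, emitting one row per match."""
    return [(key, value) for key, value in rows for other in keys if other == key]


def aggregate_then_join(rows, keys, agg):
    """Original plan shape: the join sits above the aggregate."""
    totals = aggregate(rows, agg)
    return sorted((key, totals[key]) for key in keys if key in totals)


def join_then_aggregate(rows, keys, agg):
    """Transposed plan shape: the aggregate sits above the join."""
    return sorted(aggregate(inner_join(rows, keys), agg).items())


# Singular + filtering: both shapes agree, and the transposed shape never
# aggregates the rows for customer 2.
print(aggregate_then_join(orders, vip_singular, sum))  # [(1, 15), (3, 11)]
print(join_then_aggregate(orders, vip_singular, sum))  # [(1, 15), (3, 11)]

# Plural matches: max is unchanged per key, but sum counts key 1 twice.
print(join_then_aggregate(orders, vip_plural, max))    # [(1, 10), (3, 9)]
print(join_then_aggregate(orders, vip_plural, sum))    # [(1, 30), (3, 11)]
```

This sketch only compares aggregated values per key. How the transposed plan should handle columns and row counts for plural joins is part of the logic the diff still marks as unfinished, so it is deliberately left out here.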
5 changes: 4 additions & 1 deletion pydough/conversion/relational_converter.py
@@ -85,6 +85,7 @@
 )
 from .hybrid_translator import HybridTranslator
 from .hybrid_tree import HybridTree
+from .join_agg_transpose import pull_joins_after_aggregates
 from .merge_projects import merge_projects
 from .projection_pullup import pullup_projections
 from .relational_simplification import simplify_expressions
@@ -1588,7 +1589,8 @@ def optimize_relational_tree(
 # A: projection pullup
 # B: expression simplification
 # C: filter pushdown
-# D: column pruning
+# D: join-aggregate transpose
+# E: column pruning
 # This is done because pullup will create more opportunities for expression
 # simplification, which will allow more filters to be pushed further down,
 # and the combination of those together will create more opportunities for
@@ -1598,6 +1600,7 @@ def optimize_relational_tree(
 root = confirm_root(pullup_projections(root))
 simplify_expressions(root, configs, additional_shuttles)
 root = confirm_root(push_filters(root, configs))
+root = confirm_root(pull_joins_after_aggregates(root))
 root = pruner.prune_unused_columns(root)
 
 # Re-run projection merging, without pushing into joins. This will allow
5 changes: 0 additions & 5 deletions tests/test_plan_refsols/cryptbank_agg_03.txt

This file was deleted.

6 changes: 0 additions & 6 deletions tests/test_plan_refsols/cryptbank_agg_05.txt

This file was deleted.

13 changes: 0 additions & 13 deletions tests/test_plan_refsols/cryptbank_analysis_01.txt

This file was deleted.

13 changes: 0 additions & 13 deletions tests/test_plan_refsols/cryptbank_analysis_02.txt

This file was deleted.

24 changes: 0 additions & 24 deletions tests/test_plan_refsols/cryptbank_analysis_03.txt

This file was deleted.

9 changes: 0 additions & 9 deletions tests/test_plan_refsols/cryptbank_analysis_04.txt

This file was deleted.

8 changes: 0 additions & 8 deletions tests/test_plan_refsols/cryptbank_filter_count_11.txt

This file was deleted.

5 changes: 0 additions & 5 deletions tests/test_plan_refsols/cryptbank_filter_count_12.txt

This file was deleted.

5 changes: 0 additions & 5 deletions tests/test_plan_refsols/cryptbank_filter_count_13.txt

This file was deleted.
