
Add support for join filter pullups#6272

Merged
mattnibs merged 1 commit into main from join-pullup on Oct 2, 2025
Conversation

Collaborator

@mattnibs mattnibs commented Sep 30, 2025

This commit adds functionality to the optimizer to pull up simple join predicates. Currently it pulls up only simple predicate expressions in which keys are compared against constant values.

Partially fixes #6074
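
As a rough illustration of the idea in the description, here is a minimal sketch of moving key-versus-constant predicates onto the join input each one references. `Cmp`, `Scan`, and `moveFilters` are hypothetical names made up for this example; they are not the optimizer's `dag` types, and the real change may be structured quite differently.

```go
package main

import "fmt"

// Cmp is a hypothetical predicate of the form "<Input>.<Key> = <Const>".
// It is a stand-in for illustration, not one of the optimizer's dag types.
type Cmp struct {
	Input string // the single join input the key belongs to
	Key   string
	Const int
}

// Scan is a hypothetical join input together with the filters attached to it.
type Scan struct {
	Name    string
	Filters []Cmp
}

// moveFilters attaches each predicate to the input it references and
// returns the predicates that reference no known input; those must stay
// in the filter above the join.
func moveFilters(inputs []*Scan, preds []Cmp) []Cmp {
	byName := make(map[string]*Scan, len(inputs))
	for _, s := range inputs {
		byName[s.Name] = s
	}
	var rest []Cmp
	for _, p := range preds {
		if s, ok := byName[p.Input]; ok {
			s.Filters = append(s.Filters, p)
		} else {
			rest = append(rest, p)
		}
	}
	return rest
}

func main() {
	inputs := []*Scan{{Name: "t7"}, {Name: "t25"}}
	preds := []Cmp{{"t7", "a7", 1}, {"t25", "b25", 7}, {"t7", "b7", 1}, {"t99", "a99", 1}}
	rest := moveFilters(inputs, preds)
	for _, s := range inputs {
		fmt.Println(s.Name, s.Filters)
	}
	fmt.Println("left above join:", rest)
}
```

The sketch assumes every predicate names exactly one input, which matches the "keys compared against constant values" restriction; predicates spanning two inputs would need to stay with the join.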

@mattnibs mattnibs requested a review from a team September 30, 2025 20:54
}
}

func breakupFilter(e dag.Expr) []dag.Expr {
Member

Nit: I think a name like splitConjunction would make the behavior of this clearer.

Collaborator Author

I like split but Conjunction to me doesn't necessarily imply filter or predicate. Maybe splitPredicate?

Member

My beef with splitPredicate is that it doesn't really say anything about how the function parameter will be split. splitConjunction at least suggests that if the parameter is a logical conjunction then it will be split into its operands.
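
To make the naming discussion concrete, here is a minimal sketch of the behavior a name like `splitConjunction` suggests: a logical AND is flattened into its operands, and anything else comes back unchanged. `Expr`, `Pred`, and `And` are hypothetical stand-ins for `dag.Expr` and its node types, which are not shown in this thread, so this is not the PR's `breakupFilter`.

```go
package main

import "fmt"

// Expr is a hypothetical stand-in for dag.Expr.
type Expr interface{ String() string }

// Pred is a hypothetical leaf predicate such as "a=1".
type Pred struct{ Text string }

func (p *Pred) String() string { return p.Text }

// And is a hypothetical logical conjunction of two expressions.
type And struct{ LHS, RHS Expr }

func (a *And) String() string {
	return "(" + a.LHS.String() + " and " + a.RHS.String() + ")"
}

// splitConjunction flattens nested And nodes into their operands, left to
// right. Any other expression is returned as a single-element slice.
func splitConjunction(e Expr) []Expr {
	if a, ok := e.(*And); ok {
		return append(splitConjunction(a.LHS), splitConjunction(a.RHS)...)
	}
	return []Expr{e}
}

func main() {
	// (a=1 and b=2) and c=3
	e := &And{&And{&Pred{"a=1"}, &Pred{"b=2"}}, &Pred{"c=3"}}
	for _, p := range splitConjunction(e) {
		fmt.Println(p) // prints a=1, b=2, c=3 on separate lines
	}
}
```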

Contributor

philrz commented Oct 1, 2025

It's a functional 👍 for me! While the sqllogictests contain lots of heavy cross joins, I couldn't easily spot any that had only predicate expressions with constant comparisons, so I modified one. There's probably no need to turn this into a ztest in the super repo, since we'll have thousands of similar, even more complex tests once the full feature work is complete. FWIW, with the attached test data (data.tgz), this query runs fast and produces the same result as the equivalent query in Postgres:

$ super -version
Version: 0471667fa

$ time super -s -c "
SELECT x7,x3,x62,x57,x47,x25,x51,x55,x44,x49,x53,x32
  FROM t47 (FORMAT parquet), t55 (FORMAT parquet), t62 (FORMAT parquet), t25 (FORMAT parquet), t32 (FORMAT parquet), t7 (FORMAT parquet), t3 (FORMAT parquet), t57 (FORMAT parquet), t53 (FORMAT parquet), t51 (FORMAT parquet), t44 (FORMAT parquet), t49 (FORMAT parquet)
WHERE b32=1
  AND b51=1
  AND b7=1
  AND a49=1
  AND b53=1
  AND b25=7
  AND b57=1
  AND a44=1
  AND a62=1
  AND a49=1
  AND a7=1
  AND a25=1
  AND a3=1
  AND a47=1
  AND b55=1;"

{x7:"table t7 row 1",x3:"table t3 row 1",x62:"table t62 row 1",x57:"table t57 row 9",x47:"table t47 row 1",x25:"table t25 row 1",x51:"table t51 row 10",x55:"table t55 row 1",x44:"table t44 row 1",x49:"table t49 row 1",x53:"table t53 row 5",x32:"table t32 row 6"}

real	0m0.094s
user	0m0.107s
sys	0m0.068s

...whereas at the current tip of main it effectively runs forever due to the massive Cartesian product.
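
For a sense of why the timing changes so much, the following is a small sketch that compares filtering after a cross join with filtering each input first. The `Row` type, the ten-row inputs, and the predicates are all made up for illustration and are unrelated to the attached test data.

```go
package main

import "fmt"

// Row is a hypothetical two-column record, loosely modeled on the aN/bN columns.
type Row struct{ A, B int }

// filter returns the rows for which keep reports true.
func filter(rows []Row, keep func(Row) bool) []Row {
	var out []Row
	for _, r := range rows {
		if keep(r) {
			out = append(out, r)
		}
	}
	return out
}

// crossThenFilter builds every pair and then applies both predicates.
// It returns the matching pairs and the number of pairs visited.
func crossThenFilter(l, r []Row, pl, pr func(Row) bool) (out [][2]Row, pairs int) {
	for _, x := range l {
		for _, y := range r {
			pairs++
			if pl(x) && pr(y) {
				out = append(out, [2]Row{x, y})
			}
		}
	}
	return out, pairs
}

// filterThenCross applies each predicate to its own input before pairing.
func filterThenCross(l, r []Row, pl, pr func(Row) bool) (out [][2]Row, pairs int) {
	for _, x := range filter(l, pl) {
		for _, y := range filter(r, pr) {
			pairs++
			out = append(out, [2]Row{x, y})
		}
	}
	return out, pairs
}

func main() {
	var l, r []Row
	for i := 1; i <= 10; i++ {
		l = append(l, Row{A: i, B: i % 3})
		r = append(r, Row{A: i, B: i % 2})
	}
	pl := func(x Row) bool { return x.A == 1 }
	pr := func(y Row) bool { return y.A == 7 }

	_, slow := crossThenFilter(l, r, pl, pr)
	out, fast := filterThenCross(l, r, pl, pr)
	fmt.Println(len(out), slow, fast) // prints: 1 100 1
}
```

Both strategies return the same single pair, but the first visits 100 pairs and the second visits 1. With more inputs the pair count of the first strategy grows as the product of the input sizes, which is the blowup described above.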

Collaborator Author

mattnibs commented Oct 1, 2025

thanks @philrz good to hear

Member

@nwt nwt left a comment

I still think pullup is the wrong term here because this pushes filters down toward the sources rather than pulling them upward. But we can sort that out after this is merged.

@mattnibs mattnibs merged commit 2d64111 into main Oct 2, 2025
3 checks passed
@mattnibs mattnibs deleted the join-pullup branch October 2, 2025 18:32
Development

Successfully merging this pull request may close these issues.

SQL: Large cartesian product causes very long query runtime

3 participants