You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[SPARK-23564][SQL] infer additional filters from constraints for join's children
## What changes were proposed in this pull request?
The existing query constraints framework has 2 steps:
1. propagate constraints bottom up.
2. use constraints to infer additional filters for better data pruning.
For step 2, it mostly helps with Join, because we can connect the constraints from children to the join condition and infer powerful filters to prune the data of the join sides. e.g., the left side has constraints `a = 1`, the join condition is `left.a = right.a`, then we can infer `right.a = 1` to the right side and prune the right side a lot.
However, the current logic of inferring filters from constraints for Join is pretty weak. It infers the filters from Join's constraints. Some joins like left semi/anti exclude output from right side and the right side constraints will be lost here.
This PR propose to check the left and right constraints individually, expand the constraints with join condition and add filters to children of join directly, instead of adding to the join condition.
This reverts apache#20670 , covers apache#20717 and apache#20816
This is inspired by the original PRs and the tests are all from these PRs. Thanks to the authors mgaido91 maryannxue KaiXinXiaoLei !
## How was this patch tested?
new tests
Author: Wenchen Fan <[email protected]>
Closesapache#21083 from cloud-fan/join.
Copy file name to clipboardExpand all lines: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala
+38-15Lines changed: 38 additions & 15 deletions
Original file line number
Diff line number
Diff line change
@@ -35,11 +35,25 @@ class InferFiltersFromConstraintsSuite extends PlanTest {
0 commit comments