Commit 2e3d650
Bring IN LIST Dynamic Filtering work (#63)
* Refactor InListExpr to support structs by re-using existing hashing infrastructure (apache#18449)
This PR is part of an EPIC to push down hash table references from
HashJoinExec into scans. The EPIC is tracked in
apache#17171.
A "target state" is tracked in
apache#18393.
There is a series of PRs to get us to this target state in smaller more
reviewable changes that are still valuable on their own:
- apache#18448
- (This PR): apache#18449 (depends on
apache#18448)
- apache#18451
- Enhance InListExpr to efficiently store homogeneous lists as arrays
and avoid a conversion to Vec<PhysicalExpr>
by adding an internal InListStorage enum with Array and Exprs variants
- Re-use existing hashing and comparison utilities to support Struct
arrays and other complex types
- Add public function `in_list_from_array(expr, list_array, negated)`
for creating InList from arrays
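
To make the storage split above concrete, here is a minimal sketch using only the standard library. It is not the DataFusion code: the enum name and variant names are taken from the description above, but the `i64` payloads, the closure-based `Exprs` variant, and the function signatures are assumptions for illustration, and NULL semantics are ignored.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the two storage variants named above.
/// Values are plain `i64`s here; the real expression works on Arrow arrays.
enum InListStorage {
    /// Homogeneous list kept as one deduplicated set for hash lookups.
    Array(HashSet<i64>),
    /// Arbitrary list items, each evaluated and compared one by one.
    Exprs(Vec<Box<dyn Fn() -> i64>>),
}

/// Toy analogue of building the list directly from an array of values,
/// skipping the per-item expression representation.
fn in_list_from_array(values: &[i64]) -> InListStorage {
    InListStorage::Array(values.iter().copied().collect())
}

/// Toy `IN` / `NOT IN` check. NULL handling is deliberately left out.
fn in_list(value: i64, storage: &InListStorage, negated: bool) -> bool {
    let found = match storage {
        InListStorage::Array(set) => set.contains(&value),
        InListStorage::Exprs(items) => items.iter().any(|item| item() == value),
    };
    found != negated
}
```

The `Array` variant answers a probe with one hash lookup, while `Exprs` has to evaluate and compare every item, which is the cost the array path is meant to avoid.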
Although the diff looks large most of it is actually tests and docs. I
think the actual code change is a negative LOC change, or at least
negative complexity (eliminates a trait, a macro, matching on data
types).
---------
Co-authored-by: David Hewitt <mail@davidhewitt.dev>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
(cherry picked from commit 486c5d8)
* feat: Add evaluate_to_arrays function (apache#18446)
## Which issue does this PR close?
- Closes apache#18330 .
## Rationale for this change
Reduce code duplication.
## What changes are included in this PR?
A util function replacing many calls which are using the same code.
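
As a rough illustration of the kind of duplication such a helper removes, the sketch below assumes the repeated pattern is "evaluate each expression against a batch, then expand the result to a full array". The toy `ColumnarValue` type, the closure-based expressions, and the exact signature are assumptions and do not reflect the real function's API.

```rust
/// Toy stand-in for an evaluated expression: a full column or a single scalar.
enum ColumnarValue {
    Array(Vec<i64>),
    Scalar(i64),
}

impl ColumnarValue {
    /// Expand a scalar to `num_rows` copies; pass arrays through unchanged.
    fn into_array(self, num_rows: usize) -> Vec<i64> {
        match self {
            ColumnarValue::Array(values) => values,
            ColumnarValue::Scalar(value) => vec![value; num_rows],
        }
    }
}

/// Hypothetical expression type: a function from a batch to a value.
type Expr = Box<dyn Fn(&[i64]) -> ColumnarValue>;

/// Hypothetical helper: evaluate every expression against one batch and
/// return plain arrays, so call sites stop repeating this loop.
fn evaluate_to_arrays(exprs: &[Expr], batch: &[i64]) -> Vec<Vec<i64>> {
    exprs
        .iter()
        .map(|expr| expr(batch).into_array(batch.len()))
        .collect()
}
```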
## Are these changes tested?
No logic should change whatsoever, so each area which now uses this code
should have its own tests and benchmarks unmodified.
## Are there any user-facing changes?
Yes, there is now a new pub function.
No other changes to API.
---------
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
(cherry picked from commit 76b4156)
* Refactor state management in `HashJoinExec` and use CASE expressions for more precise filters (apache#18451)
## Background
This PR is part of an EPIC to push down hash table references from
HashJoinExec into scans. The EPIC is tracked in
apache#17171.
A "target state" is tracked in
apache#18393.
There is a series of PRs to get us to this target state in smaller more
reviewable changes that are still valuable on their own:
- apache#18448
- apache#18449 (depends on
apache#18448)
- (This PR): apache#18451
## Changes in this PR
This PR refactors state management in HashJoinExec to make filter
pushdown more efficient and prepare for pushing down membership tests.
- Refactor internal data structures to clean up state management and
make usage more idiomatic (use `Option` instead of comparing integers,
etc.)
- Use CASE expressions to evaluate pushed-down filters selectively by
partition. Example:
`CASE hash_repartition % N WHEN partition_id THEN condition ELSE false END`
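
The CASE expression above can be read as a per-row routing step. The following sketch mirrors that logic in plain Rust; the `Bounds` condition, the identity hash, and the function names are assumptions chosen to keep the example small, not the actual filter or hash used by `HashJoinExec`.

```rust
/// Hypothetical per-partition condition: key must fall within these bounds.
struct Bounds {
    min: i64,
    max: i64,
}

impl Bounds {
    fn contains(&self, key: i64) -> bool {
        self.min <= key && key <= self.max
    }
}

/// Toy stand-in for the repartition hash (identity, for predictability).
fn hash_repartition(key: i64) -> u64 {
    key as u64
}

/// Mirrors `CASE hash_repartition % N WHEN partition_id THEN condition
/// ELSE false END`: only the branch of the partition that would receive
/// the row is evaluated; with no matching branch the row is rejected.
fn passes_filter(key: i64, n_partitions: u64, branches: &[(u64, Bounds)]) -> bool {
    let target = hash_repartition(key) % n_partitions;
    branches
        .iter()
        .find(|(partition_id, _)| *partition_id == target)
        .map_or(false, |(_, condition)| condition.contains(key))
}
```

Compared with OR-ing every partition's condition together, a row is only tested against the partition it hashes to, so a key that happens to satisfy some other partition's condition is still filtered out.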
---------
Co-authored-by: Lía Adriana <lia.castaneda@datadoghq.com>
(cherry picked from commit 5b0aa37)
* Push down InList or hash table references from HashJoinExec depending on the size of the build side (apache#18393)
This PR is part of an EPIC to push down hash table references from
HashJoinExec into scans. The EPIC is tracked in
apache#17171.
A "target state" is tracked in
apache#18393 (*this PR*).
There is a series of PRs to get us to this target state in smaller more
reviewable changes that are still valuable on their own:
- apache#18448
- apache#18449 (depends on
apache#18448)
- apache#18451
As those are merged I will rebase this PR to keep track of the
"remaining work", and we can use this PR to explore big picture ideas or
benchmarks of the final state.
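
A minimal sketch of the size-based choice named in the title, under stated assumptions: the `PushedFilter` enum, the `max_in_list_len` threshold parameter, and the `i64` keys are all hypothetical, and the real implementation may measure build-side size differently.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical shapes of the filter pushed down to the scan.
enum PushedFilter {
    /// Small build side: an explicit, sorted list of join keys.
    InList(Vec<i64>),
    /// Large build side: a shared reference to the join's hash table.
    HashLookup(Arc<HashSet<i64>>),
}

/// Pick the filter shape from the build-side size.
/// `max_in_list_len` is an assumed tuning knob, not a real config name.
fn choose_filter(build_keys: &Arc<HashSet<i64>>, max_in_list_len: usize) -> PushedFilter {
    if build_keys.len() <= max_in_list_len {
        let mut values: Vec<i64> = build_keys.iter().copied().collect();
        values.sort_unstable();
        PushedFilter::InList(values)
    } else {
        // Share the existing table instead of copying its keys.
        PushedFilter::HashLookup(Arc::clone(build_keys))
    }
}

/// Evaluate the pushed-down filter for one probe-side key.
fn filter_matches(filter: &PushedFilter, key: i64) -> bool {
    match filter {
        PushedFilter::InList(values) => values.binary_search(&key).is_ok(),
        PushedFilter::HashLookup(table) => table.contains(&key),
    }
}
```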
(cherry picked from commit c0e8bb5)
* fmt
* replace HashTableLookupExpr with lit(true) in proto serialization (apache#19300)
`HashTableLookupExpr` *errors* when serializing now, and would break any
users using joins + protobuf.
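
The substitution in the title can be sketched as a rewrite pass that runs before encoding. The toy `Expr` enum and `to_serializable` function below are assumptions for illustration; they do not mirror the real physical expression types or the proto code.

```rust
/// Toy expression tree; the variant names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(bool),
    Column(String),
    And(Box<Expr>, Box<Expr>),
    /// Refers to an in-memory hash table, so it has no wire representation.
    HashTableLookup,
}

/// Rewrite an expression into a form that can always be serialized by
/// replacing every hash table lookup with the literal `true`.
fn to_serializable(expr: &Expr) -> Expr {
    match expr {
        Expr::HashTableLookup => Expr::Literal(true),
        Expr::And(left, right) => Expr::And(
            Box::new(to_serializable(left)),
            Box::new(to_serializable(right)),
        ),
        other => other.clone(),
    }
}
```

Replacing the lookup with `true` only makes the pushed-down filter less selective; presumably this is acceptable because the join itself still discards non-matching rows.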
---------
Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
Co-authored-by: David Hewitt <mail@davidhewitt.dev>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Emily Matheys <55631053+EmilyMatt@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
1 parent 61aa275 commit 2e3d650
File tree
36 files changed, +4103 -662 lines changed
- datafusion
  - common/src
  - core/tests/physical_optimizer/filter_pushdown
  - expr-common/src
  - physical-expr-common/src
  - physical-expr/src
    - expressions
    - window
  - physical-plan
    - src
      - aggregates
      - joins
        - hash_join
      - repartition
      - sorts
  - proto
    - src/physical_plan
    - tests/cases
  - sqllogictest/test_files
    - tpch/plans
- docs/source/user-guide