feat: support pushdown alias on dynamic filter with ProjectionExec
#19404
@jackkleeman @adriangb Hi, I added the projection alias support in #17246. Since you have the most context on this, could you please take a look when you have a chance?
ee4e327 to
4775fc7
Compare
4775fc7 to
0ccefc8
Compare
adriangb
left a comment
Looking good! Just needs some tweaks and more tests
}

#[test]
fn test_filter_pushdown_with_unknown_column() -> Result<()> {
Can you help me understand how an unknown column fits into the picture? How do they get created? Why do we need special handling here?
> Can you help me understand how an unknown column fits into the picture? How do they get created? Why do we need special handling here?
Using an unknown column seems right when we encounter a column that can't be found in the input schema, but maybe a better way to handle this is to simply not collect the filter if an unknown column is encountered?
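To make the alternative raised here concrete (dropping the filter rather than inserting a placeholder), here is a minimal sketch. It assumes a toy `Expr` enum and hypothetical helpers `try_rewrite` and `collect_pushable`; none of these are DataFusion types or APIs, and the PR's actual code may be structured differently.

```rust
use std::collections::HashMap;

/// Toy expression type for illustration only (not DataFusion's `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

/// Rewrites `expr` by replacing each column with the expression it aliases.
/// Returns `None` as soon as a referenced column is missing from `aliases`.
fn try_rewrite(expr: &Expr, aliases: &HashMap<String, Expr>) -> Option<Expr> {
    match expr {
        Expr::Column(name) => aliases.get(name).cloned(),
        Expr::Literal(v) => Some(Expr::Literal(*v)),
        Expr::Gt(l, r) => Some(Expr::Gt(
            Box::new(try_rewrite(l, aliases)?),
            Box::new(try_rewrite(r, aliases)?),
        )),
    }
}

/// Keeps only the filters that can be fully rewritten; the rest are skipped
/// instead of being pushed down with a placeholder column.
fn collect_pushable(filters: &[Expr], aliases: &HashMap<String, Expr>) -> Vec<Expr> {
    filters
        .iter()
        .filter_map(|f| try_rewrite(f, aliases))
        .collect()
}
```

With this shape, a filter that references an unresolved column never reaches the child plan, so no placeholder expression has to be handled downstream.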
My point is what would cause that to end up in a query? Won’t the query fail to execute? It seems like some marker another optimizer rule puts in place and later cleans up. I’m not saying your handling of them is wrong, I’m just trying to understand what’s going on because I’m surprised there is such a thing.
> My point is what would cause that to end up in a query? Won’t the query fail to execute? It seems like some marker another optimizer rule puts in place and later cleans up. I’m not saying your handling of them is wrong, I’m just trying to understand what’s going on because I’m surprised there is such a thing.
You are right; previously it was only used for partitioning, here:
Arc::new(UnKnownColumn::new(&expr.to_string()))
I still do not understand why unknown columns are relevant here. Do any tests fail if you don't do special handling of them? It seems to me that any attempt to push an UnknownColumn in a filter through a ProjectionExec means something has already gone seriously wrong and the end result query would fail to execute.
Added tests in
8544e36 to
6c9e95b
Compare
Signed-off-by: discord9 <[email protected]>
…test: filter pushdown projection Signed-off-by: discord9 <[email protected]>
…iter&test: unit test Signed-off-by: discord9 <[email protected]>
…t assertions Signed-off-by: discord9 <[email protected]>
…output Signed-off-by: discord9 <[email protected]>
… for clarity test: add test for filter pushdown with swapped aliases test: update dynamic filter projection pushdown test name for consistency Signed-off-by: discord9 <[email protected]>
2b6a9a5 to
b50b1ab
Compare
glob = { workspace = true }
insta = { workspace = true }
paste = { workspace = true }
pretty_assertions = "1.0"
This appears to already be used elsewhere (this is not a net new dependency), so I think it is ok to add
I have some doubts about leaving it unchanged, say in this case: I guess my point is that replacing a not-found column with
Which issue does this PR close?
Rationale for this change
For dynamic filters to work properly, the table scan must receive the correct column even when the filter passes through an alias (introduced by ProjectionExec), hence the need to rewrite the parent filter in gather_filters_for_pushdown.
What changes are included in this PR?
As the title says, this adds support for handling simple aliases in pushed-down filters: an aliased column in the pushdown filter is expanded to its original expression (or to UnKnownColumn if the aliased column cannot be found), so aliases in projections are supported. Unit tests are also added.
AI content claim: the core logic is hand-written and thoroughly understood, but the unit tests are largely generated with some human guidance.
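As a rough illustration of the alias expansion described above, here is a minimal, self-contained sketch. The `Expr` enum, `ProjectionItem`, and `expand_aliases` are hypothetical stand-ins invented for this example; the PR itself operates on DataFusion's physical expressions and may be structured differently.

```rust
use std::collections::HashMap;

/// Toy stand-in for a physical expression (not DataFusion's `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to a column by name.
    Column(String),
    /// Placeholder for a column that could not be resolved.
    UnknownColumn(String),
    /// Integer literal.
    Literal(i64),
    /// Binary operation such as `b + 1` or `x > 5`.
    Binary(Box<Expr>, &'static str, Box<Expr>),
}

/// One projection entry, read as `expr AS alias`.
struct ProjectionItem {
    expr: Expr,
    alias: String,
}

/// Rewrites a filter written against the projection's output columns
/// into one written against the projection's input columns.
fn expand_aliases(filter: &Expr, projection: &[ProjectionItem]) -> Expr {
    let by_alias: HashMap<&str, &Expr> = projection
        .iter()
        .map(|item| (item.alias.as_str(), &item.expr))
        .collect();
    rewrite(filter, &by_alias)
}

/// Single top-down pass: substituted expressions are not visited again,
/// so swapped aliases (`a AS b, b AS a`) are not substituted twice.
fn rewrite(expr: &Expr, by_alias: &HashMap<&str, &Expr>) -> Expr {
    match expr {
        Expr::Column(name) => match by_alias.get(name.as_str()) {
            Some(original) => (*original).clone(),
            // Assumption for this sketch: unresolved columns become placeholders.
            None => Expr::UnknownColumn(name.clone()),
        },
        Expr::Binary(left, op, right) => Expr::Binary(
            Box::new(rewrite(left, by_alias)),
            *op,
            Box::new(rewrite(right, by_alias)),
        ),
        other => other.clone(),
    }
}
```

For a projection `a AS x`, the filter `x > 5` is rewritten to `a > 5`, which is the form the scan below the projection can evaluate.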
Are these changes tested?
Unit tests are added; please comment if more tests are needed.
Are there any user-facing changes?
Yes, dynamic filters now work properly with aliases. I'm not sure whether that counts as a breaking change, though.