-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat: optimize grouping and introduced unparsing and substrait support #16161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks @chenkovsky -- can. you find the original PR that added this |
@eejbyfeldt @comphead could you please help me review this PR? |
it's related to #12704 |
I did not look closely at yet since I have not really contributed here in months.
This extra projection is introduced by But the goal of leaving more structure in logical plan after resolving the grouping expr so that optimization and unparsing sounds reasonable to me. |
datafusion/sql/Cargo.toml
Outdated
bigdecimal = { workspace = true } | ||
datafusion-common = { workspace = true, default-features = true } | ||
datafusion-expr = { workspace = true } | ||
datafusion-functions-aggregate = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have tried hard to avoid having the sql parser depend on function libraries -- is there any way to avoid changing this dependency?
13151cc
to
3373cd0
Compare
ed53ae1
to
bf3a290
Compare
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
keep it open |
Thanks @chenkovsky @eejbyfeldt it looks like you contributed the initial implementation of Could you help review this PR? |
Hi @alamb and @chenkovsky! I am running into the same issue @lorenarosati brought up a couple months ago (#16590) and have not found a workaround solution yet. Just wanted to check in, what's the status of this PR? |
I am waiting for someone to help review the PR. Can you help here @Slimsammylim ? Perhaps verify that this solution fixes your issue? |
Hi, I ran my code using this branch and unfortunately it did not solve my issue (#16590). |
hi,could you please push your code, then i can see what's your issue. currently i havent touch substrait, some extra codes should be modified to solve issue 16590. |
I added substrait support. @Slimsammylim please try again |
|
||
// To avoid adding datafusion-functions-aggregate dependency, implement a DummyGroupingUDAF here | ||
#[derive(Debug, Clone, PartialEq, Eq, Hash)] | ||
pub struct DummyGroupingUDAF { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently, I created a dummy udaf. maybe we can move grouping udaf to datafusion-expr module directly. because grouping udaf is a pseudo udaf itself.
Hi! I tried again and but I keep getting the same error message. |
@Slimsammylim I see, I only tested roundtrip, you only consume substrait plan. I think the simplest workaround is moving __grouping_id to the end of schema, then index will be correct. but It's too tricky. also there's another issue caused by __grouping_id #16983. It's better to remove __grouping_id. I will work on it |
@Slimsammylim I created another PR. #17961 , please take a snapshot. it will not touch grouping udaf in logical plan level. so It will support substrait by design. |
Which issue does this PR close?
Rationale for this change
first, it seems that grouping udaf document is not correct.
and for aggregation with grouping,
e.g.
current logical plan is:
the problems are:
there are three projections. for bitwise operation, there's no benifit for extra projection.
it makes grouping level optimization very hard. for example,
we only need to calculate sum(c2) in when grouping(c1) = 1, this is a very useful optimization trick. I'm also using this trick to optimizing sql with multiple count distinct.
unparsing is not supported. Internal("Tried to unproject column referring to internal grouping id") will be thrown.
What changes are included in this PR?
now, the logical plan is:
there are only two projections. and unparsing is supported.
Are these changes tested?
UT
Are there any user-facing changes?
No