-
Notifications
You must be signed in to change notification settings - Fork 1.7k
fix error result in execute&pre_selection #16930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix error result in execute&pre_selection #16930
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this so quickly!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @acking-you and @mwylde
I had a few questions but the PR looks nice to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @acking-you and @mwylde
I think there are some ways to make this PR even faster (avoid vec! specifically). However, given this PR is now correct where previously it was not, I think we can merge it and handle other things as a follow on
| return Ok(right_ret); | ||
| } | ||
| } else { | ||
| let array = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect anywhere we do a BooleanArray::from(vec![..]) could be improved eventually if we care. It is likely not worth pursuing in this PR, but I figured I would bring it u
|
Thanks again @acking-you and @mwylde |
|
Gah, somehow this PR caused a CI failure: https://github.com/apache/datafusion/actions/runs/16732639756/job/47364317932 Perhaps it undid the fix by @adamreeve here: |
* fix error result in execute&pre_selection * fix clippy * Optimize implementation * more efficiency impl * fix CI
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
No
Some thoughts
In current computing frameworks, all computation results are in the form of immutable shared ownership, which leads to many situations where new buffers have to be created instead of reusing existing ones.
For example, in the case of early exit optimization, it should be possible to reuse and modify
lhsto return the result.However, doing so under the current framework would lead to memory safety issues. For instance, the
evaluateimplementation of the Column expression reuses the originalArcto obtain the result instead of creating a new one, while in other cases a new structure is created (which can be safely modified).If there were a way to internally indicate whether the current result is newly created or reuses an
Arc, and if we could design a mutable API for this scenario, it might be possible to reduce unnecessary copies in many computation processes across DataFusion.