

@acking-you
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

No

Some thoughts

In the current computation framework, all computation results are immutable values with shared ownership (`Arc`), so in many situations new buffers have to be created instead of existing ones being reused.

For example, in the early-exit optimization, it should be possible to modify `lhs` in place and return it as the result.

However, doing so under the current framework could lead to memory-safety issues. For instance, the `evaluate` implementation of the `Column` expression reuses the original `Arc` to produce its result instead of creating a new array, while in other cases a new structure is created (which could safely be modified).
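To make the hazard concrete, here is a minimal std-only sketch. It assumes an `Arc<Vec<i32>>` as a stand-in for an Arrow array and uses a hypothetical `try_set_first` helper; neither is DataFusion code. A Column-style result that clones the input's `Arc` shares the buffer, so `Arc::get_mut` refuses mutable access, while a freshly built result is uniquely owned and can be modified safely.

```rust
use std::sync::Arc;

/// Hypothetical helper: tries to overwrite the first element in place.
/// Returns false (leaving the data untouched) when the Arc is shared,
/// because Arc::get_mut only hands out `&mut` to uniquely owned data.
fn try_set_first(result: &mut Arc<Vec<i32>>, value: i32) -> bool {
    match Arc::get_mut(result) {
        Some(values) if !values.is_empty() => {
            values[0] = value;
            true
        }
        _ => false,
    }
}

fn main() {
    // Input column held by the record batch (stand-in for an Arrow array).
    let batch_column: Arc<Vec<i32>> = Arc::new(vec![1, 2, 3]);

    // Column-style evaluate: clone the Arc (cheap, but shares the buffer).
    let mut reused = Arc::clone(&batch_column);
    assert!(!try_set_first(&mut reused, 0)); // shared: in-place write refused

    // Other expressions build a new array, which is uniquely owned.
    let mut fresh = Arc::new(batch_column.iter().map(|v| v * 10).collect::<Vec<i32>>());
    assert!(try_set_first(&mut fresh, 0)); // unique: safe to modify

    assert_eq!(*fresh, vec![0, 20, 30]);
    assert_eq!(*batch_column, vec![1, 2, 3]); // input batch is untouched
}
```

`Arc::get_mut` also returns `None` while any `Weak` reference exists, so it is a conservative test for unique ownership.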

If there were a way to indicate internally whether a result is newly created or reuses an existing `Arc`, and if we designed a mutable API for that case, we might be able to avoid unnecessary copies in many computations across DataFusion.
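One possible shape for such an indicator, offered only as a sketch under assumptions: a hypothetical `EvalOutput` enum (not an existing DataFusion type) that records whether a result is freshly allocated or shares an existing `Arc`, plus a hypothetical AND kernel that uses it for the early exit on `lhs`. `Vec<bool>` stands in for an Arrow boolean array.

```rust
use std::sync::Arc;

/// Hypothetical result wrapper (not a DataFusion type): records whether an
/// evaluation allocated a new buffer (Owned) or reused a shared one (Shared).
enum EvalOutput {
    Owned(Vec<bool>),
    Shared(Arc<Vec<bool>>),
}

impl EvalOutput {
    /// Read-only view of the values, whichever variant holds them.
    fn values(&self) -> &[bool] {
        match self {
            EvalOutput::Owned(v) => v.as_slice(),
            EvalOutput::Shared(a) => a.as_slice(),
        }
    }

    /// Returns a buffer the caller may mutate. Owned buffers are handed back
    /// without copying; shared ones are cloned unless this is the last Arc.
    fn into_owned(self) -> Vec<bool> {
        match self {
            EvalOutput::Owned(v) => v,
            EvalOutput::Shared(a) => {
                Arc::try_unwrap(a).unwrap_or_else(|still_shared| (*still_shared).clone())
            }
        }
    }
}

/// Hypothetical AND kernel with an early exit that reuses `lhs`.
fn and_kernel(lhs: EvalOutput, rhs: &[bool]) -> EvalOutput {
    // Early exit: with no true rows in lhs, the result is lhs itself,
    // returned as-is (no copy), whether it is owned or shared.
    if !lhs.values().iter().any(|&b| b) {
        return lhs;
    }
    // Otherwise write the result into lhs's buffer; this copies only if
    // lhs was shared.
    let mut out = lhs.into_owned();
    for (l, &r) in out.iter_mut().zip(rhs) {
        *l &= r;
    }
    EvalOutput::Owned(out)
}

fn main() {
    let input = Arc::new(vec![true, true, false]);

    // Column-like lhs shares the input, so the kernel copies before writing.
    let res = and_kernel(EvalOutput::Shared(Arc::clone(&input)), &[false, true, true]);
    assert_eq!(res.into_owned(), vec![false, true, false]);
    assert_eq!(*input, vec![true, true, false]); // shared input is untouched

    // Freshly computed lhs is modified in place: same heap buffer.
    let fresh = vec![true, false, true];
    let ptr = fresh.as_ptr();
    let out = and_kernel(EvalOutput::Owned(fresh), &[true, true, false]).into_owned();
    assert_eq!(out, vec![true, false, false]);
    assert_eq!(out.as_ptr(), ptr);
}
```

Falling back to `Arc::try_unwrap` means a shared result is copied only while another reference is still alive; how a real implementation would expose this distinction is left open here.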

@github-actions bot added the physical-expr label Jul 27, 2025

@mwylde left a comment


Thanks for fixing this so quickly!

Contributor

@alamb left a comment


Thank you @acking-you and @mwylde

I had a few questions, but the PR looks nice to me.

Contributor

@alamb left a comment


Thanks @acking-you and @mwylde

I think there are some ways to make this PR even faster (specifically, avoiding `vec!`). However, given that this PR is now correct where previously the code was not, I think we can merge it and handle the other improvements as a follow-on.

```rust
        return Ok(right_ret);
    }
} else {
    let array =
```
Contributor


I suspect anywhere we do a `BooleanArray::from(vec![..])` could be improved eventually, if we care. It is likely not worth pursuing in this PR, but I figured I would bring it up.
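A likely reason `vec![..]` is costly here: Arrow stores booleans bit-packed (one bit per row, least-significant bit first), so building a `BooleanArray` from a `Vec<bool>` first allocates one byte per row and then packs those bytes into a bitmap. The std-only sketch below uses hypothetical helper names (`pack_from_vec`, `constant_bitmap`) that are not arrow-rs API; it contrasts that two-step path with writing a constant bitmap directly.

```rust
/// Hypothetical helper: packs bools into an LSB-first bitmap, starting from
/// an already-materialized slice of one byte per row.
fn pack_from_vec(values: &[bool]) -> Vec<u8> {
    let mut bytes = vec![0u8; (values.len() + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        if v {
            bytes[i / 8] |= 1u8 << (i % 8);
        }
    }
    bytes
}

/// Hypothetical helper: builds the same bitmap for a constant value without
/// any per-row Vec<bool>; only (len + 7) / 8 bytes are allocated.
fn constant_bitmap(len: usize, value: bool) -> Vec<u8> {
    let n_bytes = (len + 7) / 8;
    if !value {
        return vec![0u8; n_bytes];
    }
    let mut bytes = vec![0xFFu8; n_bytes];
    // Clear the unused high bits of the last byte so the padding matches
    // what pack_from_vec produces (all zero).
    let rem = len % 8;
    if rem != 0 {
        bytes[n_bytes - 1] = (1u8 << rem) - 1;
    }
    bytes
}

fn main() {
    for len in [0, 1, 7, 8, 13] {
        // Indirect path: one byte per row first, then packing.
        let via_vec = pack_from_vec(&vec![true; len]);
        // Direct path: the bitmap is written without the intermediate Vec.
        let direct = constant_bitmap(len, true);
        assert_eq!(via_vec, direct);
        assert_eq!(pack_from_vec(&vec![false; len]), constant_bitmap(len, false));
    }
}
```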

@github-actions bot added the documentation label Aug 3, 2025
@alamb merged commit eb2b8c0 into apache:main Aug 4, 2025 (28 checks passed)
@alamb
Contributor

alamb commented Aug 4, 2025

Thanks again @acking-you and @mwylde

@alamb
Contributor

alamb commented Aug 4, 2025

Gah, somehow this PR caused a CI failure: https://github.com/apache/datafusion/actions/runs/16732639756/job/47364317932

Perhaps it undid the fix by @adamreeve here:

@liamzwbao mentioned this pull request Aug 4, 2025
hknlof pushed a commit to hknlof/datafusion that referenced this pull request Aug 20, 2025
* fix error result in execute&pre_selection

* fix clippy

* Optimize implementation

* more efficiency impl

* fix CI

Labels

documentation (Improvements or additions to documentation), physical-expr (Changes to the physical-expr crates)


Development

Successfully merging this pull request may close these issues.

Incorrect results from pre_selection_scatter when RHS is scalar

3 participants