Bring IN LIST Dynamic Filtering work#62

Closed
LiaCastaneda wants to merge 15 commits into branch-50 from lia/bring-in-list-dynamic-filter-work

@LiaCastaneda

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

adriangb and others added 10 commits December 17, 2025 12:43
This PR is part of an EPIC to push down hash table references from
HashJoinExec into scans. The EPIC is tracked in
apache#17171.

A "target state" is tracked in
apache#18393.
There is a series of PRs to get us to this target state in smaller, more
reviewable changes that are still valuable on their own:
- (This PR): apache#18448
- apache#18449 (depends on
apache#18448)
- apache#18451

Change create_hashes and related functions to work with &dyn Array
references instead of requiring ArrayRef (Arc-wrapped arrays). This
avoids unnecessary Arc::clone() calls and enables calls that only have
an &dyn Array to use the hashing utilities.

- Add create_hashes_from_arrays(&[&dyn Array]) function
- Refactor hash_dictionary, hash_list_array, hash_fixed_list_array to
use references instead of cloning
- Extract hash_single_array() helper for common logic
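
A minimal sketch of the idea behind this change, using only the standard library. `ToyArray`, `Int64Column`, `Utf8Column`, `toy_create_hashes` and `toy_create_hashes_from_arcs` are hypothetical names invented for illustration; they are not the arrow or DataFusion APIs, and the real hashing utilities differ in signature and hash function. The point is only that a function taking borrowed trait objects serves callers with plain references and callers holding `Arc`s alike, without any `Arc::clone()`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical stand-in for an array trait; NOT arrow's `Array`.
pub trait ToyArray {
    fn len(&self) -> usize;
    /// Feed the value stored at `row` into the running hash state.
    fn hash_row(&self, row: usize, state: &mut DefaultHasher);
}

pub struct Int64Column(pub Vec<i64>);
pub struct Utf8Column(pub Vec<String>);

impl ToyArray for Int64Column {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn hash_row(&self, row: usize, state: &mut DefaultHasher) {
        self.0[row].hash(state);
    }
}

impl ToyArray for Utf8Column {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn hash_row(&self, row: usize, state: &mut DefaultHasher) {
        self.0[row].hash(state);
    }
}

/// Compute one combined hash per row across all columns.
/// Takes borrowed trait objects, so no reference counting is involved.
/// Assumes every column has the same length as the first one.
pub fn toy_create_hashes(columns: &[&dyn ToyArray], hashes: &mut Vec<u64>) {
    let num_rows = columns.first().map_or(0, |c| c.len());
    hashes.clear();
    for row in 0..num_rows {
        let mut state = DefaultHasher::new();
        for col in columns {
            col.hash_row(row, &mut state);
        }
        hashes.push(state.finish());
    }
}

/// Callers that hold `Arc`s borrow from them instead of cloning.
pub fn toy_create_hashes_from_arcs(columns: &[Arc<dyn ToyArray>], hashes: &mut Vec<u64>) {
    let borrowed: Vec<&dyn ToyArray> = columns.iter().map(|c| c.as_ref()).collect();
    toy_create_hashes(&borrowed, hashes);
}
```

Calling `toy_create_hashes(&[&ids, &names], &mut hashes)` with two stack-allocated columns works without wrapping either one in an `Arc`, which is the kind of call site the reference-based signature is meant to enable.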

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
(cherry picked from commit a899ca0)
(cherry picked from commit e53debb)
* Remove spurious `Use` in InListExpr display formatted output

* Adapt tpch.slt expected results

* Reduce verbosity of Display for InListExpr output

* Silence clippy warning

(cherry picked from commit d273ffb)
(cherry picked from commit 050a110)
…nfrastructure (apache#18449)

This PR is part of an EPIC to push down hash table references from
HashJoinExec into scans. The EPIC is tracked in
apache#17171.

A "target state" is tracked in
apache#18393.
There is a series of PRs to get us to this target state in smaller, more
reviewable changes that are still valuable on their own:
- apache#18448
- (This PR): apache#18449 (depends on
apache#18448)
- apache#18451

- Enhance InListExpr to efficiently store homogeneous lists as arrays
and avoid a conversion to Vec<PhysicalExpr>
  by adding an internal InListStorage enum with Array and Exprs variants
- Re-use existing hashing and comparison utilities to support Struct
arrays and other complex types
- Add public function `in_list_from_array(expr, list_array, negated)`
for creating InList from arrays

Although the diff looks large, most of it is actually tests and docs. I
think the actual code change is a negative LOC change, or at least
negative complexity (eliminates a trait, a macro, matching on data
types).
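
The storage split described above can be sketched roughly as follows, using only the standard library. Everything here is a simplified assumption: `ToyInListStorage`, `ToyInList` and `toy_in_list_from_array` are hypothetical names, values are plain `i64` instead of Arrow arrays, "expressions" are boxed closures, and null handling is ignored. It is not the DataFusion implementation, only an illustration of keeping a homogeneous literal list in a lookup-friendly form while falling back to per-row expression evaluation otherwise.

```rust
use std::collections::HashSet;

/// Hypothetical two-variant storage, loosely mirroring the `Array` and
/// `Exprs` variants named in the PR description.
pub enum ToyInListStorage {
    /// Homogeneous literal list. In this toy version the "array" is
    /// collapsed into a hash set built once up front.
    Array(HashSet<i64>),
    /// General case: each list item is an expression of the probed value.
    Exprs(Vec<Box<dyn Fn(i64) -> i64>>),
}

pub struct ToyInList {
    storage: ToyInListStorage,
    negated: bool,
}

/// Build an IN LIST directly from literal values (toy analogue of
/// constructing the expression from an array).
pub fn toy_in_list_from_array(values: Vec<i64>, negated: bool) -> ToyInList {
    ToyInList {
        storage: ToyInListStorage::Array(values.into_iter().collect()),
        negated,
    }
}

impl ToyInList {
    /// Build an IN LIST from arbitrary per-row expressions.
    pub fn from_exprs(exprs: Vec<Box<dyn Fn(i64) -> i64>>, negated: bool) -> Self {
        ToyInList {
            storage: ToyInListStorage::Exprs(exprs),
            negated,
        }
    }

    /// Evaluate `value [NOT] IN (list)` for every input value.
    pub fn evaluate(&self, input: &[i64]) -> Vec<bool> {
        input
            .iter()
            .map(|&v| {
                let found = match &self.storage {
                    // One hash lookup per row.
                    ToyInListStorage::Array(set) => set.contains(&v),
                    // Evaluate each list expression for this row and compare.
                    ToyInListStorage::Exprs(exprs) => exprs.iter().any(|e| e(v) == v),
                };
                // `negated` flips the result (NOT IN).
                found != self.negated
            })
            .collect()
    }
}
```

With the literal variant, `toy_in_list_from_array(vec![1, 3], false).evaluate(&[1, 2, 3])` yields `[true, false, true]`; the `Exprs` variant is only needed when list items depend on the row.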

---------

Co-authored-by: David Hewitt <mail@davidhewitt.dev>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
(cherry picked from commit 486c5d8)
(cherry picked from commit 181e058)
(cherry picked from commit da3d90a)
(cherry picked from commit fb402df)
## Which issue does this PR close?

- Closes apache#18330 .

## Rationale for this change

Reduce code duplication.

## What changes are included in this PR?

A utility function that replaces many call sites which duplicated the same code.

## Are these changes tested?

No logic should change whatsoever, so each area that now uses this code
should have its own tests and benchmarks unmodified.

## Are there any user-facing changes?

Yes, there is now a new pub function.
No other changes to API.

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
(cherry picked from commit 76b4156)
(cherry picked from commit 0ff5c27)
…for more precise filters (apache#18451)

This PR is part of an EPIC to push down hash table references from
HashJoinExec into scans. The EPIC is tracked in
apache#17171.

A "target state" is tracked in
apache#18393.
There is a series of PRs to get us to this target state in smaller, more
reviewable changes that are still valuable on their own:
- apache#18448
- apache#18449 (depends on
apache#18448)
- (This PR): apache#18451

This PR refactors state management in HashJoinExec to make filter
pushdown more efficient and prepare for pushing down membership tests.

- Refactor internal data structures to clean up state management and
make usage more idiomatic (use `Option` instead of comparing integers,
etc.)
- Use CASE expressions to evaluate pushed-down filters selectively by
partition. Example: `CASE hash_repartition % N WHEN partition_id THEN
condition ELSE false END`
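
To make the CASE shape above concrete, here is a small standard-library sketch. All names (`ToyPartitionedFilter`, `toy_partition_of`) are hypothetical, the per-partition condition is assumed to be a simple min/max range, and `key mod N` stands in for the real repartitioning hash. It only illustrates the routing logic: a row is tested against the condition of the partition it hashes to, and rows with no matching branch evaluate to false.

```rust
use std::ops::RangeInclusive;

/// Toy stand-in for `hash_repartition % N`. The real filter would use the
/// same hash as the repartitioning step; here it is just `key mod N`.
pub fn toy_partition_of(key: i64, num_partitions: usize) -> usize {
    key.rem_euclid(num_partitions as i64) as usize
}

/// Models `CASE hash % N WHEN partition_id THEN condition ... ELSE false END`,
/// where each condition is assumed to be an inclusive key range.
pub struct ToyPartitionedFilter {
    num_partitions: usize,
    /// (partition_id, condition) pairs, one per WHEN branch.
    branches: Vec<(usize, RangeInclusive<i64>)>,
}

impl ToyPartitionedFilter {
    pub fn new(num_partitions: usize, branches: Vec<(usize, RangeInclusive<i64>)>) -> Self {
        assert!(num_partitions > 0, "need at least one partition");
        ToyPartitionedFilter {
            num_partitions,
            branches,
        }
    }

    /// Evaluate the filter for a single probe-side key.
    pub fn matches(&self, key: i64) -> bool {
        let partition = toy_partition_of(key, self.num_partitions);
        self.branches
            .iter()
            // Pick the WHEN branch for this row's partition, if any.
            .find(|(id, _)| *id == partition)
            // No branch for this partition corresponds to ELSE false.
            .is_some_and(|(_, bounds)| bounds.contains(&key))
    }
}
```

With two partitions and branches `(0, 10..=20)` and `(1, 1..=5)`, the key `4` hashes to partition 0 and is rejected even though it falls inside partition 1's range, which is what makes the per-partition form more selective than a single merged condition.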

---------

Co-authored-by: Lía Adriana <lia.castaneda@datadoghq.com>
(cherry picked from commit 5b0aa37)
(cherry picked from commit e9d1985)
… on the size of the build side (apache#18393)

This PR is part of an EPIC to push down hash table references from
HashJoinExec into scans. The EPIC is tracked in
apache#17171.

A "target state" is tracked in
apache#18393 (*this PR*).
There is a series of PRs to get us to this target state in smaller, more
reviewable changes that are still valuable on their own:
- apache#18448
- apache#18449 (depends on
apache#18448)
- apache#18451

As those are merged I will rebase this PR to keep track of the
"remaining work", and we can use this PR to explore big picture ideas or
benchmarks of the final state.

(cherry picked from commit c0e8bb5)
(cherry picked from commit 115313c)
…ache#19300)

*errors* when serializing now, and would break any users using joins +
protobuf.

(cherry picked from commit d61f1a7)
(cherry picked from commit e0a1211)
* chore: update dynamic filter formatting to indicate expr is placeholder

* update tests

* update tests

(cherry picked from commit d587b8d)
(cherry picked from commit f5d374b)
(cherry picked from commit af13635)
@LiaCastaneda LiaCastaneda changed the title from "Lia/bring in list dynamic filter work" to "Bring IN LIST Dynamic Filtering work" Dec 17, 2025
@LiaCastaneda LiaCastaneda force-pushed the lia/bring-in-list-dynamic-filter-work branch from fbb14e4 to 0fd1a1e Compare December 17, 2025 15:09
rkrishn7 and others added 3 commits December 18, 2025 11:11
* chore: bump workspace rust version to 1.90.0

* fix clippy errors

* fix clippy errors

* try using dedicated runner temp space

* retrigger

* inspect disk usage

* split build/run

* disable debug info in ci profile

* revert ci changes

(cherry picked from commit bea1b0a)
@LiaCastaneda LiaCastaneda force-pushed the lia/bring-in-list-dynamic-filter-work branch from e0bc6dd to 23d40fc Compare December 18, 2025 10:14
@LiaCastaneda LiaCastaneda deleted the lia/bring-in-list-dynamic-filter-work branch January 30, 2026 13:16
5 participants