<DRAFT> IN LIST optims #19390
|
run benchmark in_list |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
run benchmarks |
|
🤖 |
|
run benchmark tpch tpchds |
|
🤖 Hi @Dandandan, thanks for the request (#19390 (comment)).
Please choose one or more of these with |
|
🤖: Benchmark completed Details
|
|
run benchmark tpch tpcds |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
@Dandandan I think that once this optimization is done, there could be a lot to reuse for broadcast joins...
For plain (non-dynamic) filters, I think that, based on a threshold (<= 3), it either gets planned as a chain of OR expressions or using
Moves the StaticFilter trait and its generic hash-based implementation (ArrayStaticFilter) into a dedicated submodule. This is the first step toward modularizing the in_list expression code. ArrayStaticFilter uses Arrow's row comparison with hash-based lookup for O(1) membership tests. It serves as the fallback for types without specialized filter implementations (e.g., structs, lists).
Extract the filter instantiation logic and specialized filter implementations to a dedicated strategy module. This is a pure refactoring with no behavioral changes.
Moves:
- instantiate_static_filter function
- OrderedFloat32/OrderedFloat64 wrappers
- primitive_static_filter! macro (Int8..UInt64 filters)
- float_static_filter! macro (Float32/Float64 filters)
Adds O(1) bitmap-based set membership for 1-byte and 2-byte integer types. Also introduces result.rs with shared logic for building BooleanArray results with correct SQL null propagation.
BitmapFilter stores a bitset where bit N is set if value N is in the set:
- U8Config: 256 bits = 32 bytes (fits in cache line)
- U16Config: 65536 bits = 8 KB (fits in L1 cache)
Lookup is a single bit test: `bits[value / 64] & (1 << (value % 64))`. This outperforms both hash lookup and branchless comparison at all list sizes for these small integer types.
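To make the bit test above concrete, here is a minimal sketch of a 16-bit bitmap membership filter using only the standard library. The name `BitmapSketch` and its methods are hypothetical and are not the PR's `BitmapFilter` or `U16Config`; null handling and Arrow array access are left out.

```rust
/// Hypothetical stand-in for a 2-byte bitmap filter (not the PR's type).
pub struct BitmapSketch {
    // 65536 bits packed into 1024 u64 words, i.e. 8 KB.
    bits: Box<[u64; 1024]>,
}

impl BitmapSketch {
    /// Sets bit N for every value N in the IN list.
    pub fn new(values: &[u16]) -> Self {
        let mut bits = Box::new([0u64; 1024]);
        for &v in values {
            let v = v as usize;
            bits[v / 64] |= 1u64 << (v % 64);
        }
        Self { bits }
    }

    /// Membership is a single word load plus a bit test.
    #[inline]
    pub fn contains(&self, value: u16) -> bool {
        let v = value as usize;
        (self.bits[v / 64] & (1u64 << (v % 64))) != 0
    }
}
```

For example, `BitmapSketch::new(&[0, 7, 64]).contains(7)` evaluates to `true`. A 1-byte variant would need only four `u64` words.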
Adds PrimitiveFilter<T> - a generic HashSet-based filter for primitive types. Also adds contains_slice() method for zero-copy buffer access used by type reinterpretation. The strategy module now uses PrimitiveFilter directly for Int32/Int64/UInt32/UInt64 instead of the macro-generated filters, while keeping Float32/Float64 with OrderedFloat wrappers for now.
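As an illustration of HashSet-based membership combined with SQL null propagation, the sketch below evaluates `x IN (list)` and `x NOT IN (list)` over optional values. `HashSetFilterSketch` and `eval` are hypothetical names; the PR's `PrimitiveFilter<T>` works on Arrow arrays and its actual signatures are not shown in this thread. The three-valued rules used here are the standard SQL ones: a NULL input yields NULL, and a non-matching input yields NULL when the list itself contains a NULL.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical HashSet-backed IN-list filter (not the PR's PrimitiveFilter).
pub struct HashSetFilterSketch<T> {
    set: HashSet<T>,
    list_has_null: bool,
}

impl<T: Eq + Hash + Copy> HashSetFilterSketch<T> {
    /// Builds the set from the non-null list values and remembers
    /// whether the list contained a NULL.
    pub fn new(list: &[Option<T>]) -> Self {
        Self {
            set: list.iter().flatten().copied().collect(),
            list_has_null: list.iter().any(|v| v.is_none()),
        }
    }

    /// Evaluates IN (negated = false) or NOT IN (negated = true).
    /// `None` in the output stands for SQL NULL.
    pub fn eval(&self, input: &[Option<T>], negated: bool) -> Vec<Option<bool>> {
        input
            .iter()
            .map(|v| match v {
                None => None,                                // NULL IN (...) is NULL
                Some(x) if self.set.contains(x) => Some(!negated),
                Some(_) if self.list_has_null => None,       // no match, list has NULL
                Some(_) => Some(negated),
            })
            .collect()
    }
}
```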
Extends type reinterpretation to handle floats by treating their bit patterns as unsigned integers. For equality comparison, only the bit pattern matters, so Float64 can be reinterpreted as UInt64. Adds ReinterpretedPrimitive<D> filter that wraps PrimitiveFilter<D> and reinterprets input arrays at query time. The strategy module now routes Float32/Float64 through make_primitive_filter::<UInt32/UInt64>. This eliminates the need for OrderedFloat wrappers and their associated overhead, while maintaining correctness for NaN handling.
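The bit-pattern idea can be sketched with the standard library alone: store `f64::to_bits` of each list value in a `HashSet<u64>` and probe with the bits of each input. `F64BitsFilter` is a hypothetical name, and this version converts value by value rather than reinterpreting a whole Arrow buffer as the commit describes. Note what bit-pattern equality implies in this sketch: a NaN matches a NaN with the same payload, and `0.0` does not match `-0.0`.

```rust
use std::collections::HashSet;

/// Hypothetical float filter keyed on raw bit patterns.
pub struct F64BitsFilter {
    set: HashSet<u64>,
}

impl F64BitsFilter {
    /// Stores each list value as its 64-bit pattern.
    pub fn new(values: &[f64]) -> Self {
        Self {
            set: values.iter().map(|v| v.to_bits()).collect(),
        }
    }

    /// Two floats match only if their bit patterns are identical.
    pub fn contains(&self, v: f64) -> bool {
        self.set.contains(&v.to_bits())
    }
}
```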
Adds BranchlessFilter<T, N> - a const-generic filter that unrolls membership checks into a fixed-size OR-chain comparison.
For small lists (≤16 elements), this outperforms hash lookups due to:
- No branching (uses bitwise OR to combine comparisons)
- Better CPU pipelining
- No hash computation overhead
Strategy selection thresholds (tuned via benchmarks):
- 4-byte types (Int32, Float32): branchless up to 16 elements
- 8-byte types (Int64, Float64): branchless up to 16 elements
- 16-byte types (Decimal128): branchless up to 4 elements
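A rough sketch of the OR-chain technique, assuming a const-generic struct that holds the list in a fixed-size array. `BranchlessSketch` and `eval` are hypothetical names, not the PR's `BranchlessFilter<T, N>` API. The loop has a compile-time trip count and combines comparisons with a non-short-circuiting `|=`, which gives the compiler the opportunity to unroll it; whether it does so depends on the optimizer.

```rust
/// Hypothetical const-generic OR-chain filter (not the PR's type).
pub struct BranchlessSketch<T: Copy + PartialEq, const N: usize> {
    values: [T; N],
}

impl<T: Copy + PartialEq, const N: usize> BranchlessSketch<T, N> {
    pub fn new(values: [T; N]) -> Self {
        Self { values }
    }

    /// Compares against all N values and ORs the results together,
    /// with no early exit on the first match.
    #[inline]
    pub fn contains(&self, needle: T) -> bool {
        let mut hit = false;
        for v in self.values.iter() {
            hit |= *v == needle;
        }
        hit
    }

    /// Applies the membership test to every input value.
    pub fn eval(&self, input: &[T]) -> Vec<bool> {
        input.iter().map(|&x| self.contains(x)).collect()
    }
}
```

Because `N` is part of the type, a caller would pick the instantiation from the list length at plan time, for example `BranchlessSketch::<i32, 3>::new([1, 5, 9])`.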
Adds specialized filters for Utf8View arrays where all strings are short enough (≤12 bytes) to be stored inline in the 16-byte view struct. These can be reinterpreted as i128 for fast equality comparison:
- Utf8ViewHashFilter: uses PrimitiveFilter<Decimal128Type> internally
- Utf8ViewBranchless<N>: branchless filter for ≤4 short strings
- utf8view_all_short_strings(): helper to check if optimization applies
The strategy module checks for Utf8View type and verifies all strings are short before using the optimized path, falling back to generic ArrayStaticFilter for longer strings.
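To show why short strings can be compared as integers, the sketch below builds the 16-byte inline view by hand, following the Arrow view layout for strings of at most 12 bytes (a 4-byte little-endian length followed by the data, zero-padded), and loads it as an `i128`. The function names `inline_view` and `all_short` are hypothetical; the PR reads views directly from the array buffer instead of constructing them.

```rust
/// Encodes a string of at most 12 bytes as its inline 16-byte view,
/// returned as an i128. Longer strings yield None (hypothetical helper).
pub fn inline_view(s: &str) -> Option<i128> {
    let bytes = s.as_bytes();
    if bytes.len() > 12 {
        return None;
    }
    let mut view = [0u8; 16];
    // Bytes 0..4: length as little-endian u32.
    view[..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    // Bytes 4..16: string data, zero-padded.
    view[4..4 + bytes.len()].copy_from_slice(bytes);
    Some(i128::from_le_bytes(view))
}

/// True if every list value fits inline, so the i128 path could apply.
pub fn all_short(list: &[&str]) -> bool {
    list.iter().all(|s| s.len() <= 12)
}
```

Since the length is part of the encoded value, two strings that differ only by trailing zero bytes still produce different integers.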
HashTable is better suited for storing values with custom hashing and equality, avoiding the overhead of key-value pairs and simplifying the API usage.
Which issue does this PR close?
Rationale for this change
The current `InList` expression implementation uses a generic `ArrayStaticFilter` that relies on `make_comparator` for all types, which adds significant overhead for primitive types. This PR introduces type-specialized filters that exploit the properties of different data types to achieve substantial performance improvements.
What changes are included in this PR?
This PR refactors the `InList` expression to use specialized filter strategies based on data type and list size:
1. Bitmap Filters for 1-byte and 2-byte types (UInt8, Int8, UInt16, Int16)
2. Branchless OR-chain Filters for small IN lists
3. Utf8View Short String Optimization
4. Two-Stage ByteViewMaskedFilter for mixed-length Utf8View/BinaryView
5. Type Reinterpretation for Zero-Copy Dispatch
Are these changes tested?
Yes, all existing tests pass (37 tests in `in_list` module). The optimizations are covered by the existing comprehensive test suite which includes:
Are there any user-facing changes?
No user-facing API changes. This is a pure performance optimization that maintains identical behavior.