-
Notifications
You must be signed in to change notification settings - Fork 392
feat: list_contains expression
#6095
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
7f98673 to
5b11491
Compare
Greptile OverviewGreptile SummaryImplemented Key Changes:
Implementation Highlights:
Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant Expression
participant ListContains UDF
participant SeriesListExtension
participant ListArray Kernel
participant Hash & Compare
User->>Expression: col("lists").list_contains(item)
Expression->>ListContains UDF: call(list_series, item)
alt item.len() == 1
ListContains UDF->>ListContains UDF: broadcast item to match list length
end
ListContains UDF->>SeriesListExtension: list_series.list_contains(item)
alt DataType is List
SeriesListExtension->>ListArray Kernel: list().list_contains(item)
else DataType is FixedSizeList
SeriesListExtension->>ListArray Kernel: fixed_size_list().to_list().list_contains(item)
end
alt flat_child is Null type
ListArray Kernel->>ListArray Kernel: return all false (with proper null handling)
else normal case
ListArray Kernel->>Hash & Compare: cast item, compute hashes
Hash & Compare-->>ListArray Kernel: item_hashes, child_hashes
loop for each list
alt list is null or item is null
ListArray Kernel->>ListArray Kernel: push false with null flag
else
loop for each element in list
ListArray Kernel->>Hash & Compare: compare hash first
alt hashes match
Hash & Compare->>Hash & Compare: full value comparison
alt values equal
ListArray Kernel->>ListArray Kernel: found = true, break
end
end
end
ListArray Kernel->>ListArray Kernel: push result
end
end
end
ListArray Kernel-->>SeriesListExtension: BooleanArray result
SeriesListExtension-->>ListContains UDF: Series result
ListContains UDF-->>Expression: Series result
Expression-->>User: Boolean column
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4 files reviewed, no comments
5b11491 to
469efc2
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #6095 +/- ##
===========================================
- Coverage 72.91% 43.38% -29.54%
===========================================
Files 973 910 -63
Lines 126196 112849 -13347
===========================================
- Hits 92016 48954 -43062
- Misses 34180 63895 +29715
🚀 New features to boost your workflow:
|
469efc2 to
96c5094
Compare
|
I'm not sure why my Python test cases do not cover the Rust code path. Do I need to add native Rust tests? |
|
|
||
| fn list_contains(&self, item: &Series) -> DaftResult<BooleanArray> { | ||
| let list_nulls = self.nulls(); | ||
| let mut result = Vec::with_capacity(self.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use arrow's arrow::array::BooleanBuilder instead of a vec. Using a vec results in a double iteration.
This should instead be something like
let mut builder = arrow::array::BooleanBuilder::new();
for (..) in .. {
if .. {
builder.append_null()
} else {
builder.append_value(value)
}
}
let arr = builder.finish();
let arr = BooleanArray::from_arrow(
field,
Arc::new(arr)
);
Ok(arr)while this is more focused on arrow2 migration, a lot of these patterns are also relevant to adding net new functionality.
Changes Made
Compare hashes of item and child first before doing full value match in arrow format.
Related Issues
Closes #2749