arrow-row: Add ListView support #9176
base: main
friendlymatthew
left a comment
Hi, do you mind rebasing? I think CI is failing because this needs to be rebased on the latest master: the base branch is missing the encoded_len fn that was added recently.
Otherwise, the implementation makes sense to me
cd2a465 to
29f13ef
Compare
Very confused, because both this PR and #9175 are based off the latest main, and the tip of the other PR seems to work fine.
Figured it out: I did actually need to change something after the rebase here. I also refactored the use in both list types.
            list_size += 1;
        }
    }
    O::from_usize(child_count).expect("overflow");
Is this to force a panic?
yes, same as the other assertion, this is consistent with what we do for regular lists as well
Personally I feel it makes more sense to return an error here since the function already supports that
        .collect();

    let child = unsafe { converter.convert_raw(&mut child_rows, validate_utf8) }?;
    assert_eq!(child.len(), 1);
Should we return an error here since the function returns a result anyway?
we perform the exact same assertion in the regular lists, so I did this for consistency and it seems like a good idea since something went pretty spectacularly wrong if this isn't true
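To make the two positions in this thread concrete, here is a minimal std-only sketch of both options for the offset conversion. The names `offset_or_panic` and `offset_or_error` and the `String` error type are hypothetical stand-ins for illustration; the real code goes through `O::from_usize` and would presumably report through the crate's own error type.

```rust
/// Panicking variant, mirroring the shape of
/// `O::from_usize(child_count).expect("overflow")` with `i32` offsets.
fn offset_or_panic(child_count: usize) -> i32 {
    i32::try_from(child_count).expect("overflow")
}

/// Fallible variant: reports the overflow to the caller instead of panicking.
/// `String` is a placeholder error type (an assumption for this sketch).
fn offset_or_error(child_count: usize) -> Result<i32, String> {
    i32::try_from(child_count)
        .map_err(|_| format!("child count {child_count} overflows an i32 offset"))
}
```

Both variants reject the same inputs; the difference is only whether the caller gets a chance to handle the failure.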
arrow-row/src/lib.rs
Outdated
/// Computes the minimum offset and maximum end (offset + size) for a ListView array.
/// Returns (min_offset, max_end) which can be used to slice the values array.
fn compute_list_view_bounds<O: OffsetSizeTrait>(array: &GenericListViewArray<O>) -> (usize, usize) {
This function seems oddly placed; should be lower down instead of in the middle of the mod declarations?
agreed, will move
moved it further down in the file, but not sure I love that either, maybe move it to the list file? what do you think?
We can always put it down near row_lengths where it won't intrude in the middle of Codec here
        _ => unreachable!(),
    };

    let null_buffer = NullBuffer::new(BooleanBuffer::new(nulls.into(), 0, rows.len()));
Maybe use new_unchecked, since you already have the null count and we want to avoid calculating it twice.
- let null_buffer = NullBuffer::new(BooleanBuffer::new(nulls.into(), 0, rows.len()));
+ let null_buffer = NullBuffer::new_unchecked(BooleanBuffer::new(nulls.into(), 0, rows.len()), null_count);
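The idea behind this suggestion, sketched without the arrow types: if the validity bitmap is built in a loop anyway, the null count can be accumulated in that same pass, so nothing has to re-scan the bitmap afterwards. `build_validity` is a hypothetical helper, not code from this PR, and it assumes LSB-first bit packing. Note that passing a precomputed count shifts the burden of correctness onto the caller.

```rust
/// Builds a packed, LSB-first validity bitmap and counts nulls in one pass.
/// Returns (bitmap bytes, null_count).
fn build_validity(valid: &[bool]) -> (Vec<u8>, usize) {
    let mut bits = vec![0u8; (valid.len() + 7) / 8];
    let mut null_count = 0;
    for (i, &is_valid) in valid.iter().enumerate() {
        if is_valid {
            // Set bit `i` to mark the slot as valid.
            bits[i / 8] |= 1 << (i % 8);
        } else {
            // Count the null here instead of re-scanning the bitmap later.
            null_count += 1;
        }
    }
    (bits, null_count)
}
```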
        if size > 0 {
            min_offset = min_offset.min(offset);
            max_end = max_end.max(end);
Maybe you can break out of the loop early once you have reached the widest possible bounds (a minimum offset of 0 and the maximum possible end).
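A rough sketch of what that early exit could look like, written against plain slices rather than `GenericListViewArray` so it stands alone. `list_view_bounds`, its `values_len` parameter, and the `(0, 0)` result for the all-empty case are assumptions made for this illustration, not the PR's actual signature or behavior.

```rust
/// Returns (min_offset, max_end) over all non-empty views.
/// Falls back to (0, 0) when every view is empty (an assumption of this sketch).
fn list_view_bounds(offsets: &[usize], sizes: &[usize], values_len: usize) -> (usize, usize) {
    let mut min_offset = usize::MAX;
    let mut max_end = 0usize;
    for (&offset, &size) in offsets.iter().zip(sizes) {
        if size > 0 {
            min_offset = min_offset.min(offset);
            max_end = max_end.max(offset + size);
            // The bounds cannot get any wider than the whole values array,
            // so the remaining views do not need to be inspected.
            if min_offset == 0 && max_end == values_len {
                break;
            }
        }
    }
    if min_offset == usize::MAX {
        (0, 0)
    } else {
        (min_offset, max_end)
    }
}
```

The check only pays off when the views actually span the full values array; otherwise it costs one extra comparison per non-empty view.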
#[test]
fn test_list_view() {
    test_single_list_view::<i32>();
}

#[test]
fn test_large_list_view() {
    test_single_list_view::<i64>();
}
Please add nested tests like the ones for the regular list.
    test_nested_list::<i64>();
}

fn test_single_list_view<O: OffsetSizeTrait>() {
Please add more tests that take advantage of the fact that this is a view, namely:
- two lists that point to the same values.
- unordered offsets (one item starts at offset x and a later item starts at offset y, where y is before x).
- one list's items cover another list's items and a little more (e.g. list 1 has offset 10 and size 5, and list 2 has offset 12 and size 2).
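To illustrate the three layouts listed above, here is a small std-only sketch that resolves (offset, size) pairs against a shared values slice. `materialize` is a hypothetical helper for this illustration and does not exist in arrow-row; real tests would build list view arrays and round-trip them through the row converter.

```rust
/// Resolves each (offset, size) view against the shared `values` slice.
fn materialize(values: &[i32], offsets: &[usize], sizes: &[usize]) -> Vec<Vec<i32>> {
    offsets
        .iter()
        .zip(sizes)
        .map(|(&offset, &size)| values[offset..offset + size].to_vec())
        .collect()
}
```

With values `0..20`, offsets `[10, 12]` and sizes `[5, 2]` give a first list of `10..15` and a second list `[12, 13]` that lies entirely inside the first. A regular list cannot express any of these three layouts, because its offsets must be monotonically non-decreasing and its ranges contiguous.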
    ListArray::new(field, offsets, values, Some(nulls))
}

fn generate_column(len: usize) -> ArrayRef {
Please add the list view and large list view here as well, similar to how list and large list are handled.
Don't forget to increase the random range so that it covers the new variants.
Which issue does this PR close?
Closes #9174
What changes are included in this PR?
Implementation and tests. It's mostly copied from List.
Are these changes tested?
Yes, see unit tests.
Are there any user-facing changes?
No, purely additive.
@alamb @Jefffrey