Skip to content

Commit 20c1c34

Browse files
scovichalamb
andauthored
Make variant iterators safely infallible (#7704)
# Which issue does this PR close? * Closes #7684 * Closes #7685 * Part of #6736 # Rationale for this change Infallible iteration is _much_ easier to work with, vs. Iterator of Result or Result of Iterator. Iteration and validation are strongly correlated, because the iterator can only be infallible if the constructor previously validated everything the iterator depends on. # What changes are included in this PR? In all three of `VariantMetadata,` `VariantList,` and `VariantObject`: * The header object is cleaned up to _only_ consider actual header state. Other state is moved to the object itself. * Constructors fully validate the object by consuming a fallible iterator * The externally visible iterator does a `map(Result::unwrap)` on the same fallible iterator, relying on the constructor to prove the unwrap is safe. * The externally visible iterator is obtained by calling `iter()` method. In addition: * `VariantObject` methods no longer materialize the whole offset+field array * Removed validation that is covered by the new iterator testing * A bunch of dead code removed, several methods renamed for clarity * `first_byte_from_slice` now returns `u8` instead of `&u8` # Are there any user-facing changes? Visibility and signatures of some methods changed. --------- Co-authored-by: Andrew Lamb <[email protected]>
1 parent 6227419 commit 20c1c34

File tree

4 files changed

+273
-402
lines changed

4 files changed

+273
-402
lines changed

parquet-variant/src/builder.rs

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -113,11 +113,11 @@ fn make_room_for_header(buffer: &mut Vec<u8>, start_pos: usize, header_size: usi
113113
/// panic!("unexpected variant type")
114114
/// };
115115
/// assert_eq!(
116-
/// variant_object.field("first_name").unwrap(),
116+
/// variant_object.field_by_name("first_name").unwrap(),
117117
/// Some(Variant::ShortString("Jiaying"))
118118
/// );
119119
/// assert_eq!(
120-
/// variant_object.field("last_name").unwrap(),
120+
/// variant_object.field_by_name("last_name").unwrap(),
121121
/// Some(Variant::ShortString("Li"))
122122
/// );
123123
/// ```

parquet-variant/src/utils.rs

Lines changed: 25 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -46,9 +46,10 @@ pub(crate) fn map_try_from_slice_error(e: TryFromSliceError) -> ArrowError {
4646
ArrowError::InvalidArgumentError(e.to_string())
4747
}
4848

49-
pub(crate) fn first_byte_from_slice(slice: &[u8]) -> Result<&u8, ArrowError> {
49+
pub(crate) fn first_byte_from_slice(slice: &[u8]) -> Result<u8, ArrowError> {
5050
slice
5151
.first()
52+
.copied()
5253
.ok_or_else(|| ArrowError::InvalidArgumentError("Received empty bytes".to_string()))
5354
}
5455

@@ -58,14 +59,14 @@ pub(crate) fn string_from_slice(slice: &[u8], range: Range<usize>) -> Result<&st
5859
.map_err(|_| ArrowError::InvalidArgumentError("invalid UTF-8 string".to_string()))
5960
}
6061

61-
/// Performs a binary search on a slice using a fallible key extraction function.
62+
/// Performs a binary search over a range using a fallible key extraction function; a failed key
63+
/// extraction immediately terminats the search.
6264
///
63-
/// This is similar to the standard library's `binary_search_by`, but allows the key
64-
/// extraction function to fail. If key extraction fails during the search, that error
65-
/// is propagated immediately.
65+
/// This is similar to the standard library's `binary_search_by`, but generalized to ranges instead
66+
/// of slices.
6667
///
6768
/// # Arguments
68-
/// * `slice` - The slice to search in
69+
/// * `range` - The range to search in
6970
/// * `target` - The target value to search for
7071
/// * `key_extractor` - A function that extracts a comparable key from slice elements.
7172
/// This function can fail and return an error.
@@ -74,28 +75,33 @@ pub(crate) fn string_from_slice(slice: &[u8], range: Range<usize>) -> Result<&st
7475
/// * `Ok(Ok(index))` - Element found at the given index
7576
/// * `Ok(Err(index))` - Element not found, but would be inserted at the given index
7677
/// * `Err(e)` - Key extraction failed with error `e`
77-
pub(crate) fn try_binary_search_by<T, K, E, F>(
78-
slice: &[T],
78+
pub(crate) fn try_binary_search_range_by<K, E, F>(
79+
range: Range<usize>,
7980
target: &K,
8081
mut key_extractor: F,
8182
) -> Result<Result<usize, usize>, E>
8283
where
8384
K: Ord,
84-
F: FnMut(&T) -> Result<K, E>,
85+
F: FnMut(usize) -> Result<K, E>,
8586
{
86-
let mut left = 0;
87-
let mut right = slice.len();
88-
89-
while left < right {
90-
let mid = (left + right) / 2;
91-
let key = key_extractor(&slice[mid])?;
92-
87+
let Range { mut start, mut end } = range;
88+
while start < end {
89+
let mid = start + (end - start) / 2;
90+
let key = key_extractor(mid)?;
9391
match key.cmp(target) {
9492
std::cmp::Ordering::Equal => return Ok(Ok(mid)),
95-
std::cmp::Ordering::Greater => right = mid,
96-
std::cmp::Ordering::Less => left = mid + 1,
93+
std::cmp::Ordering::Greater => end = mid,
94+
std::cmp::Ordering::Less => start = mid + 1,
9795
}
9896
}
9997

100-
Ok(Err(left))
98+
Ok(Err(start))
99+
}
100+
101+
/// Attempts to prove a fallible iterator is actually infallible in practice, by consuming every
102+
/// element and returning the first error (if any).
103+
pub(crate) fn validate_fallible_iterator<T, E>(
104+
mut it: impl Iterator<Item = Result<T, E>>,
105+
) -> Result<(), E> {
106+
it.find(Result::is_err).transpose().map(|_| ())
101107
}

0 commit comments

Comments
 (0)