feature: add support for slicing above i32 where u64 can safely be coerced to f64 losslessly (#861)
mobiusklein wants to merge 7 commits into kylebarron:main
Conversation
src/reader_async.rs (Outdated)

```diff
      spawn_local(async move {
-         let subset_blob = file
-             .slice_with_i32_and_i32(
+         let subset_blob = if (range.start <= i32::MAX as u64) && (range.end <= i32::MAX as u64)
```
Instead of having two branches for i32 and f64, we can probably first check that the range start and end are representable as integers in f64, and then only use the f64 slice API.
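A minimal sketch of that single-branch approach (the helper names here are illustrative, not the crate's actual API):

```rust
use std::ops::Range;

/// True when `v` survives a u64 -> f64 -> u64 round trip, i.e. it is
/// exactly representable as an f64. Note that `f64 as u64` saturates in
/// Rust, so the top end is guarded explicitly before relying on the
/// round trip.
fn is_exact_f64(v: u64) -> bool {
    v < u64::MAX && v as f64 as u64 == v
}

/// Validate both endpoints once; the caller can then always go through
/// the f64 slice API instead of branching on i32 vs f64.
fn to_f64_range(range: Range<u64>) -> Option<(f64, f64)> {
    if is_exact_f64(range.start) && is_exact_f64(range.end) {
        Some((range.start as f64, range.end as f64))
    } else {
        None
    }
}
```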
src/reader_async.rs (Outdated)

```rust
} else {
    let start = range.start as f64;
    if start as u64 != range.start {
        panic!("Cannot safely convert start index of {range:?}");
```
Instead of panicking, we should change this method to be fallible and return an Err
As noted in rust-lang/rust#152466, you can just define top-level constants for the max safe integer:

```rust
pub const MAX_EXACT_INTEGER: f64 = ((1u64 << f64::MANTISSA_DIGITS) - 1) as f64;
pub const MIN_EXACT_INTEGER: f64 = -MAX_EXACT_INTEGER;
```
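For illustration, a self-contained version of that suggestion plus a range check built on it (the shift must be done in integer width and then cast, since `1 << 53` overflows an `i32` literal; `slice_ok` is a hypothetical name, not the crate's API):

```rust
/// Largest integer that f64 represents exactly and contiguously
/// (2^53 - 1, the same value as JavaScript's Number.MAX_SAFE_INTEGER).
pub const MAX_EXACT_INTEGER: f64 = ((1u64 << f64::MANTISSA_DIGITS) - 1) as f64;
pub const MIN_EXACT_INTEGER: f64 = -MAX_EXACT_INTEGER;

/// A byte range can be handed to the f64 Blob slice API when both
/// endpoints fall at or below MAX_EXACT_INTEGER.
pub fn slice_ok(range: &std::ops::Range<u64>) -> bool {
    range.start as f64 <= MAX_EXACT_INTEGER && range.end as f64 <= MAX_EXACT_INTEGER
}
```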
I made the requested changes and piped the failure along. I also have a separate branch where I'm working on exposing more of the Parquet file indices and statistics to JS. Is that appropriate to add in a separate PR?
src/reader_async.rs (Outdated)

```diff
  }

- pub async fn get_bytes(&mut self, range: Range<u64>) -> Vec<u8> {
+ pub async fn get_bytes(&mut self, range: Range<u64>) -> io::Result<Vec<u8>> {
```
We already have a result type in this crate (lines 24 to 25 in d9e9e1f).
src/reader_async.rs (Outdated)

```diff
-         panic!("Cannot safely convert start index of {range:?}");
-     }
-     file.slice_with_f64_and_f64(start, end).unwrap()
+ sender.send(Err(io::Error::new(io::ErrorKind::Unsupported, format!("{range:?} is too large to convert into a Blob slice")))).unwrap();
```
This can be a PlatformSupportError (lines 18 to 19 in d9e9e1f), since JS on the web doesn't support a slice index larger than what a Number can represent exactly.
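A rough sketch of the fallible shape this suggestion leads to, using `std::io::Error` as a stand-in for the crate's `PlatformSupportError` (the function name is illustrative):

```rust
use std::io;
use std::ops::Range;

/// Convert a byte range into f64 Blob offsets, returning an error instead
/// of panicking when an endpoint cannot be represented exactly as an f64.
fn checked_f64_range(range: Range<u64>) -> io::Result<(f64, f64)> {
    // Every u64 at or below 2^53 converts to f64 without rounding.
    let exact = |v: u64| v <= (1u64 << f64::MANTISSA_DIGITS);
    if exact(range.start) && exact(range.end) {
        Ok((range.start as f64, range.end as f64))
    } else {
        Err(io::Error::new(
            io::ErrorKind::Unsupported,
            format!("{range:?} is too large to convert into a Blob slice"),
        ))
    }
}
```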
src/reader_async.rs (Outdated)

```rust
let result = match file.get_bytes(range).await {
    Ok(result) => Ok(Bytes::from(result)),
    Err(e) => Err(e)
};
```
src/reader_async.rs (Outdated)

```diff
      let mut file = self.file.clone();
      spawn_local(async move {
-         let result: Bytes = file.get_bytes(range).await.into();
+         let result = file.get_bytes(range).await.map(Bytes::from);
```
Perhaps create an issue first to discuss what you want to achieve. Let's keep this PR focused on this fix.
Right now, if I try to read a Parquet file that is larger than 2**31 bytes, `WrappedFile::get_bytes` panics because it cannot coerce the `u64` byte range to `i32`. This change relaxes the limit, checking that we can round-trip the indices from `u64` -> `f64` -> `u64` without loss of precision, and then uses the `f64` representation to slice the `Blob` with.

This might also be doable with a check for the value being less than `Number.MAX_SAFE_INTEGER`, but I didn't see a good way to query that from WASM, though I doubt it can change.
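On the last point: `Number.MAX_SAFE_INTEGER` doesn't need to be queried from the JS runtime at all; its value is fixed by the IEEE 754 binary64 format, so it can be mirrored as a Rust constant. A sketch (the constant and function names are made up for illustration):

```rust
/// JavaScript's Number.MAX_SAFE_INTEGER (2^53 - 1) is determined by the
/// IEEE 754 binary64 format, not by the runtime, so it is safe to hard-code.
pub const JS_MAX_SAFE_INTEGER: u64 = (1u64 << 53) - 1;

/// Every index at or below this bound round-trips u64 -> f64 -> u64 losslessly.
pub fn index_is_safe(v: u64) -> bool {
    v <= JS_MAX_SAFE_INTEGER
}
```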