Skip to content

Commit ba3446b

Browse files
authored
[Parquet] perf: reuse seeked File clone in ChunkReader::get_read() (#9214)
# Which issue does this PR close? N/A, it's a minor performance fix. # Rationale for this change While reviewing Parquet performance, I observed a duplicate `try_clone()`. I wasn't able to tell why it was required. After benchmarking and running tests, it seems there is no reason for the duplication. `ChunkReader::get_read()` for `File` calls [`try_clone()`](https://doc.rust-lang.org/std/fs/struct.File.html#method.try_clone) twice: once to seek, then again for the `BufReader`, discarding the first clone. This might be wasteful, as each `try_clone()` duplicates the file descriptor via a system call. So, one less dup() syscall per get_read() call. # What changes are included in this PR? Reuse the already-seeked file clone instead of creating a new one. # Are these changes tested? Covered by existing tests. Local benchmarks using [divan](https://github.com/nvzqz/divan) show ~36% improvement for `get_read()` calls on my laptop. # Are there any user-facing changes? No.
1 parent 9c6065c commit ba3446b

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

parquet/src/file/reader.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -93,7 +93,7 @@ impl ChunkReader for File {
9393
fn get_read(&self, start: u64) -> Result<Self::T> {
9494
let mut reader = self.try_clone()?;
9595
reader.seek(SeekFrom::Start(start))?;
96-
Ok(BufReader::new(self.try_clone()?))
96+
Ok(BufReader::new(reader))
9797
}
9898

9999
fn get_bytes(&self, start: u64, length: usize) -> Result<Bytes> {

0 commit comments

Comments
 (0)