Commit ba3446b
[Parquet] perf: reuse seeked File clone in ChunkReader::get_read() (#9214)
# Which issue does this PR close?
N/A, it's a minor performance fix.
# Rationale for this change
While reviewing Parquet performance, I observed a duplicate
`try_clone()`. I wasn't able to tell why it was required. After
benchmarking and running tests, it seems there is no reason for the
duplication.
`ChunkReader::get_read()` for `File` calls
[`try_clone()`](https://doc.rust-lang.org/std/fs/struct.File.html#method.try_clone)
twice: once to seek, then again for the `BufReader`, discarding the
first clone. This might be wasteful, as each `try_clone()` duplicates
the file descriptor via a system call. Reusing the first clone saves one
`dup()` syscall per `get_read()` call.
# What changes are included in this PR?
Reuse the already-seeked file clone instead of creating a new one.
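To illustrate the change, here is a minimal standalone sketch of the two patterns. It is not the actual `ChunkReader` implementation: `get_read_before`, `get_read_after`, and `read_from` are hypothetical names that use only `std`. Both variants read from the same offset because, per the `try_clone()` documentation, clones share one cursor; the second variant simply skips the extra clone.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read, Seek, SeekFrom};

/// Before (sketch): clone to seek, then clone again for the reader.
/// Two descriptor duplications; the seeked clone is dropped unused.
fn get_read_before(file: &File, start: u64) -> io::Result<BufReader<File>> {
    let mut seeked = file.try_clone()?;
    seeked.seek(SeekFrom::Start(start))?;
    // Clones share the underlying cursor, so this second clone
    // also starts reading at `start`.
    Ok(BufReader::new(file.try_clone()?))
}

/// After (sketch): hand the already-seeked clone to the reader.
/// One descriptor duplication.
fn get_read_after(file: &File, start: u64) -> io::Result<BufReader<File>> {
    let mut seeked = file.try_clone()?;
    seeked.seek(SeekFrom::Start(start))?;
    Ok(BufReader::new(seeked))
}

/// Hypothetical helper for trying both variants: writes a small temp
/// file, reads it from `start` through `get`, and returns the text.
fn read_from(
    get: fn(&File, u64) -> io::Result<BufReader<File>>,
    name: &str,
    start: u64,
) -> io::Result<String> {
    let path = std::env::temp_dir().join(name);
    std::fs::write(&path, b"hello parquet")?;
    let mut out = String::new();
    {
        // Scope the handles so they are closed before the file is removed.
        let file = File::open(&path)?;
        get(&file, start)?.read_to_string(&mut out)?;
    }
    std::fs::remove_file(&path)?;
    Ok(out)
}
```

Calling `read_from(get_read_before, "a.bin", 6)` and `read_from(get_read_after, "b.bin", 6)` should return the same text, which is why the second clone can be dropped without changing behavior.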
# Are these changes tested?
Covered by existing tests.
Local benchmarks using [divan](https://github.com/nvzqz/divan) show ~36%
improvement for `get_read()` calls on my laptop.
# Are there any user-facing changes?
No.

1 parent 9c6065c commit ba3446b
1 file changed, +1 -1 lines changed (line 96 replaced)