[Parquet] perf: preallocate capacity for ArrayReaderBuilder #9093
Conversation
From the experiment: pre-allocation overhead may offset the savings from avoiding incremental growth, so I am turning this into a draft.
Thank you @lyang24 -- I will look at this more carefully shortly.
I suspect we will not be able to detect the difference in an end-to-end test, given how small the allocation overhead is compared to running the rest of the query.
@lyang24 -- I am not sure about that. Maybe we could pass in the batch size to the reader... Edit: one way we could find out would be to put in a println to print out the number of rows that were reserved 🤔
Update: I made a small test program (below) and printed out the capacities:

```diff
diff --git a/parquet/src/arrow/buffer/view_buffer.rs b/parquet/src/arrow/buffer/view_buffer.rs
index 0343047da6..d87b494b46 100644
--- a/parquet/src/arrow/buffer/view_buffer.rs
+++ b/parquet/src/arrow/buffer/view_buffer.rs
@@ -35,6 +35,7 @@ pub struct ViewBuffer {
 impl ViewBuffer {
     /// Create a new ViewBuffer with capacity for the specified number of views
     pub fn with_capacity(capacity: usize) -> Self {
+        println!("Creating ViewBuffer with capacity {}", capacity);
         Self {
             views: Vec::with_capacity(capacity),
             buffers: Vec::new(),
```

Here is what they are: I think those are the number of rows in the dictionary (not the views themselves). I also then printed out the actual capacities:

```diff
diff --git a/parquet/src/arrow/array_reader/byte_view_array.rs b/parquet/src/arrow/array_reader/byte_view_array.rs
index 8e690c574d..3ea6f08a29 100644
--- a/parquet/src/arrow/array_reader/byte_view_array.rs
+++ b/parquet/src/arrow/array_reader/byte_view_array.rs
@@ -259,6 +259,8 @@ impl ByteViewArrayDecoder {
         len: usize,
         dict: Option<&ViewBuffer>,
     ) -> Result<usize> {
+        println!("ByteViewArrayDecoder::read called with len {}, current views capacity: {}", len, out.views.capacity());
+
         match self {
             ByteViewArrayDecoder::Plain(d) => d.read(out, len),
             ByteViewArrayDecoder::Dictionary(d) => {
```

You can actually see most of the reads have an empty buffer. I tracked it down in a debugger and the default buffer is being created here:

Whole Test Program
```rust
use std::fs::File;
use std::io::Read;
use std::sync::Arc;
use arrow::datatypes::{DataType, Field, FieldRef, Schema};
use parquet::arrow::arrow_reader::{ArrowReaderOptions, ParquetRecordBatchReaderBuilder};
use bytes::Bytes;

fn main() {
    let file_name = "/Users/andrewlamb/Downloads/hits/hits_0.parquet";
    println!("Opening file: {file_name}");
    let mut file = File::open(file_name).unwrap();
    let mut bytes = Vec::new();
    file.read_to_end(&mut bytes).unwrap();
    let bytes = Bytes::from(bytes);

    let schema = string_to_view_types(
        ParquetRecordBatchReaderBuilder::try_new(bytes.clone())
            .unwrap()
            .schema(),
    );
    //println!("Schema: {:?}", schema);
    let options = ArrowReaderOptions::new().with_schema(schema);
    let reader = ParquetRecordBatchReaderBuilder::try_new_with_options(bytes, options)
        .unwrap()
        .with_batch_size(8192)
        .build()
        .unwrap();
    for batch in reader {
        let batch = batch.unwrap();
        println!(
            "Read batch with {} rows and {} columns",
            batch.num_rows(),
            batch.num_columns()
        );
    }
    println!("Done");
}

// Hack because the clickbench files were written with the wrong logical type for strings
fn string_to_view_types(schema: &Arc<Schema>) -> Arc<Schema> {
    let fields: Vec<FieldRef> = schema
        .fields()
        .iter()
        .map(|field| {
            let existing_type = field.data_type();
            if existing_type == &DataType::Utf8 || existing_type == &DataType::Binary {
                Arc::new(Field::new(
                    field.name(),
                    DataType::Utf8View,
                    field.is_nullable(),
                ))
            } else {
                Arc::clone(field)
            }
        })
        .collect();
    Arc::new(Schema::new(fields))
}
```
So, TL;DR: my analysis is that we aren't properly sizing the allocations. The good news is we can fix this. The bad news is it may be tricky. I will update the original ticket too.
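The cost of under-sized buffers can be illustrated with a small standalone sketch (mine, not from the PR): pushing into a `Vec` that starts empty triggers repeated reallocations as the capacity doubles, while reserving the batch size up front allocates once and never grows.

```rust
fn main() {
    let n: usize = 8192; // a typical Parquet batch size

    // Growing incrementally: count how many times the Vec reallocates
    let mut grown: Vec<usize> = Vec::new();
    let mut reallocations = 0;
    let mut last_cap = grown.capacity();
    for i in 0..n {
        grown.push(i);
        if grown.capacity() != last_cap {
            reallocations += 1;
            last_cap = grown.capacity();
        }
    }

    // Preallocating: one allocation up front, no growth while filling
    let mut preallocated: Vec<usize> = Vec::with_capacity(n);
    let cap_before = preallocated.capacity();
    for i in 0..n {
        preallocated.push(i);
    }

    println!("incremental growth reallocated {reallocations} times");
    assert!(reallocations >= 5);
    assert_eq!(preallocated.capacity(), cap_before);
    println!("preallocated buffer never grew");
}
```

Each reallocation in the incremental case also copies every element written so far, which is the overhead the capacity hint is meant to avoid.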
Thanks for the deep dive, I think passing down a capacity hint from ArrayReaderBuilder is the right way to go. I made an implementation attempt, and the results look promising.
With predicate pushdown the SQL runs so fast (7ms) that the difference becomes hard to tell.
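A minimal sketch of the capacity-hint idea (the type and field names here are illustrative, not the actual arrow-rs API): the builder knows the batch size and threads it down to the buffer constructor, instead of relying on a `Default`-style constructor that allocates nothing.

```rust
// Hypothetical, simplified sketch -- the real arrow-rs types differ.
// `ValuesBuffer`, `RecordReader`, and `batch_size` here are illustrative names.
trait ValuesBuffer {
    // Replaces a `Default`-style constructor so callers must supply a size
    fn with_capacity(capacity: usize) -> Self;
}

struct VecBuffer(Vec<u64>);

impl ValuesBuffer for VecBuffer {
    fn with_capacity(capacity: usize) -> Self {
        // Allocate once for the whole batch instead of growing incrementally
        VecBuffer(Vec::with_capacity(capacity))
    }
}

struct RecordReader<V: ValuesBuffer> {
    values: V,
    batch_size: usize,
}

impl<V: ValuesBuffer> RecordReader<V> {
    // The builder threads its batch size down as the capacity hint
    fn new(batch_size: usize) -> Self {
        Self {
            values: V::with_capacity(batch_size),
            batch_size,
        }
    }
}

fn main() {
    let reader: RecordReader<VecBuffer> = RecordReader::new(8192);
    assert!(reader.values.0.capacity() >= 8192);
    println!("buffer preallocated for {} rows", reader.batch_size);
}
```

Removing the `Default` path entirely (as discussed below) is what forces every reader, present and future, through the sized constructor.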
Nice -- checking it out
run benchmark arrow_reader arrow_reader_row_filter arrow_reader_clickbench
🤖 |
alamb left a comment:
Thanks @lyang24 -- this looks very nice. I kicked off some automated benchmarks and hopefully we can see the benefits reflected there (though the benchmarks are sometimes pretty noisy)
If this does work out, I would like to spend a bit more time trying to get this mechanism to work by removing Default from ValuesBuffer to ensure we got all the cases and that any future array readers work the same
🤖: Benchmark completed Details
run benchmark arrow_reader arrow_reader_row_filter arrow_reader_clickbench
Rerunning the benchmarks to see if we can see a consistent pattern.
parquet/benches/arrow_reader.rs (outdated)

```rust
use half::f16;
use num_bigint::BigInt;
use num_traits::FromPrimitive;
use parquet::arrow::arrow_reader::DEFAULT_BATCH_SIZE;
```
note: I made the bench use the default batch size of 1024
…er. Ensure internal buffers are pre-allocated. API change: make batch size required for ArrayReader and buffers.
It looks like it's doing well with large scan (full table scan) queries, but there are some regressions with high-selectivity queries - I am guessing preallocating large blocks messes up the CPU cache? Also, the current code does an extra allocation at the end of the scan - is there any way to avoid that?
```diff
-        std::mem::take(&mut self.values)
+        // Replace the buffer with a new one that has the same capacity
+        // This avoids reallocations on subsequent batches
+        std::mem::replace(&mut self.values, V::with_capacity(self.capacity_hint))
```
this would create an extra allocation after the last read :(
Can we avoid this allocation by moving it to where we write?
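One way to sidestep that trailing allocation, sketched here with illustrative names rather than the PR's actual code: take the buffer out with `std::mem::take` on consume (leaving an empty, unallocated `Vec`), and only reserve the hinted capacity lazily at the write site. Consuming the final batch then allocates nothing.

```rust
// Sketch: defer reallocation until the next write, so consuming the
// final batch does not allocate a buffer that is never used.
// `Buffer` and `capacity_hint` are hypothetical names for illustration.
struct Buffer {
    values: Vec<u64>,
    capacity_hint: usize,
}

impl Buffer {
    fn new(capacity_hint: usize) -> Self {
        Self {
            values: Vec::with_capacity(capacity_hint),
            capacity_hint,
        }
    }

    /// Hand the filled values out, leaving an empty, unallocated Vec behind
    fn consume(&mut self) -> Vec<u64> {
        std::mem::take(&mut self.values)
    }

    /// Allocate lazily at the write site instead of at consume time
    fn write(&mut self, v: u64) {
        if self.values.capacity() == 0 {
            self.values.reserve(self.capacity_hint);
        }
        self.values.push(v);
    }
}

fn main() {
    let mut buf = Buffer::new(4);
    for i in 0..4 {
        buf.write(i);
    }
    let batch = buf.consume();
    assert_eq!(batch, vec![0, 1, 2, 3]);
    // After the last consume, no new allocation has happened yet
    assert_eq!(buf.values.capacity(), 0);
    println!("ok");
}
```

The cost is one branch per write (or per write-batch), traded against one wasted batch-sized allocation per column per scan.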
run benchmark arrow_reader arrow_reader_row_filter arrow_reader_clickbench
Which issue does this PR close?
Rationale for this change
Reduce the allocation cost mentioned in #9059. From the experiment: pre-allocation overhead may offset the savings from avoiding incremental growth.
What changes are included in this PR?
Add a with_capacity method to the ValuesBuffer trait, and remove the defaults to enforce that a capacity hint is required for ArrayReaderBuilder.
The capacity hint will be passed down to GenericRecordReader to preallocate the buffer.
Are there any user-facing changes?
Yes. ArrayReaders need an extra capacity variable to indicate the preferred batch size, and buffers will be provisioned with this capacity.