Arrow2's Parquet reader does not reuse the page decoder buffer when building arrays

Consider the following code in
https://github.com/jorgecarleitao/arrow2/blob/main/src/io/parquet/read/deserialize/primitive/basic.rs#L219-L226

```rust
State::Required(page) => {
    values.extend(
        page.values
            .by_ref()
            .map(decode)
            .map(self.op)
            .take(remaining),
    );
}
```

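Every value is copied here (an extra `memcpy`): `decode` reads it out of the page buffer and `values.extend` writes it into the output vector. I think we could avoid this extra copy by handing the page's buffer to the array through a cheap `Buffer` clone.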
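
To make the difference concrete, here is a minimal, standalone sketch for a required INT32 column with PLAIN encoding (assumed little-endian, 4 bytes per value). `decode_by_copy` mimics the current extend/decode pattern and copies every value into a new `Vec`. `SharedI32Array` is a hypothetical type, not an arrow2 API: it keeps a reference-counted handle to the page bytes and decodes on access. It only uses `std`, so it does not reproduce arrow2's actual `State`/`decode` machinery.

```rust
use std::sync::Arc;

/// Simplified version of the current pattern: decode each PLAIN-encoded,
/// little-endian `i32` from the page bytes and push it into a new `Vec`.
/// Every value is copied out of the page buffer.
fn decode_by_copy(page: &[u8], remaining: usize) -> Vec<i32> {
    let mut values = Vec::with_capacity(remaining);
    values.extend(
        page.chunks_exact(4)
            .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .take(remaining),
    );
    values
}

/// Hypothetical zero-copy alternative (not an arrow2 type): the array keeps a
/// reference-counted handle to the page bytes, so building it copies nothing.
#[derive(Debug, Clone)]
struct SharedI32Array {
    bytes: Arc<Vec<u8>>,
    len: usize, // number of i32 values visible through this array
}

impl SharedI32Array {
    fn from_page(page: Arc<Vec<u8>>, remaining: usize) -> Self {
        let len = (page.len() / 4).min(remaining);
        Self { bytes: page, len }
    }

    /// Reads value `i` directly from the shared page bytes.
    fn value(&self, i: usize) -> i32 {
        assert!(i < self.len, "index out of bounds");
        let c = &self.bytes[i * 4..i * 4 + 4];
        i32::from_le_bytes([c[0], c[1], c[2], c[3]])
    }
}
```

Sharing only avoids the copy when the page bytes already match the array's in-memory layout (e.g. PLAIN-encoded fixed-width values); dictionary- or RLE-encoded pages still have to be decoded into a new buffer.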


As a first step, I propose changing `DataPage` from
```rust
#[derive(Debug, Clone)]
pub struct DataPage {
    pub(super) header: DataPageHeader,
    pub(super) buffer: Vec<u8>,
    ...
}
```

to

```rust
#[derive(Debug, Clone)]
pub struct DataPage {
    pub(super) header: DataPageHeader,
    pub(super) buffer: Buffer<u8>,
    ...
}
```
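
A rough illustration of why this change matters: `DataPage` derives `Clone`, and cloning a `Vec<u8>` deep-copies the whole page, whereas cloning a reference-counted buffer only bumps a counter. `OwnedPage`, `SharedPage` and `SharedBytes` below are hypothetical, simplified stand-ins. `SharedBytes` loosely mimics the idea behind arrow2's `Buffer` (a shared allocation plus offset and length) but is not its actual implementation.

```rust
use std::sync::Arc;

/// Stand-in for the current page: cloning copies the whole byte vector.
#[derive(Debug, Clone)]
struct OwnedPage {
    buffer: Vec<u8>,
}

/// Hypothetical Arc-backed byte buffer: clones and slices share one allocation.
#[derive(Debug, Clone)]
struct SharedBytes {
    data: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl SharedBytes {
    fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        Self { data: Arc::new(data), offset: 0, len }
    }

    /// O(1) sub-view: shares the allocation, only adjusts offset/len.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self { data: Arc::clone(&self.data), offset: self.offset + offset, len }
    }

    fn as_slice(&self) -> &[u8] {
        &self.data[self.offset..self.offset + self.len]
    }
}

/// Stand-in for the proposed page: cloning only increments a reference count.
#[derive(Debug, Clone)]
struct SharedPage {
    buffer: SharedBytes,
}
```

Because `slice` only adjusts `offset` and `len`, a decoder built this way could give the array a view into the page without copying, provided the bytes are already in the array's layout.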

@jorgecarleitao what do you think about this?

arrow-rs already implements a similar optimization in https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/array_reader/byte_array.rs#L115-L138

Issue #1324