Avoid clones in `make_array` for `StructArray` and `GenericByteViewArray` #9114

alamb · 2026-01-07T23:20:39Z

Which issue does this PR close?

Part of Reduce overhead to create an Array from ArrayData (make_array) #9061
broken out of Avoid clones while creating Arrays from ArrayData (speed up reading from Parquet reader) #9058

Rationale for this change

The current implementation of `make_array` for `StructArray` and `GenericByteViewArray` clones the `ArrayData`, which allocates a new `Vec`. This is unnecessary because `make_array` is passed an owned `ArrayData`.

What changes are included in this PR?

1. Add a new API to `ArrayData` to break it down into parts (`into_parts`)
2. Use that API to avoid cloning while constructing `StructArray` and `GenericByteViewArray`
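
The idea behind a consuming "into parts" method can be sketched as follows. This is a minimal illustration, not the arrow-rs implementation: `Data` and `Buffer` are hypothetical stand-ins for `ArrayData` and its buffers, and the tuple returned by this `into_parts` is an assumption chosen for the example (the real method's signature may differ).

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a shared, cheaply clonable byte buffer.
type Buffer = Arc<[u8]>;

/// Hypothetical, simplified stand-in for `ArrayData` (the real type has more fields).
#[derive(Debug, Clone)]
pub struct Data {
    len: usize,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<Data>,
}

impl Data {
    pub fn new(len: usize, offset: usize, buffers: Vec<Buffer>, child_data: Vec<Data>) -> Self {
        Self { len, offset, buffers, child_data }
    }

    /// Consume `self` and return every field by value, so the caller can
    /// reuse the existing `Vec`s instead of copying them.
    pub fn into_parts(self) -> (usize, usize, Vec<Buffer>, Vec<Data>) {
        (self.len, self.offset, self.buffers, self.child_data)
    }
}

/// Returns `true` if the `Vec` of buffers handed back by `into_parts`
/// is the same heap allocation that was passed in.
pub fn parts_reuse_allocation() -> bool {
    let buffers: Vec<Buffer> = vec![Arc::from(vec![1u8, 2, 3])];
    let ptr_before = buffers.as_ptr();
    let data = Data::new(3, 0, buffers, Vec::new());
    let (_len, _offset, buffers, _children) = data.into_parts();
    buffers.as_ptr() == ptr_before
}
```

Because the fields are moved rather than cloned, the caller ends up owning the original `Vec`s and no new allocation is needed to take the value apart.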

Are these changes tested?

Yes, by CI

Are there any user-facing changes?

A few fewer allocations when creating arrays

alamb · 2026-01-07T23:21:17Z

arrow-array/src/array/byte_view_array.rs

-    fn from(value: ArrayData) -> Self {
-        let views = value.buffers()[0].clone();
-        let views = ScalarBuffer::new(views, value.offset(), value.len());
-        let buffers = value.buffers()[1..].to_vec();


This call to `to_vec()` allocates a new `Vec`, which is unnecessary.
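
As a rough illustration of the difference, the sketch below compares copying the tail of a borrowed slice with moving elements out of an owned `Vec`. `Buffer` is a hypothetical stand-in for the arrow buffer type, and the helper names are invented for this example. The sketch only demonstrates that the owning path does not clone any element; it makes no claim about what `collect` does with the allocation.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a shared, cheaply clonable byte buffer.
type Buffer = Arc<[u8]>;

/// Borrowing path: copies the tail into a freshly allocated `Vec`,
/// cloning (bumping the reference count of) every element.
pub fn tail_by_copy(buffers: &[Buffer]) -> Vec<Buffer> {
    buffers[1..].to_vec()
}

/// Owning path: moves the elements out of the `Vec`; no element is cloned.
pub fn split_owned(buffers: Vec<Buffer>) -> (Buffer, Vec<Buffer>) {
    let mut iter = buffers.into_iter();
    let first = iter.next().expect("at least one buffer");
    (first, iter.collect())
}

/// Returns the reference count of the second buffer after each approach.
pub fn demo_counts() -> (usize, usize) {
    let buffers: Vec<Buffer> = vec![Arc::from(vec![0u8]), Arc::from(vec![1u8])];

    let copied = tail_by_copy(&buffers);
    // The original and the copy both point at the same bytes.
    let count_after_copy = Arc::strong_count(&copied[0]);
    drop(copied);

    let (_first, rest) = split_owned(buffers);
    // The element was moved, so there is still only one owner.
    let count_after_move = Arc::strong_count(&rest[0]);

    (count_after_copy, count_after_move)
}
```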

alamb · 2026-01-07T23:22:01Z

arrow-array/src/array/byte_view_array.rs


 impl<T: ByteViewType + ?Sized> From<ArrayData> for GenericByteViewArray<T> {
-    fn from(value: ArrayData) -> Self {
-        let views = value.buffers()[0].clone();


Cloning the buffers is relatively cheap (they are `Arc`ed internally), so avoiding this clone mostly makes the code easier to follow. I don't think it will yield any significant performance savings.
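
To illustrate why such a clone is cheap, here is a small sketch using a plain `Arc<[u8]>` as a stand-in for a reference-counted buffer (an assumption for illustration; the real buffer type is more involved). Cloning the `Arc` only increments a counter and both handles point at the same bytes.

```rust
use std::sync::Arc;

/// Clones a reference-counted buffer and reports whether both handles
/// share the same allocation, plus the resulting reference count.
pub fn clone_shares_bytes() -> (bool, usize) {
    let original: Arc<[u8]> = Arc::from(vec![0u8; 1024]);
    // No bytes are copied here; only the reference count changes.
    let copy = Arc::clone(&original);
    (Arc::ptr_eq(&original, &copy), Arc::strong_count(&original))
}
```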

alamb · 2026-01-07T23:22:50Z

arrow-array/src/array/struct_array.rs

                    make_array(cd.slice(parent_offset, parent_len))
                } else {
-                    make_array(cd.clone())
+                    make_array(cd)


`cd.clone()` clones the `ArrayData` (and allocates a new `Vec`) for each child, recursively, which is unnecessary.
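
The recursive cost can be made visible with a small sketch. `Data`, `build_cloning`, and `build_moving` are hypothetical stand-ins for `ArrayData` and the two ways of calling `make_array` on children; a global counter records how many times a node is cloned.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times `Data::clone` runs (for illustration only).
static CLONES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `ArrayData` with nested children.
pub struct Data {
    len: usize,
    children: Vec<Data>,
}

impl Clone for Data {
    fn clone(&self) -> Self {
        CLONES.fetch_add(1, Ordering::Relaxed);
        // Cloning the `Vec` allocates and deep-clones every child.
        Data { len: self.len, children: self.children.clone() }
    }
}

/// Mirrors the old pattern: iterate by reference and clone each child.
pub fn build_cloning(data: Data) -> usize {
    1 + data.children.iter().map(|cd| build_cloning(cd.clone())).sum::<usize>()
}

/// Mirrors the new pattern: consume the children and move each one.
pub fn build_moving(data: Data) -> usize {
    1 + data.children.into_iter().map(build_moving).sum::<usize>()
}

/// Returns the number of clones performed by each approach on a
/// three-level chain (root, child, grandchild).
pub fn count_clones() -> (usize, usize) {
    let chain = || Data {
        len: 1,
        children: vec![Data {
            len: 1,
            children: vec![Data { len: 1, children: vec![] }],
        }],
    };

    CLONES.store(0, Ordering::Relaxed);
    build_cloning(chain());
    let cloning = CLONES.swap(0, Ordering::Relaxed);

    build_moving(chain());
    let moving = CLONES.load(Ordering::Relaxed);

    (cloning, moving)
}
```

In this sketch the grandchild is cloned twice (once as part of its parent's deep clone, once on its own), which is why the cloning path performs three clones for only two non-root nodes.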

alamb · 2026-01-07T23:23:28Z

arrow-array/src/array/struct_array.rs

+            .into_iter()
            .map(|cd| {
                if parent_offset != 0 || parent_len != cd.len() {
                    make_array(cd.slice(parent_offset, parent_len))


We can probably avoid an additional allocation for sliced arrays by adding a version of `slice()` that consumes `self`, perhaps called `sliced()` 🤔

I filed a ticket to track this idea

Add API to avoid allocations when using ArrayData::slice() on owned ArrayData #9140
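
A consuming variant along these lines might look like the sketch below. Everything here is hypothetical: `Data` is a simplified stand-in for `ArrayData`, `sliced` is only the name floated in the comment above, and the slicing logic is reduced to adjusting an offset and length (the real `slice()` does more work for some types).

```rust
/// Hypothetical, simplified stand-in for `ArrayData`.
#[derive(Debug, Clone, PartialEq)]
pub struct Data {
    offset: usize,
    len: usize,
    children: Vec<Data>,
}

impl Data {
    /// Borrowing slice: has to clone `children` to build the new value.
    pub fn slice(&self, offset: usize, len: usize) -> Data {
        assert!(offset + len <= self.len);
        Data {
            offset: self.offset + offset,
            len,
            children: self.children.clone(),
        }
    }

    /// Consuming slice: adjusts the fields in place and reuses `children`.
    pub fn sliced(mut self, offset: usize, len: usize) -> Data {
        assert!(offset + len <= self.len);
        self.offset += offset;
        self.len = len;
        self
    }
}

/// Returns (results are equal, consuming version kept the same `Vec` allocation).
pub fn sliced_matches_slice() -> (bool, bool) {
    let data = Data {
        offset: 0,
        len: 10,
        children: vec![Data { offset: 0, len: 10, children: vec![] }],
    };
    let borrowed = data.slice(2, 5);
    let ptr_before = data.children.as_ptr();
    let owned = data.sliced(2, 5);
    (borrowed == owned, owned.children.as_ptr() == ptr_before)
}
```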

alamb · 2026-01-09T15:59:59Z

Thanks @mhilton

scovich

LGTM, one nit

alamb · 2026-01-11T13:41:11Z

BTW I plan to make the corresponding change for other array types (see #9058); I am splitting the work into multiple PRs to reduce the review load.

scovich · 2026-01-12T16:40:03Z

@alamb -- post-merge reply to a resolved comment here (in case GitHub didn't make it easy to find):
#9114 (comment)

@scovich

…om `ArrayData` (#9156)

Which issue does this PR close?

- part of #9061
- follow on #9114

Rationale for this change

@scovich noted in #9114 (comment) that calling `Vec::remove` does an extra copy and that `Vec::from` doesn't actually reuse the allocation the way I thought it did

What changes are included in this PR?

Build the Arc for buffers directly

Are these changes tested?

By existing tests

Are there any user-facing changes?
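
The point about `Vec::remove` and building the `Arc` directly can be sketched as follows. This is an illustration under assumptions, not the code from #9156: `Buffer` is a hypothetical stand-in for the arrow buffer type and both helper functions are invented for the example. `Vec::remove(0)` shifts every remaining element one slot to the left, and converting a `Vec<T>` into an `Arc<[T]>` moves the items into a newly allocated reference-counted slice, so the first version touches the elements twice. Collecting the remaining iterator items straight into an `Arc<[T]>` skips the shift.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a shared, cheaply clonable byte buffer.
type Buffer = Arc<[u8]>;

/// Two-step version: `remove(0)` shifts the tail left, then `Arc::from`
/// moves the items into a new reference-counted allocation.
pub fn split_with_remove(mut buffers: Vec<Buffer>) -> (Buffer, Arc<[Buffer]>) {
    let views = buffers.remove(0);
    (views, Arc::from(buffers))
}

/// Direct version: take the first item from the iterator and collect the
/// rest straight into the `Arc<[Buffer]>`.
pub fn split_direct(buffers: Vec<Buffer>) -> (Buffer, Arc<[Buffer]>) {
    let mut iter = buffers.into_iter();
    let views = iter.next().expect("expected at least one buffer");
    let data: Arc<[Buffer]> = iter.collect();
    (views, data)
}

/// Returns `true` if both versions produce the same result.
pub fn splits_agree() -> bool {
    let make = || -> Vec<Buffer> {
        vec![Arc::from(vec![0u8]), Arc::from(vec![1u8]), Arc::from(vec![2u8])]
    };
    let (v1, r1) = split_with_remove(make());
    let (v2, r2) = split_direct(make());
    v1 == v2 && r1 == r2 && r1.len() == 2
}
```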

Which issue does this PR close?

- Part of #9061
- broken out of #9058

Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the better

What changes are included in this PR?

Apply pattern from #9114

Are these changes tested?

Existing tests

Are there any user-facing changes?

No

…9160)

Which issue does this PR close?

- Part of #9061
- broken out of #9058

Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the better

What changes are included in this PR?

Apply pattern from #9114

Are these changes tested?

Existing tests

Are there any user-facing changes?

No

Avoid clones in make_array for StructArray and `GenericByteViewAr…

372b59c

…ray`

alamb added the performance label Jan 7, 2026

github-actions bot added the arrow Changes to the arrow crate label Jan 7, 2026

alamb commented Jan 7, 2026

alamb mentioned this pull request Jan 7, 2026

Avoid clones while creating Arrays from ArrayData (speed up reading from Parquet reader) #9058

Draft

1 task

alamb marked this pull request as ready for review January 7, 2026 23:27

mhilton approved these changes Jan 9, 2026

Merge remote-tracking branch 'apache/main' into alamb/struct_array_cl…

4895eda

…ones

Jefffrey approved these changes Jan 10, 2026

scovich approved these changes Jan 10, 2026

Apply suggestions from code review

c5b727f

Co-authored-by: Jeffrey Vo <[email protected]>

Merge remote-tracking branch 'apache/main' into alamb/struct_array_cl…

59d9035

…ones

alamb merged commit 237065b into apache:main Jan 12, 2026
26 checks passed

This was referenced Jan 13, 2026

Minor: try and avoid an allocation creating GenericByteViewArray from ArrayData #9156

Merged

Avoid a clone when creating BooleanArray from ArrayData #9159

Merged

Avoid a clone when creating StringArray/BinaryArray from ArrayData #9160

Merged

alamb mentioned this pull request Jan 15, 2026

Avoid a clone when creating DictionaryArray from ArrayData #9185

Merged
