Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Jan 7, 2026

Which issue does this PR close?

Rationale for this change

The current implementation of make_array for StructArray and GenericByteViewArray clones ArrayData which allocates a new Vec. This is unnecessary given that make_array is passed an owned ArrayData

What changes are included in this PR?

  1. Add a new API to ArrayData to break it down into parts (into_parts)
  2. Use that API to avoid cloning while constructing StructArray and GenericByteViewArray

Are these changes tested?

Yes by CI

Are there any user-facing changes?

A few fewer allocations when creating arrays

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 7, 2026
fn from(value: ArrayData) -> Self {
let views = value.buffers()[0].clone();
let views = ScalarBuffer::new(views, value.offset(), value.len());
let buffers = value.buffers()[1..].to_vec();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this call to_vec() allocates a new Vec which is unecessary


impl<T: ByteViewType + ?Sized> From<ArrayData> for GenericByteViewArray<T> {
fn from(value: ArrayData) -> Self {
let views = value.buffers()[0].clone();
Copy link
Contributor Author

@alamb alamb Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloneing the buffers is relatively cheap (they are Arcd internally) so avoiding this just makes the code easier to follow, I don't think it will be any significant performance savings

make_array(cd.slice(parent_offset, parent_len))
} else {
make_array(cd.clone())
make_array(cd)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cd.clone() clones the ArrayData (and allocates a new Vec) for each child, recursively, which is unecessary

.into_iter()
.map(|cd| {
if parent_offset != 0 || parent_len != cd.len() {
make_array(cd.slice(parent_offset, parent_len))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably avoid an additional allocation for sliced arrays by making a version of slice() that consumes self -- like sliced() perhaps 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb
Copy link
Contributor Author

alamb commented Jan 9, 2026

Thanks @mhilton

Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one nit

@alamb
Copy link
Contributor Author

alamb commented Jan 11, 2026

BTW I plan to make the same corresponding change for other array types (see #9058) but I was making multiple PRs to reduce the review load

@alamb alamb merged commit 237065b into apache:main Jan 12, 2026
26 checks passed
@scovich
Copy link
Contributor

scovich commented Jan 12, 2026

@alamb -- post-merge reply to a resolved comment here (in case github didn't make it easy to find):
#9114 (comment)

alamb added a commit that referenced this pull request Jan 14, 2026
…om `ArrayData` (#9156)

# Which issue does this PR close?

- part of #9061
- follow on #9114


# Rationale for this change

@scovich noted in
#9114 (comment) that
calling `Vec::remove` does an extra copy and that `Vec::from` doesn't
actually reuse the allocation the way I thought it did


# What changes are included in this PR?

Build the Arc for buffers directly

# Are these changes tested?

BY existing tests

# Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->
Dandandan pushed a commit that referenced this pull request Jan 14, 2026
# Which issue does this PR close?

- Part of #9061
- broken out of #9058

# Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the
better

# What changes are included in this PR?

Apply pattern from #9114 

# Are these changes tested?

Existing tests 

# Are there any user-facing changes?

No
Dandandan pushed a commit that referenced this pull request Jan 14, 2026
…9160)

# Which issue does this PR close?

- Part of #9061
- broken out of #9058

# Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the
better

# What changes are included in this PR?

Apply pattern from #9114 

# Are these changes tested?

Existing tests 

# Are there any user-facing changes?

No
Dandandan pushed a commit to Dandandan/arrow-rs that referenced this pull request Jan 15, 2026
…ray` (apache#9114)

# Which issue does this PR close?


- Part of apache#9061
- broken out of apache#9058

# Rationale for this change

The current implementation of `make_array` for StructArray and
GenericByteViewArray clones `ArrayData` which allocates a new Vec. This
is unnecessary given that `make_array` is passed an owned ArrayData


# What changes are included in this PR?

1. Add a new API to ArrayData to break it down into parts (`into_parts`)
2. Use that API to avoid cloning while constructing StructArray and
GenericByteViewArray

# Are these changes tested?

Yes by CI

# Are there any user-facing changes?
A few fewer allocations when creating arrays

---------

Co-authored-by: Jeffrey Vo <[email protected]>
Dandandan pushed a commit to Dandandan/arrow-rs that referenced this pull request Jan 15, 2026
…om `ArrayData` (apache#9156)

# Which issue does this PR close?

- part of apache#9061
- follow on apache#9114


# Rationale for this change

@scovich noted in
apache#9114 (comment) that
calling `Vec::remove` does an extra copy and that `Vec::from` doesn't
actually reuse the allocation the way I thought it did


# What changes are included in this PR?

Build the Arc for buffers directly

# Are these changes tested?

BY existing tests

# Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->
Dandandan pushed a commit to Dandandan/arrow-rs that referenced this pull request Jan 15, 2026
# Which issue does this PR close?

- Part of apache#9061
- broken out of apache#9058

# Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the
better

# What changes are included in this PR?

Apply pattern from apache#9114 

# Are these changes tested?

Existing tests 

# Are there any user-facing changes?

No
Dandandan pushed a commit to Dandandan/arrow-rs that referenced this pull request Jan 15, 2026
…pache#9160)

# Which issue does this PR close?

- Part of apache#9061
- broken out of apache#9058

# Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the
better

# What changes are included in this PR?

Apply pattern from apache#9114 

# Are these changes tested?

Existing tests 

# Are there any user-facing changes?

No
Dandandan pushed a commit that referenced this pull request Jan 15, 2026
# Which issue does this PR close?

- Part of #9061
- broken out of #9058

# Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the
better

# What changes are included in this PR?

Apply pattern from #9114 

# Are these changes tested?

Existing tests 

# Are there any user-facing changes?

No
Dandandan pushed a commit that referenced this pull request Jan 15, 2026
# Which issue does this PR close?

- Part of #9061
- broken out of #9058

# Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the
better

# What changes are included in this PR?

Apply pattern from #9114 

# Are these changes tested?

Existing tests 

# Are there any user-facing changes?

No
alamb added a commit that referenced this pull request Jan 16, 2026
# Which issue does this PR close?

- Part of #9061
- broken out of #9058

# Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the
better

# What changes are included in this PR?

Apply pattern from #9114 

# Are these changes tested?

Existing tests 

# Are there any user-facing changes?

No
Dandandan pushed a commit that referenced this pull request Jan 18, 2026
# Which issue does this PR close?

- Part of #9061
- broken out of #9058

# Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the
better

# What changes are included in this PR?

Apply pattern from #9114 

# Are these changes tested?

Existing tests 

# Are there any user-facing changes?

No
Dandandan pushed a commit that referenced this pull request Jan 18, 2026
# Which issue does this PR close?

- Part of #9061
- broken out of #9058

# Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the
better

# What changes are included in this PR?

Apply pattern from #9114 

# Are these changes tested?

Existing tests 

# Are there any user-facing changes?

No
alamb added a commit that referenced this pull request Jan 18, 2026
# Which issue does this PR close?

- Part of #9061
- broken out of #9058

# Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the
better

# What changes are included in this PR?

Apply pattern from #9114 

# Are these changes tested?

Existing tests 

# Are there any user-facing changes?

No
alamb added a commit that referenced this pull request Jan 18, 2026
# Which issue does this PR close?

- Part of #9061
- broken out of #9058

# Rationale for this change

Let's make arrow-rs the fastest we can and the fewer allocations the
better

# What changes are included in this PR?

Apply pattern from #9114 

# Are these changes tested?

Existing tests 

# Are there any user-facing changes?

No
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants