GH-43660: [C++][Compute] Avoid ZeroCopyCastExec when casting Binary offset -> Binary offset types #48171

Merged
zanmato1984 merged 15 commits into apache:main from bodo-ai:scott/cast_string on Dec 17, 2025
Conversation

scott-routledge2 (Contributor) commented on Nov 19, 2025

Rationale for this change

Casting Binary offset -> Binary offset types relies on ZeroCopyCastExec, which propagates the offset of the input to the output. This can lead to larger allocations than necessary when casting arrays with offsets.

See #43660 and
#43661 for more context.

What changes are included in this PR?

Ensure the output array has a small offset (it can still be non-zero, since reusing the null bitmap requires in_offset % 8 == out_offset % 8)
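To illustrate the alignment constraint above, here is a minimal Python sketch of the bit arithmetic (not the actual Arrow C++ code): the validity bitmap packs 8 elements per byte, so reusing it can only drop whole bytes from the front, and the output offset must keep the input offset's remainder modulo 8.

```python
def rebase_validity(in_offset: int):
    """Given a logical input offset, compute the output's small logical
    offset and the number of whole bytes to slice off the front of the
    reused validity bitmap. Slicing only whole bytes keeps every validity
    bit addressable, so in_offset % 8 == out_offset % 8 always holds."""
    out_offset = in_offset % 8   # bit position inside the first kept byte
    byte_slice = in_offset // 8  # whole bitmap bytes dropped from the front
    return out_offset, byte_slice

# An input offset of 19 keeps bit 3 of byte 2 as the first valid bit.
print(rebase_validity(19))  # -> (3, 2)
```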

Are these changes tested?

Ran unit tests and benchmarked locally.

Are there any user-facing changes?

No

@felipecrv felipecrv self-requested a review November 19, 2025 18:17
felipecrv (Contributor) left a comment:

Great improvement! But you should add tests that cast [Large]String arrays with non-zero offsets, including offsets greater than 8. I don't think your current implementation would work without also slicing the offsets buffer.

Comment on lines +315 to +321
if (output->buffers[0]) {
// If reusing the null bitmap, ensure offset into the first byte is the same as input.
output->offset = input_arr->offset % 8;
output->buffers[0] = SliceBuffer(output->buffers[0], input_arr->offset / 8);
} else {
output->offset = 0;
}
felipecrv (Contributor) commented on Nov 19, 2025:
I believe you should also slice the offsets buffer when you change the array offset.

Suggested change
if (output->buffers[0]) {
// If reusing the null bitmap, ensure offset into the first byte is the same as input.
output->offset = input_arr->offset % 8;
output->buffers[0] = SliceBuffer(output->buffers[0], input_arr->offset / 8);
} else {
output->offset = 0;
}
// slice buffers to reduce allocation when casting the offsets buffer
auto length = input_arr->length;
int64_t buffer_offset = 0;
if (output->null_count != 0 && output->buffers[0]) {
// avoiding reallocation of the validity buffer by allowing some padding bits
output->offset = input_arr->offset % 8;
buffer_offset = input_arr->offset / 8;
} else {
output->offset = 0;
buffer_offset = input_arr->offset;
}
output->buffers[0] = SliceBuffer(output->buffers[0], buffer_offset, length);
output->buffers[1] = SliceBuffer(output->buffers[1], buffer_offset, length);

felipecrv (Contributor):
Please review my suggestions and tell if something is weird. This stuff is hard.

scott-routledge2 (Contributor Author):

Thanks for the suggestions! I didn't slice the offsets buffers originally because they are reallocated by CastBinaryToBinaryOffsets with the correct offset. However, that is only true for the case where the offset type changes; in the case where the offset type stays the same, we would either have to slice here or call ZeroCopyCastExec as in the comment above?

felipecrv (Contributor):

But if you don't slice, they get reallocated with the same length, no? Nevertheless, it creates a very weird relationship between this code and the function, making it very non-obvious how it is correct.

scott-routledge2 (Contributor Author) commented on Dec 3, 2025:

Sorry for the late reply. What you are saying makes sense; however, I am still a little confused about the specifics of the slicing here. Wouldn't we want to slice the offsets buffer by a different value than the validity buffer?

For example, if we are casting a slice that has length 8 and offset 8, we would slice the validity buffer by 1 and be left with a buffer of length 1 representing the 8 null bits for the elements in our casted slice. We would also slice the offsets buffer by 1, which would leave a buffer of length 17*offset_size - 1, out of alignment.

Similarly, in the case where we have output->null_count == 0 (and buffers[0] != nullptr), we would slice the offsets buffer by 8, leaving a buffer of size 17*offset_size - 8, and we would also slice the validity buffer by 8, which would go out of bounds.

Wouldn't we want to slice the offsets buffer by input_arr->offset * sizeof(typename I::offset_type) and the validity buffer by input_arr->offset / 8?

Edit: I think the "offset" in SliceBuffer is the byte offset, as opposed to the logical offset.
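Since SliceBuffer takes byte offsets, the two buffers are indeed sliced by different byte amounts for the same logical offset. A small Python sketch of the mapping (an illustrative helper, not Arrow code; assuming a 4-byte input offset type by default):

```python
def buffer_byte_offsets(logical_offset: int, offset_type_size: int = 4):
    """Map one logical array offset to the byte offset used to slice each
    buffer: the validity bitmap packs 8 elements per byte and can only be
    sliced at byte granularity, while the offsets buffer stores one
    fixed-width integer per element."""
    validity_bytes = logical_offset // 8               # whole bitmap bytes
    offsets_bytes = logical_offset * offset_type_size  # one entry per element
    return validity_bytes, offsets_bytes

# Logical offset 8: drop 1 bitmap byte but 32 bytes of int32 offsets.
print(buffer_byte_offsets(8))  # -> (1, 32)
```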

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Nov 19, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Dec 3, 2025
if (output->buffers[0]) {
output->buffers[0] = SliceBuffer(output->buffers[0], offset / 8);
}
output->buffers[1] = SliceBuffer(output->buffers[1], offset * input_offset_type_size);
scott-routledge2 (Contributor Author) commented on Dec 3, 2025:

I believe that for this slicing logic to be fully correct, it needs to take output->offset into account when slicing, as well as the minimum size requirements. It would look something like

int64_t offset_buffer_offset = (input_offset - output_offset) * offset_type_size;
int64_t offset_buffer_length = (length + output_offset + 1) * offset_type_size;
if (offset_buffer_length < minimum_size) {
  // update output->offset so that the buffer meets the minimum size
}

?

zanmato1984 (Contributor):
The current code seems correct to me. I'm not sure what's the concern here.

scott-routledge2 (Contributor Author):

When I was testing the buffer-slicing logic here (for example, casting a slice with length 3 and offset 1), I was seeing an error that looked like:

'datum.make_array()->ValidateFull()' failed with Invalid: Offsets buffer size (bytes): 16 isn't large enough for length: 3 and offset: 1

Technically the offsets buffer needs to be big enough to hold offset + length + 1 elements, whereas as written it will only hold length + 1 elements. However, because of the code structure here, this path isn't actually reachable, since we always reallocate the offsets buffer with the correct size in CastBinaryToBinaryOffsets anyway.
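The size requirement behind that validation error can be sketched in plain Python (a hypothetical check mirroring the rule, not Arrow's actual validator): an array reads offsets[offset] through offsets[offset + length], so the buffer must hold offset + length + 1 entries.

```python
def offsets_buffer_big_enough(buffer_bytes: int, length: int, offset: int,
                              offset_type_size: int = 4) -> bool:
    """Check whether an offsets buffer of buffer_bytes bytes can serve an
    array with the given logical offset and length. The array dereferences
    entries offset..offset+length inclusive, i.e. offset + length + 1
    fixed-width integers."""
    required = (offset + length + 1) * offset_type_size
    return buffer_bytes >= required

# The failing case from the error message: 16 bytes of int32 offsets hold
# 4 entries, but length 3 at offset 1 needs 5 entries (20 bytes).
print(offsets_buffer_big_enough(16, length=3, offset=1))  # -> False
print(offsets_buffer_big_enough(20, length=3, offset=1))  # -> True
```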

zanmato1984 (Contributor) left a comment:
This is a nice improvement. Some nits.

@scott-routledge2 scott-routledge2 marked this pull request as ready for review December 10, 2025 00:14
zanmato1984 (Contributor) left a comment:
LGTM in general.

Some nits.

zanmato1984 (Contributor) left a comment:
+1. Thanks for this nice improvement!

zanmato1984 (Contributor):
Let's wait a while longer to see if @felipecrv has other comments; then I'll merge. Thanks @scott-routledge2!

zanmato1984 (Contributor):
@github-actions crossbow submit -g cpp -g python

github-actions (bot):
Revision: 4929843

Submitted crossbow builds: ursacomputing/crossbow @ actions-6816eda394

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-1.3.4-numpy-1.21.2 GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.13-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.13-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.14 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-cuda-cpp-ubuntu-24.04-cuda-13.0.2 GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-cuda-python-ubuntu-24.04-cuda-13.0.2 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-fedora-42-python-3 GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

zanmato1984 (Contributor):
The failures seem unrelated. I'm merging now.

conbench-apache-arrow (bot):
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 86166d5.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them.
