GH-43660: [C++][Compute] Avoid ZeroCopyCastExec when casting Binary offset -> Binary offset types (#48171)
Conversation
```cpp
if (output->buffers[0]) {
  // If reusing the null bitmap, ensure offset into the first byte is the same as input.
  output->offset = input_arr->offset % 8;
  output->buffers[0] = SliceBuffer(output->buffers[0], input_arr->offset / 8);
} else {
  output->offset = 0;
}
```
I believe you should also slice the offsets buffer when you change the array offset.
Suggested change — replace:

```cpp
if (output->buffers[0]) {
  // If reusing the null bitmap, ensure offset into the first byte is the same as input.
  output->offset = input_arr->offset % 8;
  output->buffers[0] = SliceBuffer(output->buffers[0], input_arr->offset / 8);
} else {
  output->offset = 0;
}
```

with:

```cpp
// slice buffers to reduce allocation when casting the offsets buffer
auto length = input_arr->length;
int64_t buffer_offset = 0;
if (output->null_count != 0 && output->buffers[0]) {
  // avoiding reallocation of the validity buffer by allowing some padding bits
  output->offset = input_arr->offset % 8;
  buffer_offset = input_arr->offset / 8;
} else {
  output->offset = 0;
  buffer_offset = input_arr->offset;
}
output->buffers[0] = SliceBuffer(output->buffers[0], buffer_offset, length);
output->buffers[1] = SliceBuffer(output->buffers[1], buffer_offset, length);
```
Please review my suggestions and tell if something is weird. This stuff is hard.
Thanks for the suggestions! I didn't slice the offsets buffers originally because they are reallocated by CastBinaryToBinaryOffsets with the correct offset. However, that only holds when the offset type changes; when the offset type stays the same, we would either have to slice here or call ZeroCopyCastExec as in the comment above?
But if you don't slice them, they get reallocated with the same length, no? Regardless, it creates a very weird relationship between this code and that function, making it very non-obvious why it is correct.
Sorry for the late reply. What you are saying makes sense; however, I think I am still a little confused about the specifics of the slicing here. Wouldn't we want to slice the offsets buffer with a different value than the validity buffer?

For example, if we are casting a slice that has length 8 and offset 8, we would slice the validity buffer with a value of 1 and be left with a buffer of length 1 representing the 8 null bits for the elements in our casted slice. We would also slice the offsets buffer by 1, which would mean the buffer will have length 17 * offset_size - 1 and be out of alignment.

Similarly, in the case where output->null_count == 0 (and buffers[0] != nullptr), we would slice the offsets buffer by 8, leaving a buffer of size 17 * offset_size - 8, and we would also slice the validity buffer by 8, which would go out of bounds.

Wouldn't we want to slice the offsets buffer by input_arr->offset * sizeof(typename I::offset_type) and the validity buffer by input_arr->offset / 8?
Edit: I think the "offset" in SliceBuffer is a byte offset, as opposed to the logical (element) offset.
```cpp
if (output->buffers[0]) {
  output->buffers[0] = SliceBuffer(output->buffers[0], offset / 8);
}
output->buffers[1] = SliceBuffer(output->buffers[1], offset * input_offset_type_size);
```
I believe that for this slicing logic to be fully correct, it needs to take output->offset into account when slicing, as well as the minimum size requirements. So it would look something like:

```cpp
int64_t offset_buffer_offset = (input_offset - output_offset) * offset_type_size;
int64_t offset_buffer_length = (length + output_offset + 1) * offset_type_size;
if (offset_buffer_length < minimum_size) {
  // update output->offset so that it's now == minimum size
}
```
The current code seems correct to me. I'm not sure what's the concern here.
When I was testing the buffer slicing logic here (for example, casting a slice with length 3 and offset 1), I was seeing an error that looked like:

'datum.make_array()->ValidateFull()' failed with Invalid: Offsets buffer size (bytes): 16 isn't large enough for length: 3 and offset: 1

Technically, the offsets buffer needs to be big enough to hold offset + length + 1 elements, whereas as written it will only hold length + 1 elements. However, because of the code structure here, this path isn't actually reachable, since we always reallocate the offsets buffer with the correct size in CastBinaryToBinaryOffsets anyway.
zanmato1984 left a comment:
This is a nice improvement. Some nits.
zanmato1984 left a comment:
LGTM in general. Some nits.
zanmato1984 left a comment:
+1. Thanks for this nice improvement!
Let's wait a while longer to see if @felipecrv has other comments. Then I'll merge. Thanks @scott-routledge2!

@github-actions crossbow submit -g cpp -g python

Revision: 4929843
Submitted crossbow builds: ursacomputing/crossbow @ actions-6816eda394

The failures seem unrelated. I'm merging now.

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 86166d5. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
Casting Binary offset -> Binary offset types relies on ZeroCopyCastExec, which propagates the offset of the input to the output. This can lead to larger allocations than necessary when casting arrays with offsets.
See #43660 and
#43661 for more context.
What changes are included in this PR?
Ensure the output array has a small offset (it can still be non-zero, since reusing the null bitmap requires in_offset % 8 == out_offset % 8).
Are these changes tested?
Ran unit tests and benchmarked locally.
Are there any user-facing changes?
No