Commit 5704577

apacheGH-48112: [C++][Parquet] Use more accurate data length estimate when decoding PLAIN BYTE_ARRAY data
1 parent a0aa749 commit 5704577

1 file changed: +7 -2 lines changed


cpp/src/parquet/decoder.cc

Lines changed: 7 additions & 2 deletions
@@ -790,8 +790,13 @@ class PlainByteArrayDecoder : public PlainDecoder<ByteArrayType> {
       return Status::OK();
     };
 
-    return DispatchArrowBinaryHelper<ByteArrayType>(out, num_values, len_,
-                                                    visit_binary_helper);
+    // We're going to decode up to `num_values - null_count` PLAIN values,
+    // and each value has a 4-byte length header that doesn't count for the
+    // Arrow binary data length.
+    int64_t estimated_data_length =
+        std::max<int64_t>(0, len_ - 4 * (num_values - null_count));
+    return DispatchArrowBinaryHelper<ByteArrayType>(
+        out, num_values, estimated_data_length, visit_binary_helper);
   }
 
   template <typename BuilderType>
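
The arithmetic in this change can be checked in isolation. The sketch below is not Arrow code: `EncodePlainByteArrays` and `EstimateDataLength` are hypothetical helpers written for illustration. It assumes only the Parquet PLAIN layout for BYTE_ARRAY, where each non-null value is a 4-byte little-endian length followed by the raw bytes, and null slots take no space in the value buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper (not part of Arrow): PLAIN-encode BYTE_ARRAY values as
// a 4-byte little-endian length header followed by the raw bytes.
std::vector<uint8_t> EncodePlainByteArrays(const std::vector<std::string>& values) {
  std::vector<uint8_t> buffer;
  for (const std::string& value : values) {
    assert(value.size() <= std::numeric_limits<uint32_t>::max());
    const uint32_t length = static_cast<uint32_t>(value.size());
    for (int shift = 0; shift < 32; shift += 8) {
      buffer.push_back(static_cast<uint8_t>((length >> shift) & 0xFF));
    }
    buffer.insert(buffer.end(), value.begin(), value.end());
  }
  return buffer;
}

// Hypothetical helper mirroring the expression in the diff: subtract one
// 4-byte header per non-null value requested, and clamp the result at zero.
int64_t EstimateDataLength(int64_t encoded_length, int64_t num_values,
                           int64_t null_count) {
  return std::max<int64_t>(0, encoded_length - 4 * (num_values - null_count));
}
```

For example, encoding `"parquet"`, `""` and `"arrow"` yields 24 bytes: 12 bytes of headers plus 12 bytes of payload. With `num_values = 4` and `null_count = 1`, `EstimateDataLength(24, 4, 1)` gives 12, the exact payload size, whereas passing `len_` directly would have reserved 24. The clamp matters because the decoder reads "up to" the requested count: if the buffer holds fewer values than requested, the subtraction would otherwise go negative.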
