Skip to content

Commit 4ba70c5

Browse files
authored
GH-49229: [C++] Fix abort when reading IPC file with a union validity bitmap and pre-buffering enabled (#49230)
### Rationale for this change The logic for loading a Union array from a IPC file was inquiring whether a validity bitmap is present in a V4 metadata file (i.e. `buffers[0] != nullptr`). However, in the pre-buffering case, the buffers haven't been populated yet at the point, so the check would be ignored and the IPC file reader could happily create a Union array with a top validity bitmap. This could crash later in `UnionArray::SetData`. Found by OSS-Fuzz in https://issues.oss-fuzz.com/issues/482161154 ### Are these changes tested? By integration test and fuzz regression file. There are no unit tests in the C++ test suite that exercise V4 metadata IPC files with top-level union validity bitmaps. ### Are there any user-facing changes? No. **This PR contains a "Critical Fix".** This fixes a controlled crash when reading a pre-V5 IPC file with a top-level union validity bitmap and pre-buffering enabled. Instead a regular error will be returned. There are no known security implications. * GitHub Issue: #49229 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
1 parent 1d76e1e commit 4ba70c5

File tree

2 files changed

+17
-13
lines changed

2 files changed

+17
-13
lines changed

cpp/src/arrow/ipc/reader.cc

Lines changed: 16 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -296,7 +296,7 @@ class ArrayLoader {
296296
return Status::OK();
297297
}
298298

299-
Status LoadCommon(Type::type type_id) {
299+
Status LoadCommon(Type::type type_id, bool allow_validity_bitmap = true) {
300300
DCHECK_NE(out_, nullptr);
301301
// This only contains the length and null count, which we need to figure
302302
// out what to do with the buffers. For example, if null_count == 0, then
@@ -314,10 +314,16 @@ class ArrayLoader {
314314
}
315315

316316
if (internal::HasValidityBitmap(type_id, metadata_version_)) {
317-
// Extract null_bitmap which is common to all arrays except for unions
317+
// Extract null bitmap which is common to all arrays except for unions
318318
// and nulls.
319319
if (out_->null_count != 0) {
320-
RETURN_NOT_OK(GetBuffer(buffer_index_, &out_->buffers[0]));
320+
if (allow_validity_bitmap) {
321+
RETURN_NOT_OK(GetBuffer(buffer_index_, &out_->buffers[0]));
322+
} else {
323+
// Caller did not allow this
324+
return Status::Invalid("Cannot read ", ::arrow::internal::ToTypeName(type_id),
325+
" array with top-level validity bitmap");
326+
}
321327
}
322328
buffer_index_++;
323329
}
@@ -471,9 +477,10 @@ class ArrayLoader {
471477
int n_buffers = type.mode() == UnionMode::SPARSE ? 2 : 3;
472478
out_->buffers.resize(n_buffers);
473479

474-
RETURN_NOT_OK(LoadCommon(type.id()));
475-
476-
// With metadata V4, we can get a validity bitmap.
480+
// With metadata V4, we can get a validity bitmap. The bitmap may be there
481+
// if we're loading eagerly, or it might be scheduled for loading if we're
482+
// using a BatchDataReadRequest.
483+
//
477484
// Trying to fix up union data to do without the top-level validity bitmap
478485
// is hairy:
479486
// - type ids must be rewritten to all have valid values (even for former
@@ -482,12 +489,9 @@ class ArrayLoader {
482489
// by ANDing the top-level validity bitmap
483490
// - dense union children must be rewritten (at least one of them)
484491
// to insert the required null slots that were formerly omitted
485-
// So instead we bail out.
486-
if (out_->null_count != 0 && out_->buffers[0] != nullptr) {
487-
return Status::Invalid(
488-
"Cannot read pre-1.0.0 Union array with top-level validity bitmap");
489-
}
490-
out_->buffers[0] = nullptr;
492+
//
493+
// So instead we disallow validity bitmaps.
494+
RETURN_NOT_OK(LoadCommon(type.id(), /*allow_validity_bitmap=*/false));
491495
out_->null_count = 0;
492496

493497
if (out_->length > 0) {

0 commit comments

Comments
 (0)