Commit 34045db

GH-48560: [C++][Parquet] When fuzzing, treat Table validation error as hard error (#48863)
### Rationale for this change

Currently, when fuzzing a Parquet file, proper errors are ignored since the file is most of the time simply invalid: it's ok to get an error.

However, if reading a Parquet row group was successful, the Parquet reader should have ensured that the resulting Table is structurally sound: calling `Table::Validate` should succeed, and a failure there should be reported as a bug.

### What changes are included in this PR?

Call `Table::Validate` on a successful read from the Parquet fuzz target, and abort if validation fails.

`Table::ValidateFull`, however, is allowed to fail: for example, a Parquet UTF8 column could contain invalid UTF8 bytes, and we don't detect that when decoding.

### Are these changes tested?

By existing tests and fuzz regression files. Actually running this on OSS-Fuzz will tell us if this is really a good idea.

### Are there any user-facing changes?

No.

* GitHub Issue: #48560

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
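The policy described in the commit message (structural validation failure is a hard error, full validation failure is a soft error) can be illustrated with a small standalone sketch. Everything here is hypothetical: `ToyStatus`, `ToyTable`, `CheckOkOrAbort` and `CheckDecodedTable` are stand-ins invented for illustration and are not Arrow or Parquet APIs. The real change uses `ARROW_CHECK_OK`, `Table::Validate` and `Table::ValidateFull`.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::Status (not the real class).
struct ToyStatus {
  bool ok_flag;
  std::string message;

  ToyStatus() : ok_flag(true) {}
  ToyStatus(bool ok, std::string msg) : ok_flag(ok), message(std::move(msg)) {}

  bool ok() const { return ok_flag; }
  static ToyStatus OK() { return ToyStatus(); }
  static ToyStatus Invalid(std::string msg) {
    return ToyStatus(false, std::move(msg));
  }
};

// Hypothetical stand-in for a table produced by a successful read.
struct ToyTable {
  bool structurally_valid;  // e.g. consistent lengths and offsets
  bool contents_valid;      // e.g. string columns hold valid UTF-8

  // Cheap structural check.
  ToyStatus Validate() const {
    return structurally_valid ? ToyStatus::OK()
                              : ToyStatus::Invalid("structurally invalid table");
  }

  // Structural check plus a check of the actual data.
  ToyStatus ValidateFull() const {
    ToyStatus st = Validate();
    if (!st.ok()) return st;
    return contents_valid ? ToyStatus::OK()
                          : ToyStatus::Invalid("invalid data contents");
  }
};

// Hard error: abort the process so that a fuzzer records a crash.
void CheckOkOrAbort(const ToyStatus& st) {
  if (!st.ok()) {
    std::cerr << "fatal: " << st.message << std::endl;
    std::abort();
  }
}

// Policy applied after a successful read: a structural failure aborts,
// a content failure is only returned as an ordinary error.
ToyStatus CheckDecodedTable(const ToyTable& table) {
  CheckOkOrAbort(table.Validate());
  return table.ValidateFull();
}
```

In this sketch, a table that is structurally sound but holds invalid contents yields a non-OK status without crashing, while a structurally broken table terminates the process, which is the signal a fuzzer reports.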
1 parent 2ac912e commit 34045db

cpp/src/parquet/arrow/fuzz_internal.cc

Lines changed: 8 additions & 3 deletions
@@ -26,6 +26,7 @@
 #include "arrow/table.h"
 #include "arrow/util/base64.h"
 #include "arrow/util/fuzz_internal.h"
+#include "arrow/util/logging.h"
 #include "arrow/util/string.h"
 #include "parquet/arrow/reader.h"
 #include "parquet/bloom_filter.h"
@@ -95,16 +96,20 @@ std::shared_ptr<DecryptionKeyRetriever> MakeKeyRetriever() {
 namespace {
 
 Status FuzzReadData(std::unique_ptr<FileReader> reader) {
-  auto st = Status::OK();
+  auto final_status = Status::OK();
   for (int i = 0; i < reader->num_row_groups(); ++i) {
     std::shared_ptr<Table> table;
     auto row_group_status = reader->ReadRowGroup(i, &table);
     if (row_group_status.ok()) {
+      // When reading returns successfully, the Arrow data should be structurally
+      // valid so that it can be read normally. If that is not the case, abort
+      // so that the error can be published by OSS-Fuzz.
+      ARROW_CHECK_OK(table->Validate());
       row_group_status &= table->ValidateFull();
     }
-    st &= row_group_status;
+    final_status &= row_group_status;
   }
-  return st;
+  return final_status;
 }
 
 template <typename DType>
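
The loop in `FuzzReadData` folds each row group's status into `final_status` with `&=`. The following standalone sketch shows that accumulation pattern under the assumption that `&=` keeps the first error and ignores later ones. `MiniStatus` and `Accumulate` are hypothetical names invented here, and the exact semantics of `arrow::Status::operator&=` should be confirmed against `arrow/status.h`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::Status (not the real class).
struct MiniStatus {
  bool ok_flag;
  std::string message;

  MiniStatus() : ok_flag(true) {}
  MiniStatus(bool ok, std::string msg) : ok_flag(ok), message(std::move(msg)) {}

  bool ok() const { return ok_flag; }
  static MiniStatus OK() { return MiniStatus(); }
  static MiniStatus Invalid(std::string msg) {
    return MiniStatus(false, std::move(msg));
  }

  // Assumed semantics: once an error is stored, later statuses are ignored.
  MiniStatus& operator&=(const MiniStatus& other) {
    if (ok() && !other.ok()) *this = other;
    return *this;
  }
};

// Fold per-row-group statuses into one result, as the fuzz target's loop does.
MiniStatus Accumulate(const std::vector<MiniStatus>& per_row_group) {
  MiniStatus final_status = MiniStatus::OK();
  for (const MiniStatus& st : per_row_group) {
    final_status &= st;
  }
  return final_status;
}
```

With these assumed semantics, every row group is still visited after an error, so the hard check on each successfully read table runs regardless of earlier soft failures.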
