Problem
ManifestGroup::FilterFiles() accepts a file-level expression and stores it in file_filter_, but ReadEntries() does not build or run an evaluator against each DataFile. As a result, any file filter other than always-true is accepted but silently ignored.
This is primarily a public API correctness issue. Normal table scans still apply data and partition filtering through the existing scan path, but direct use of ManifestGroup::FilterFiles() can return entries that should have been filtered out.
Expected behavior
ManifestGroup::FilterFiles() should evaluate supported predicates against DataFile metadata before returning entries. To stay aligned with Java ManifestGroup, file filters should bind against file metadata with an empty partition struct: callers should use FilterData(...) for logical data predicates, while concrete partition.* file filters should fail explicitly instead of being silently ignored.
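The binding rule described above can be illustrated with a minimal sketch. It is not the library's binder: `BindableFileColumns`, `BindFileFilterColumn`, and `CanBindFileFilterColumn` are hypothetical names, and the column set is an assumed subset of the DataFile metadata fields. The only point is that a `partition.*` reference is rejected with an error rather than dropped.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical subset of file-metadata columns a file filter may reference.
// The partition struct is treated as empty, so it contributes no columns.
inline const std::set<std::string>& BindableFileColumns() {
  static const std::set<std::string> kColumns = {
      "file_path", "record_count", "file_size_in_bytes"};
  return kColumns;
}

// Hypothetical binder: throws instead of silently ignoring a bad reference.
inline void BindFileFilterColumn(const std::string& name) {
  // rfind(prefix, 0) == 0 is a starts-with check that works before C++20.
  if (name.rfind("partition.", 0) == 0) {
    throw std::invalid_argument(
        "file filter cannot reference '" + name +
        "': the partition struct is empty when binding file filters");
  }
  if (BindableFileColumns().count(name) == 0) {
    throw std::invalid_argument("unknown file metadata column: " + name);
  }
}

// Convenience wrapper that reports success or failure as a bool.
inline bool CanBindFileFilterColumn(const std::string& name) {
  try {
    BindFileFilterColumn(name);
    return true;
  } catch (const std::invalid_argument&) {
    return false;
  }
}
```

Under these assumptions, `CanBindFileFilterColumn("record_count")` succeeds, while a reference such as `partition.day` (an illustrative name) fails at bind time, which is where the caller can see it.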
Reproduction idea
Create a manifest with two entries that differ by record_count, call FilterFiles(record_count >= 10), and observe that entries below the threshold are still returned.
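The reproduction can be modeled with a small self-contained sketch. It does not use the real classes: `ToyManifestGroup`, the two-field `DataFile`, the `FileFilter` callable, and `MakeFilteredGroup` are hypothetical stand-ins, and the real FilterFiles() takes an expression rather than a lambda. The sketch only contrasts a read path that ignores the stored filter with one that applies it.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for DataFile; only the fields needed here.
struct DataFile {
  std::string file_path;
  int64_t record_count;
};

// Hypothetical file-level predicate (the real API takes an expression).
using FileFilter = std::function<bool(const DataFile&)>;

class ToyManifestGroup {
 public:
  explicit ToyManifestGroup(std::vector<DataFile> files)
      : files_(std::move(files)) {}

  // Stores the filter, mirroring how the report describes file_filter_.
  ToyManifestGroup& FilterFiles(FileFilter filter) {
    file_filter_ = std::move(filter);
    return *this;
  }

  // Reported behavior: the stored filter is never consulted.
  std::vector<DataFile> ReadEntriesIgnoringFilter() const { return files_; }

  // Expected behavior: each file is checked before it is returned.
  std::vector<DataFile> ReadEntries() const {
    std::vector<DataFile> out;
    for (const auto& file : files_) {
      if (!file_filter_ || file_filter_(file)) out.push_back(file);
    }
    return out;
  }

 private:
  std::vector<DataFile> files_;
  FileFilter file_filter_;
};

// Two entries that differ by record_count, filtered on record_count >= 10.
inline ToyManifestGroup MakeFilteredGroup() {
  std::vector<DataFile> files = {{"a.parquet", 5}, {"b.parquet", 20}};
  ToyManifestGroup group(std::move(files));
  group.FilterFiles([](const DataFile& f) { return f.record_count >= 10; });
  return group;
}
```

With this setup, `ReadEntriesIgnoringFilter()` still returns both entries, while `ReadEntries()` returns only the entry whose record_count meets the threshold. A regression test against the real ManifestGroup could assert the same difference.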