create PageIndexPolicy to allow optional indexes (#8071)
# Which issue does this PR close?
- Closes #8070.
# Rationale for this change
This change introduces a more flexible way to handle page indexes
(column and offset indexes) in Parquet files. Previously, the reading of
these indexes was controlled by boolean flags, which could express only
two states: read the index as required, or do not read it at all. The
new `PageIndexPolicy` enum (`Off`,
`Optional`, `Required`) provides finer control, allowing users to
specify whether an index is not read, read if present (without error if
missing), or strictly required (error if missing).
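
The three-way semantics described above can be illustrated with a minimal, self-contained sketch. This is not the crate's implementation: the enum is redefined locally as a stand-in, and `apply_policy` plus the `String` error type are hypothetical names chosen only for illustration.

```rust
/// Local stand-in for the enum described in this PR (not the crate's source).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageIndexPolicy {
    /// Do not read the index at all.
    Off,
    /// Read the index if present; a missing index is not an error.
    Optional,
    /// Read the index; a missing index is an error.
    Required,
}

/// Hypothetical helper: given a policy and an index that may or may not
/// have been found in the file, decide what the reader should return.
fn apply_policy<T>(policy: PageIndexPolicy, found: Option<T>) -> Result<Option<T>, String> {
    match policy {
        // Off: ignore the index even if the file contains one.
        PageIndexPolicy::Off => Ok(None),
        // Optional: pass through whatever was found, including nothing.
        PageIndexPolicy::Optional => Ok(found),
        // Required: absence is turned into an error.
        PageIndexPolicy::Required => found
            .map(Some)
            .ok_or_else(|| "page index required but missing".to_string()),
    }
}
```

The only difference between `Optional` and `Required` in this sketch is how a missing index is treated, which is the case the old boolean flags could not express.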
# What changes are included in this PR?
- Introduced a new `PageIndexPolicy` enum with `Off`, `Optional`, and
`Required` variants.
- Replaced the boolean `column_index` and `offset_index` fields in
`ParquetMetaDataReader` with the new `PageIndexPolicy` enum.
- Updated the `ParquetMetaDataReader::new()` function to initialize page
index policies to `Off`, preserving previous defaults.
- Modified existing `with_page_indexes`, `with_column_indexes`, and
`with_offset_indexes` methods to utilize the new `PageIndexPolicy`,
defaulting to `Required` when enabling indexes.
- Added new methods: `with_page_index_policy`,
`with_column_index_policy`, and `with_offset_index_policy` to allow
direct setting of the page index policy.
- Adjusted the internal logic for parsing column and offset indexes to
respect the specified `PageIndexPolicy`, including returning an error if
a `Required` index is not found.
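
The builder changes listed above can be sketched roughly as follows. `ReaderSketch` is a hypothetical stand-in for `ParquetMetaDataReader`, and the `From<bool>` conversion (including mapping `false` to `Off`) is an assumption made for this sketch rather than something stated in the PR text.

```rust
/// Local stand-in for the enum described in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageIndexPolicy {
    Off,
    Optional,
    Required,
}

/// Assumed mapping for the boolean methods: `true` means `Required`,
/// `false` means `Off`.
impl From<bool> for PageIndexPolicy {
    fn from(enabled: bool) -> Self {
        if enabled {
            PageIndexPolicy::Required
        } else {
            PageIndexPolicy::Off
        }
    }
}

/// Hypothetical reader holding one policy per index kind.
#[derive(Debug)]
struct ReaderSketch {
    column_index: PageIndexPolicy,
    offset_index: PageIndexPolicy,
}

impl ReaderSketch {
    /// Both policies start as `Off`, matching the previous defaults.
    fn new() -> Self {
        ReaderSketch {
            column_index: PageIndexPolicy::Off,
            offset_index: PageIndexPolicy::Off,
        }
    }

    /// Boolean-style methods delegate to the policy-based ones.
    fn with_page_indexes(self, val: bool) -> Self {
        self.with_page_index_policy(val.into())
    }

    fn with_column_indexes(self, val: bool) -> Self {
        self.with_column_index_policy(val.into())
    }

    fn with_offset_indexes(self, val: bool) -> Self {
        self.with_offset_index_policy(val.into())
    }

    /// Sets the same policy for both column and offset indexes.
    fn with_page_index_policy(self, policy: PageIndexPolicy) -> Self {
        self.with_column_index_policy(policy)
            .with_offset_index_policy(policy)
    }

    fn with_column_index_policy(mut self, policy: PageIndexPolicy) -> Self {
        self.column_index = policy;
        self
    }

    fn with_offset_index_policy(mut self, policy: PageIndexPolicy) -> Self {
        self.offset_index = policy;
        self
    }
}
```

In this sketch, `ReaderSketch::new().with_page_indexes(true)` leaves both fields at `Required`, while `with_column_index_policy(PageIndexPolicy::Optional)` changes only the column index policy.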
# Are these changes tested?
Yes, a new test file `parquet/tests/page_index.rs` has been added to
cover the functionality of the new `PageIndexPolicy` and its integration
with `ParquetMetaDataReader`.
# Are there any user-facing changes?
Yes, there are user-facing changes to the `ParquetMetaDataReader` API.
The `with_column_indexes` and `with_offset_indexes` methods now
implicitly use `PageIndexPolicy::Required` when enabling page indexes.
New methods `with_page_index_policy`, `with_column_index_policy`, and
`with_offset_index_policy` have been added.