@nullccxsy
Contributor

  • Added select and project methods to the Schema class for creating projection schemas based on specified field names or IDs.
  • Introduced PruneColumnVisitor to handle the logic for selecting and projecting fields, including support for nested structures.

@nullccxsy requested a review from @wgtmac on September 6, 2025 11:46
Comment on lines 261 to 272
/// \brief Visitor class for pruning schema columns based on selected field IDs.
///
/// This visitor traverses a schema and creates a projected version containing only
/// the specified fields. It handles different projection modes:
/// - select_full_types=true: Include entire fields when their ID is selected
/// - select_full_types=false: Recursively project nested fields within selected structs
///
/// \warning Error conditions that will cause projection to fail:
/// - Attempting to explicitly project List or Map types (returns InvalidArgument)
/// - Projecting a List when element result is null (returns InvalidArgument)
/// - Projecting a Map without a defined map value type (returns InvalidArgument)
/// - Projecting a struct when result is not StructType (returns InvalidArgument)
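
To make the two projection modes in the comment above concrete, here is a minimal sketch over a toy type model. It is not the iceberg-cpp implementation: `Field`, `Prune`, and `MakeSampleSchema` are hypothetical names invented for illustration, only primitives and structs are modelled (lists and maps are left out), and the choice to keep an explicitly selected struct as an empty struct when none of its children are selected is an assumption about the intended behavior.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model (hypothetical, not the iceberg-cpp types): a field is either a
// primitive (no children) or a struct (is_struct == true, with children).
struct Field {
  int id;
  std::string name;
  bool is_struct = false;
  std::vector<Field> children;  // only meaningful when is_struct is true
};

// Returns the pruned copy of `field`, or std::nullopt if neither the field
// nor anything nested under it was selected.
std::optional<Field> Prune(const Field& field, const std::set<int>& ids,
                           bool select_full_types) {
  assert(field.is_struct || field.children.empty());
  const bool selected = ids.count(field.id) > 0;

  if (!field.is_struct) {
    if (selected) return field;
    return std::nullopt;
  }

  // select_full_types=true: a selected struct is kept whole.
  if (selected && select_full_types) return field;

  // Otherwise recurse and keep only the children that survive pruning.
  Field pruned{field.id, field.name, true, {}};
  for (const Field& child : field.children) {
    if (auto kept = Prune(child, ids, select_full_types)) {
      pruned.children.push_back(std::move(*kept));
    }
  }

  // Assumption: an explicitly selected struct is kept even when empty.
  if (selected || !pruned.children.empty()) return pruned;
  return std::nullopt;
}

// Sample schema: root { id (1), location (2) { lat (3), lon (4) } }.
Field MakeSampleSchema() {
  Field lat{3, "lat", false, {}};
  Field lon{4, "lon", false, {}};
  Field location{2, "location", true, {lat, lon}};
  Field id{1, "id", false, {}};
  return Field{-1, "root", true, {id, location}};
}
```

With this sample schema, `Prune(schema, {3}, false)` keeps `location` with only `lat` inside it, `Prune(schema, {2}, true)` keeps `location` with both children, and `Prune(schema, {2}, false)` keeps `location` as an empty struct.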
Member

I think it is both easy and valid to support projections into nested map and list types, and I don't know why the Java implementation does not support this. The code would be much simpler (and shorter) if we supported them.

@Fokko Do you have any context on the Java impl?

Member

@wgtmac left a comment

Thanks @nullccxsy!

nullccxsy added 10 commits September 23, 2025 10:30
- Added select and project methods to the Schema class for creating projection schemas based on specified field names or IDs.
- Introduced PruneColumnVisitor to handle the logic for selecting and projecting fields, including support for nested structures.
… handling

- Modified the PruneColumnVisitor class to pass results as shared pointers, improving memory management and clarity.
- Updated Visit methods for ListType, MapType, and StructType to accommodate the new result handling approach.
…rror reporting

- Updated the PruneColumnVisitor class to utilize shared pointers for type results, enhancing memory management.
- Refined Visit methods for StructType, ListType, and MapType to improve clarity and error handling, particularly for cases involving invalid projections.
Member

@Xuanwo left a comment

Thank you for working on this!

@Xuanwo merged commit 257b1ad into apache:main on Sep 23, 2025
7 checks passed