@ArnavBalyan
Member

  • Write configurations are often needed for debugging and for posterity, but application logs are lost within a few days.
  • This change adds an optional flag which, when enabled, writes the write configurations to the file footer.
  • The flag defaults to false; users can enable it to include this additional metadata in their Parquet files.
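
For illustration only, here is a minimal sketch of the idea in the bullets above: turning a set of write configurations into string key-value pairs suitable for a Parquet footer. It is not the code from this PR. The class name, the `writer.config.` key prefix, and the helper method are hypothetical; the two config keys are only sample inputs. In parquet-java, a map like this could presumably be handed to the writer through its extra-metadata hook (for example `ParquetWriter.Builder#withExtraMetaData`, if your version provides it), which is an assumption to verify against the release you use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class FooterConfigSketch {
    // Hypothetical prefix so the copied configs do not collide with other footer keys.
    static final String PREFIX = "writer.config.";

    // Copies every non-null config entry into a sorted map of footer key-value pairs.
    static Map<String, String> toFooterMetadata(Map<String, String> writeConfig) {
        Map<String, String> meta = new TreeMap<>();
        for (Map.Entry<String, String> e : writeConfig.entrySet()) {
            if (e.getValue() != null) {
                meta.put(PREFIX + e.getKey(), e.getValue());
            }
        }
        return meta;
    }

    public static void main(String[] args) {
        // Sample inputs; any string-valued writer settings would work the same way.
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("parquet.compression", "ZSTD");
        cfg.put("parquet.page.size", "1048576");

        // The resulting map is what an application would attach to the footer.
        toFooterMetadata(cfg).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Because the helper is plain Java with no library dependency, an application can keep it on its own side and choose which subset of configs to record.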

@ArnavBalyan
Member Author

cc @shangxinli @wgtmac, could you please take a look? Thanks!

@wgtmac
Member

wgtmac commented Aug 24, 2025

IMHO, this is merely custom logic that specific applications can handle well on their own. We don't want to take on the maintenance overhead.

@ArnavBalyan
Member Author

> IMHO, this is merely custom logic that specific applications can handle well on their own. We don't want to take on the maintenance overhead.

Thanks for the review! This is behind a feature flag, and it is up to users to enable it. The logic is minimal and gives end users and applications a high degree of clarity and debuggability, without each of them having to rewrite this logic. Maybe we could keep it off by default and let users enable it on demand, wdyt? @wgtmac @shangxinli

@wgtmac
Member

wgtmac commented Aug 24, 2025

Defaulting to off does not make this a valid feature for the Parquet library. If users want fine-grained control over which subset of configs is written, do we want to support that? If users have built a custom record writer on top of the ParquetFileWriter (as Iceberg did), how would we know about it? How does the ParquetRewriter handle conflicting configs when merging several Parquet files? So to me this is pure application logic that users can handle well on their side. We don't want to pay for the complexity within the library.
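
To make the merging concern concrete, here is a small sketch of one possible policy: combine the footer key-value metadata of several files and fail when the same key carries different values. This is an illustration only, not how ParquetRewriter behaves; the class and method names are hypothetical, and values are assumed to be non-null.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FooterMergeSketch {
    // Merges footer metadata maps; throws if a key appears with two different values.
    static Map<String, String> mergeStrict(List<Map<String, String>> footers) {
        Map<String, String> merged = new TreeMap<>();
        for (Map<String, String> footer : footers) {
            for (Map.Entry<String, String> e : footer.entrySet()) {
                String previous = merged.putIfAbsent(e.getKey(), e.getValue());
                if (previous != null && !previous.equals(e.getValue())) {
                    throw new IllegalStateException(
                        "Conflicting values for " + e.getKey() + ": "
                            + previous + " vs " + e.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two input files written with different compression settings (sample values).
        Map<String, String> fileA = Map.of("writer.config.parquet.compression", "ZSTD");
        Map<String, String> fileB = Map.of("writer.config.parquet.compression", "SNAPPY");
        try {
            mergeStrict(List.of(fileA, fileB));
        } catch (IllegalStateException ex) {
            System.out.println("Merge rejected: " + ex.getMessage());
        }
    }
}
```

Failing is only one choice; an application could instead drop the conflicting key or concatenate the values, which is part of why the right behavior is hard to fix inside the library.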

@ArnavBalyan
Member Author

> Defaulting to off does not make this a valid feature for the Parquet library. If users want fine-grained control over which subset of configs is written, do we want to support that? If users have built a custom record writer on top of the ParquetFileWriter (as Iceberg did), how would we know about it? How does the ParquetRewriter handle conflicting configs when merging several Parquet files? So to me this is pure application logic that users can handle well on their side. We don't want to pay for the complexity within the library.

Sure, sounds good! I'll close this PR. I think some of the above should be easy to solve, but it definitely requires more discussion 👍

@ArnavBalyan
Member Author

Closing this PR as suggested
