GH-541: Document status of file_path #542
base: master
Changes from 7 commits
a9fedb8
6cdba71
9c188c1
5defe90
52bd52c
632300f
4fa3238
4b9c241
69af0d6
c2f1da8
64dd6a4
563c576
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -958,6 +958,23 @@ union ColumnCryptoMetaData { | |
| struct ColumnChunk { | ||
| /** File where column data is stored. If not set, assumed to be same file as | ||
| * metadata. This path is relative to the current file. | ||
| * | ||
| * As of December 2025, the only known use-case for this field is writing summary | ||
| * parquet files (i.e. "_metadata" files). These files consolidate footers from | ||
| * multiple parquet files to allow for efficient reading of footers to avoid file | ||
| * listing costs and prune out files that do not need to be read based on statistics. | ||
| * This is a legacy feature, as modern table formats (e.g. Iceberg, Hudi and Delta Lake) | ||
| * are more scalable and serve effectively the same purpose. | ||
| * | ||
| * There is no other known usage of this field. Specifically, there are no known | ||
| * readers that will read externally stored column data if this field is populated | ||
| * within a standard parquet file. Making use of the field for this purpose is currently | ||
| * not considered part of the Parquet specification. | ||
| * | ||
| * | ||
| * Any new use of this field must go through the normal Parquet feature | ||
| * addition process. | ||
| * | ||
| **/ | ||
| 1: optional string file_path | ||
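
The two rules stated at the top of the comment (an unset `file_path` means the column data is in the same file as the metadata; a set `file_path` is relative to the current file) can be sketched as follows. This is only an illustration, not part of the specification or of any reader: the function name `resolve_column_chunk_path` is hypothetical, and reading "relative to the current file" as "relative to the directory containing that file" is an assumption.

```python
import posixpath


def resolve_column_chunk_path(metadata_path, file_path):
    """Return the location of the file holding a column chunk's data.

    metadata_path: path of the file whose footer is being read
                   (for example a "_metadata" summary file).
    file_path:     value of ColumnChunk.file_path, or None when unset.
    """
    if file_path is None:
        # Field not set: the data is assumed to be in the same file
        # as the metadata.
        return metadata_path
    # Field set: ASSUMPTION - resolve it against the directory that
    # contains the file currently being read.
    base_dir = posixpath.dirname(metadata_path)
    return posixpath.normpath(posixpath.join(base_dir, file_path))


# A summary file pointing at a data file next to it.
print(resolve_column_chunk_path("table/_metadata", "part-0.parquet"))
# A regular data file whose column chunks leave file_path unset.
print(resolve_column_chunk_path("table/part-0.parquet", None))
```

Under these assumptions the first call resolves to `table/part-0.parquet` and the second returns the data file's own path, which matches the summary-file use case described in the comment.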
Do we have a link that describes what a summary file is and which implementations support it?
This is what came back from a quick Google search: https://stackoverflow.com/questions/53150801/what-is-the-parquet-summary-file
But I didn't see any mention of this in the format repository: https://github.com/search?q=repo%3Aapache%2Fparquet-format%20summary&type=code
I don't think this was ever officially part of the parquet specification as far as I can tell.
I reworded this section.