diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index e99c4619..b75b98e5 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -958,6 +958,21 @@ union ColumnCryptoMetaData {
 struct ColumnChunk {
   /** File where column data is stored. If not set, assumed to be same file as
     * metadata. This path is relative to the current file.
+    *
+    * As of December 2025, the only known use-case for this field is writing summary
+    * Parquet files (i.e. "_metadata" files). These files consolidate footers from
+    * multiple Parquet files so that readers can fetch footers efficiently, avoid file
+    * listing costs, and prune files that do not need to be read based on statistics.
+    *
+    * These files do not appear to have ever been formally specified
+    * and are potentially problematic from a correctness perspective [1].
+    *
+    * [1] https://lists.apache.org/thread/ootf2kmyg3p01b1bvplpvp4ftd1bt72d
+    *
+    * There is no other known usage of this field. Specifically, there are no known
+    * reference implementations that will read externally stored column data if this
+    * field is populated within a standard Parquet file. Making use of the field for
+    * this purpose is not considered part of the Parquet specification.
     **/
   1: optional string file_path