@@ -55,20 +55,20 @@ spark.read.parquet_metadata("/path/to/parquet").show()

The Dataframe provides the following per-file information:

- | column            | type  | description                                                                     |
- | :-----------------| :----:| :------------------------------------------------------------------------------|
- | filename          | string| The Parquet file name                                                           |
- | blocks            | int   | Number of blocks / RowGroups in the Parquet file                                |
- | compressedBytes   | long  | Number of compressed bytes of all blocks                                        |
- | uncompressedBytes | long  | Number of uncompressed bytes of all blocks                                      |
- | rows              | long  | Number of rows in the file                                                      |
- | columns           | int   | Number of columns in the file                                                   |
- | values            | long  | Number of values in the file                                                    |
- | nulls             | long  | Number of null values in the file                                               |
- | createdBy         | string| The createdBy string of the Parquet file, e.g. library used to write the file   |
- | schema            | string| The schema                                                                      |
- | encryption        | string| The encryption (requires org.apache.parquet:parquet-hadoop:1.12.4 and above)    |
- | keyValues         | string-to-string map| Key-value data of the file                                        |
+ | column            | type  | description                                                                       |
+ | :-----------------| :----:| :-------------------------------------------------------------------------------- |
+ | filename          | string| The Parquet file name                                                             |
+ | blocks            | int   | Number of blocks / RowGroups in the Parquet file                                  |
+ | compressedBytes   | long  | Number of compressed bytes of all blocks                                          |
+ | uncompressedBytes | long  | Number of uncompressed bytes of all blocks                                        |
+ | rows              | long  | Number of rows in the file                                                        |
+ | columns           | int   | Number of columns in the file                                                     |
+ | values            | long  | Number of values in the file                                                      |
+ | nulls             | long  | Number of null values in the file                                                 |
+ | createdBy         | string| The createdBy string of the Parquet file, e.g. library used to write the file     |
+ | schema            | string| The schema                                                                        |
+ | encryption        | string| The encryption (requires `org.apache.parquet:parquet-hadoop:1.12.4` and above)    |
+ | keyValues         | string-to-string map| Key-value data of the file                                          |

## Parquet file schema

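The per-file figures above are read from the Parquet file footer. As a rough illustration of where that metadata lives (a stdlib-only sketch, not using this library; the byte string is synthetic for the example), a Parquet file ends with the serialized footer, a 4-byte little-endian footer length, and the magic `PAR1`:

```python
import struct

def footer_length(tail: bytes) -> int:
    """Return the footer length from the tail bytes of a Parquet file.

    Layout at end of file: <footer bytes><4-byte LE footer length>"PAR1".
    """
    if tail[-4:] != b"PAR1":
        raise ValueError("not a Parquet file tail")
    (length,) = struct.unpack("<I", tail[-8:-4])
    return length

# Synthetic tail: 16 bytes of fake footer, its length, then the magic.
tail = b"\x00" * 16 + struct.pack("<I", 16) + b"PAR1"
print(footer_length(tail))  # 16
```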
@@ -96,20 +96,20 @@ spark.read.parquet_schema("/path/to/parquet").show()

The Dataframe provides the following per-column information:

- | column            | type          | description                                                                       |
- | :-----------------| :------------:| :--------------------------------------------------------------------------------|
- | filename          | string        | The Parquet file name                                                             |
- | columnName        | string        | The column name                                                                   |
- | columnPath        | string array  | The column path                                                                   |
- | repetition        | string        | The repetition                                                                    |
- | type              | string        | The data type                                                                     |
- | length            | int           | The length of the type                                                            |
- | originalType      | string        | The original type (requires org.apache.parquet:parquet-hadoop:1.11.0 and above)   |
- | isPrimitive       | boolean       | True if type is primitive                                                         |
- | primitiveType     | string        | The primitive type                                                                |
- | primitiveOrder    | string        | The order of the primitive type                                                   |
- | maxDefinitionLevel| int           | The max definition level                                                          |
- | maxRepetitionLevel| int           | The max repetition level                                                          |
+ | column            | type          | description                                                                          |
+ | :-----------------| :------------:| :----------------------------------------------------------------------------------- |
+ | filename          | string        | The Parquet file name                                                                |
+ | columnName        | string        | The column name                                                                      |
+ | columnPath        | string array  | The column path                                                                      |
+ | repetition        | string        | The repetition                                                                       |
+ | type              | string        | The data type                                                                        |
+ | length            | int           | The length of the type                                                               |
+ | originalType      | string        | The original type (requires `org.apache.parquet:parquet-hadoop:1.11.0` and above)    |
+ | isPrimitive       | boolean       | True if type is primitive                                                            |
+ | primitiveType     | string        | The primitive type                                                                   |
+ | primitiveOrder    | string        | The order of the primitive type                                                      |
+ | maxDefinitionLevel| int           | The max definition level                                                             |
+ | maxRepetitionLevel| int           | The max repetition level                                                             |

## Parquet block / RowGroup metadata

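The `maxDefinitionLevel` and `maxRepetitionLevel` values above follow the Dremel record-shredding rules Parquet uses: a column's max repetition level counts the repeated fields on its path from root to leaf, and its max definition level counts the optional and repeated fields. A small illustrative sketch (the paths are hypothetical, not taken from any real schema):

```python
def max_levels(path):
    """Compute (maxDefinitionLevel, maxRepetitionLevel) for a column
    whose path from root to leaf has the given repetition labels,
    e.g. ["optional", "repeated", "required"]."""
    max_rep = sum(1 for r in path if r == "repeated")
    max_def = sum(1 for r in path if r in ("optional", "repeated"))
    return max_def, max_rep

# A required top-level column needs no levels at all:
print(max_levels(["required"]))                           # (0, 0)
# A required leaf inside a repeated group inside an optional group:
print(max_levels(["optional", "repeated", "required"]))   # (2, 1)
```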
@@ -170,21 +170,22 @@ spark.read.parquet_block_columns("/path/to/parquet").show()
+-------------+-----+------+------+-------------------+-------------------+--------------------+------------------+-----------+---------------+-----------------+------+-----+
```

- | column            | type         | description                                            |
- | :-----------------| :-----------:| :-----------------------------------------------------|
- | filename          | string       | The Parquet file name                                  |
- | block             | int          | Block / RowGroup number starting at 1                  |
- | column            | array<string>| Block / RowGroup column name                           |
- | codec             | string       | The codec used to compress the block column values     |
- | type              | string       | The data type of the block column                      |
- | encodings         | array<string>| Encodings of the block column                          |
- | minValue          | string       | Minimum value of this column in this block             |
- | maxValue          | string       | Maximum value of this column in this block             |
- | columnStart       | long         | Start position of the block column in the Parquet file |
- | compressedBytes   | long         | Number of compressed bytes of this block column        |
- | uncompressedBytes | long         | Number of uncompressed bytes of this block column      |
- | values            | long         | Number of values in this block column                  |
- | nulls             | long         | Number of null values in this block column             |
+ | column             | type          | description                                                                                         |
+ | :------------------| :------------:| :--------------------------------------------------------------------------------------------------|
+ | filename           | string        | The Parquet file name                                                                               |
+ | block              | int           | Block / RowGroup number starting at 1                                                               |
+ | column             | array<string> | Block / RowGroup column name                                                                        |
+ | codec              | string        | The codec used to compress the block column values                                                  |
+ | type               | string        | The data type of the block column                                                                   |
+ | encodings          | array<string> | Encodings of the block column                                                                       |
+ | isEncrypted        | boolean       | Whether the block column is encrypted (requires `org.apache.parquet:parquet-hadoop:1.12.3` and above) |
+ | minValue           | string        | Minimum value of this column in this block                                                          |
+ | maxValue           | string        | Maximum value of this column in this block                                                          |
+ | columnStart        | long          | Start position of the block column in the Parquet file                                              |
+ | compressedBytes    | long          | Number of compressed bytes of this block column                                                     |
+ | uncompressedBytes  | long          | Number of uncompressed bytes of this block column                                                   |
+ | values             | long          | Number of values in this block column                                                               |
+ | nulls              | long          | Number of null values in this block column                                                          |

## Parquet partition metadata

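The `compressedBytes` and `uncompressedBytes` columns above make it easy to gauge how well each block column compresses. A sketch with hypothetical values, shaped like rows of the `parquet_block_columns` result (the numbers are made up for the example):

```python
# Hypothetical rows as plain dicts; a real result would be a Spark Dataframe.
rows = [
    {"column": ["id"],  "compressedBytes": 437,  "uncompressedBytes": 826},
    {"column": ["val"], "compressedBytes": 1200, "uncompressedBytes": 1200},
]

for row in rows:
    # column is an array<string> path; join it for display.
    name = ".".join(row["column"])
    ratio = row["compressedBytes"] / row["uncompressedBytes"]
    print(name, round(ratio, 2))  # e.g. "id 0.53" means ~47% space saved
```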
@@ -255,6 +256,13 @@ spark.read.parquet_block_columns("/path/to/parquet", parallelism=100)
spark.read.parquet_partitions("/path/to/parquet", parallelism=100)
```

+ ## Encryption
+
+ Reading [encrypted Parquet is supported](https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#columnar-encryption).
+ Files encrypted with a [plaintext footer](https://github.com/apache/parquet-format/blob/master/Encryption.md#55-plaintext-footer-mode)
+ can be read without any encryption keys; the encrypted Parquet metadata then show as `NULL` values in the result Dataframe.
+ Parquet files with an encrypted footer require only the footer encryption key; no column encryption keys are needed.
+
## Known Issues

Note that this feature is not supported in Python when connected with a [Spark Connect server](README.md#spark-connect-server).
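On the Encryption section above: per the Parquet encryption specification linked there, files in plaintext-footer mode keep the regular `PAR1` trailing magic, while files with an encrypted footer end in `PARE` instead. A stdlib-only sketch of that distinction (the byte strings are synthetic, not produced by this library):

```python
def footer_mode(tail: bytes) -> str:
    """Classify a Parquet file by its trailing 4-byte magic."""
    magic = tail[-4:]
    if magic == b"PAR1":
        return "plaintext footer"   # footer readable without any keys
    if magic == b"PARE":
        return "encrypted footer"   # footer encryption key required
    raise ValueError("not a Parquet file tail")

# Synthetic tails for illustration only.
print(footer_mode(b"\x00" * 8 + b"PAR1"))  # plaintext footer
print(footer_mode(b"\x00" * 8 + b"PARE"))  # encrypted footer
```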