---
layout: page
title: Parquet write configuration
nav_order: 17
---
Gluten's Parquet write configuration comes in two forms: native Parquet write options and Spark session settings. The two settings below have the same effect; one applies to the whole Spark session, the other to a single write query.
```scala
// Session-wide: applies to every native Parquet write in this session.
// The value "1000000" is only an example.
spark.conf.set("spark.gluten.sql.native.parquet.write.blockRows", "1000000")

// Per-query: applies only to this write.
df.write.option("parquet.block.rows", "1000000").save()
```
| Config | parquet-mr default | Spark default | Velox default | Gluten config |
|---|---|---|---|---|
| **Spark** | | | | |
| spark.sql.parquet.outputTimestampType | | int96 | | |
| spark.sql.parquet.writeLegacyFormat | | false | | |
| **Velox/Arrow** | | | | |
| write_batch_size | | | 1024 | spark.gluten.sql.columnar.maxBatchSize |
| rowgroup_length | | | 1M | parquet.block.rows / spark.gluten.sql.native.parquet.write.blockRows |
| compression_level | | | 0 | |
| page_index | | | false | |
| decimal_as_integer | | | false | |
| statistics_enabled | | | false | |
| **parquet-mr** | | | | |
| parquet.summary.metadata.level | all | | | |
| parquet.enable.summary-metadata | true | | | |
| parquet.block.size | 128m | | | parquet.block.size / spark.gluten.sql.columnar.parquet.write.blockSize |
| parquet.page.size | 1m | | 1M | parquet.page.size |
| parquet.compression | uncompressed | snappy | uncompressed | parquet.compression / spark.sql.parquet.compression.codec |
| parquet.write.support.class | org.apache.parquet.hadoop.api.WriteSupport | | | |
| parquet.enable.dictionary | true | | true | parquet.enable.dictionary |
| parquet.dictionary.page.size | 1m | | 1m | |
| parquet.validation | false | | | |
| parquet.writer.version | PARQUET_1_0 | | PARQUET_2_6 | parquet.writer.version |
| parquet.memory.pool.ratio | 0.95 | | | |
| parquet.memory.min.chunk.size | 1m | | | |
| parquet.writer.max-padding | 8m | | | |
| parquet.page.size.row.check.min | 100 | | | |
| parquet.page.size.row.check.max | 10000 | | | |
| parquet.page.value.count.threshold | Integer.MAX_VALUE / 2 | | | |
| parquet.page.size.check.estimate | true | | | |
| parquet.columnindex.truncate.length | 64 | | | |
| parquet.statistics.truncate.length | 2147483647 | | | |
| parquet.bloom.filter.enabled | false | | | |
| parquet.bloom.filter.adaptive.enabled | false | | | |
| parquet.bloom.filter.candidates.number | 5 | | | |
| parquet.bloom.filter.expected.ndv | | | | |
| parquet.bloom.filter.fpp | 0.01 | | | |
| parquet.bloom.filter.max.bytes | 1m | | | |
| parquet.decrypt.off-heap.buffer.enabled | false | | | |
| parquet.page.row.count.limit | 20000 | | | |
| parquet.page.write-checksum.enabled | true | | false | |
| parquet.crypto.factory.class | None | | | |
| parquet.compression.codec.zstd.bufferPool.enabled | true | | | |
| parquet.compression.codec.zstd.level | 3 | | 0 | parquet.compression.codec.zstd.level |
| parquet.compression.codec.zstd.workers | 0 | | | |
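As an illustration of the table above, several write options can be combined on one query. This is a hedged sketch, not a prescribed recipe: the level `3` and the output path are example values only, and `df` stands for any DataFrame.

```scala
// Example only: write zstd-compressed Parquet with an explicit compression level.
// The option keys (parquet.compression, parquet.compression.codec.zstd.level)
// come from the Gluten config column of the table above.
df.write
  .option("parquet.compression", "zstd")
  .option("parquet.compression.codec.zstd.level", "3") // example level
  .save("/tmp/parquet-output")                         // example path
```

Alternatively, `spark.sql.parquet.compression.codec` can be set once on the session to change the codec for all subsequent Parquet writes.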