---
layout: page
title: Parquet write configuration
nav_order: 17
---

Parquet write configurations in Spark/Velox/Gluten

Gluten accepts Parquet write settings in two forms: as a Spark configuration and as a Parquet writer option. The two settings below have the same effect. The first applies to the whole Spark session; the second applies only to the query that performs the write.

spark.conf.set("spark.gluten.sql.native.parquet.write.blockRows", "<rows>")

df.write.option("parquet.block.rows", "<rows>").save("<output path>")
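The two scopes can be sketched in PySpark as below. This is a minimal illustration, not Gluten's own code: the helper names `set_block_rows_for_session` and `write_with_block_rows` are hypothetical, the row count used in the usage comment is an arbitrary example value, and the sketch assumes a session that already has Gluten enabled.

```python
# Key names taken from this page; the helper functions are hypothetical.
GLUTEN_BLOCK_ROWS_KEY = "spark.gluten.sql.native.parquet.write.blockRows"
PARQUET_BLOCK_ROWS_OPTION = "parquet.block.rows"


def set_block_rows_for_session(spark, rows: int) -> None:
    """Session scope: later Parquet writes in this session use `rows`."""
    # SparkSession.conf.set(key, value) stores a runtime SQL configuration.
    spark.conf.set(GLUTEN_BLOCK_ROWS_KEY, str(rows))


def write_with_block_rows(df, path: str, rows: int) -> None:
    """Query scope: only this write uses `rows`."""
    # DataFrameWriter.option(key, value) attaches the setting to this write only.
    df.write.option(PARQUET_BLOCK_ROWS_OPTION, str(rows)).parquet(path)


# Usage, assuming `spark` is a SparkSession and `df` a DataFrame:
#   set_block_rows_for_session(spark, 500_000)         # whole session
#   write_with_block_rows(df, "/tmp/out", 500_000)     # this write only
```

Passing the value as a string keeps both calls consistent with how Spark stores configuration values.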

| Configuration | parquet-mr default | Spark default | Velox default | Gluten config |
|---|---|---|---|---|
| Spark | | | | |
| spark.sql.parquet.outputTimestampType | | int96 | | |
| spark.sql.parquet.writeLegacyFormat | | false | | |
| Velox/Arrow | | | | |
| write_batch_size | | | 1024 | spark.gluten.sql.columnar.maxBatchSize |
| rowgroup_length | | | 1M | parquet.block.rows<br>spark.gluten.sql.native.parquet.write.blockRows |
| compression_level | | | 0 | |
| page_index | | | false | |
| decimal_as_integer | | | false | |
| statistics_enabled | | | false | |
| parquet-mr | | | | |
| parquet.summary.metadata.level | all | | | |
| parquet.enable.summary-metadata | true | | | |
| parquet.block.size | 128m | | | parquet.block.size<br>spark.gluten.sql.columnar.parquet.write.blockSize |
| parquet.page.size | 1m | | 1M | parquet.page.size |
| parquet.compression | uncompressed | snappy | uncompressed | parquet.compression<br>spark.sql.parquet.compression.codec |
| parquet.write.support.class | org.apache.parquet.hadoop.api.WriteSupport | | | |
| parquet.enable.dictionary | true | | true | parquet.enable.dictionary |
| parquet.dictionary.page.size | 1m | | 1m | |
| parquet.validation | false | | | |
| parquet.writer.version | PARQUET_1_0 | | PARQUET_2_6 | parquet.writer.version |
| parquet.memory.pool.ratio | 0.95 | | | |
| parquet.memory.min.chunk.size | 1m | | | |
| parquet.writer.max-padding | 8m | | | |
| parquet.page.size.row.check.min | 100 | | | |
| parquet.page.size.row.check.max | 10000 | | | |
| parquet.page.value.count.threshold | Integer.MAX_VALUE / 2 | | | |
| parquet.page.size.check.estimate | true | | | |
| parquet.columnindex.truncate.length | 64 | | | |
| parquet.statistics.truncate.length | 2147483647 | | | |
| parquet.bloom.filter.enabled | false | | | |
| parquet.bloom.filter.adaptive.enabled | false | | | |
| parquet.bloom.filter.candidates.number | 5 | | | |
| parquet.bloom.filter.expected.ndv | | | | |
| parquet.bloom.filter.fpp | 0.01 | | | |
| parquet.bloom.filter.max.bytes | 1m | | | |
| parquet.decrypt.off-heap.buffer.enabled | false | | | |
| parquet.page.row.count.limit | 20000 | | | |
| parquet.page.write-checksum.enabled | true | | false | |
| parquet.crypto.factory.class | None | | | |
| parquet.compression.codec.zstd.bufferPool.enabled | true | | | |
| parquet.compression.codec.zstd.level | 3 | | 0 | parquet.compression.codec.zstd.level |
| parquet.compression.codec.zstd.workers | 0 | | | |
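
When moving a job from the vanilla Spark writer to the native writer, the rows worth checking first are those whose defaults differ between engines. The sketch below transcribes the rows of the table that list more than one default and reports the ones that disagree. It assumes that rows with two values list the parquet-mr default followed by the Velox default; the names `DEFAULTS` and `differing_defaults` are hypothetical and exist only for this illustration.

```python
# Defaults transcribed from the table above as (parquet-mr, Spark, Velox).
# None means the table lists no default for that engine.
# Assumption: two-value rows are parquet-mr first, Velox second.
DEFAULTS = {
    "parquet.page.size": ("1m", None, "1M"),
    "parquet.compression": ("uncompressed", "snappy", "uncompressed"),
    "parquet.enable.dictionary": ("true", None, "true"),
    "parquet.dictionary.page.size": ("1m", None, "1m"),
    "parquet.writer.version": ("PARQUET_1_0", None, "PARQUET_2_6"),
    "parquet.page.write-checksum.enabled": ("true", None, "false"),
    "parquet.compression.codec.zstd.level": ("3", None, "0"),
}


def differing_defaults(defaults=DEFAULTS):
    """Return the rows whose listed defaults are not all equal (ignoring case)."""
    result = {}
    for key, values in defaults.items():
        listed = {v.lower() for v in values if v is not None}
        if len(listed) > 1:
            result[key] = values
    return result


for key, (mr, spark, velox) in differing_defaults().items():
    print(f"{key}: parquet-mr={mr}, Spark={spark}, Velox={velox}")
```

Each reported key that also has an entry in the Gluten config column can be pinned explicitly if the output must match what the vanilla writer produced.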