---
layout: page
title: Spark configurations status in Gluten Velox Backend
nav_order: 17
---

This file lists whether each Spark configuration is honored by the Gluten Velox backend. The tables are taken from the Spark 4.0 configuration page. The possible statuses are:

  • ✅ Supported
  • ❌ Not Supported
  • ⚠️ Partial Support
  • 🔄 In Progress
  • 🚫 Not applied or transparent to Gluten
  • <blank>: unknown yet
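
The legend above can be read mechanically. The following is a minimal sketch, assuming each table row is available as one whitespace-separated line whose last field is the status symbol when one is present. The helper name `gluten_status` is hypothetical and not part of Gluten.

```python
# Status symbols copied from the legend above. The variation selector
# (U+FE0F) that some emoji carry is stripped so both spellings match.
_VS16 = "\ufe0f"
STATUS_LEGEND = {
    symbol.replace(_VS16, ""): meaning
    for symbol, meaning in {
        "✅": "Supported",
        "❌": "Not Supported",
        "⚠️": "Partial Support",
        "🔄": "In Progress",
        "🚫": "Not applied or transparent to Gluten",
    }.items()
}


def gluten_status(row):
    """Return the legend text for a row, or 'unknown yet' when the status cell is blank."""
    fields = row.split()
    last = fields[-1].replace(_VS16, "") if fields else ""
    return STATUS_LEGEND.get(last, "unknown yet")
```

For example, a row ending in 🚫 maps to "Not applied or transparent to Gluten", while a row that ends in its version number falls through to "unknown yet".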

Application Properties

Property Name Default Since Version Gluten Status
spark.app.name (none) 0.9.0
spark.driver.cores 1 1.3.0
spark.driver.maxResultSize 1g 1.2.0
spark.driver.memory 1g 1.1.1
spark.driver.memoryOverhead driverMemory * spark.driver.memoryOverheadFactor, with minimum of spark.driver.minMemoryOverhead 2.3.0
spark.driver.minMemoryOverhead 384m 4.0.0
spark.driver.memoryOverheadFactor 0.10 3.3.0
spark.driver.resource.{resourceName}.amount 0 3.0.0
spark.driver.resource.{resourceName}.discoveryScript None 3.0.0
spark.driver.resource.{resourceName}.vendor None 3.0.0
spark.resources.discoveryPlugin org.apache.spark.resource.ResourceDiscoveryScriptPlugin 3.0.0
spark.executor.memory 1g 0.7.0
spark.executor.pyspark.memory Not set 2.4.0
spark.executor.memoryOverhead executorMemory * spark.executor.memoryOverheadFactor, with minimum of spark.executor.minMemoryOverhead 2.3.0
spark.executor.minMemoryOverhead 384m 4.0.0
spark.executor.memoryOverheadFactor 0.10 3.3.0
spark.executor.resource.{resourceName}.amount 0 3.0.0
spark.executor.resource.{resourceName}.discoveryScript None 3.0.0
spark.executor.resource.{resourceName}.vendor None 3.0.0
spark.extraListeners (none) 1.3.0
spark.local.dir /tmp 0.5.0
spark.logConf false 0.9.0
spark.master (none) 0.9.0
spark.submit.deployMode client 1.5.0
spark.log.callerContext (none) 2.2.0
spark.log.level (none) 3.5.0
spark.driver.supervise false 1.3.0
spark.driver.timeout 0min 4.0.0
spark.driver.log.localDir (none) 4.0.0
spark.driver.log.dfsDir (none) 3.0.0
spark.driver.log.persistToDfs.enabled false 3.0.0
spark.driver.log.layout %d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n%ex 3.0.0
spark.driver.log.allowErasureCoding false 3.0.0
spark.decommission.enabled false 3.1.0
spark.executor.decommission.killInterval (none) 3.1.0
spark.executor.decommission.forceKillTimeout (none) 3.2.0
spark.executor.decommission.signal PWR 3.2.0
spark.executor.maxNumFailures numExecutors * 2, with minimum of 3 3.5.0
spark.executor.failuresValidityInterval (none) 3.5.0
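
The default for `spark.driver.memoryOverhead` and `spark.executor.memoryOverhead` in the table above is a formula rather than a fixed value. The sketch below works through that arithmetic with the listed defaults (factor 0.10, minimum 384m). Truncating the product to whole MiB is an assumption made here for illustration, and the function name is hypothetical.

```python
def memory_overhead_mib(memory_mib, factor=0.10, min_overhead_mib=384):
    """memory * memoryOverheadFactor, with a minimum of minMemoryOverhead (all in MiB)."""
    # Assumption: the scaled value is truncated to a whole number of MiB.
    return max(int(memory_mib * factor), min_overhead_mib)


# With the default 1g of memory the 384m minimum wins;
# with 8g the factor-based value is larger.
small = memory_overhead_mib(1024)
large = memory_overhead_mib(8192)
```

With these defaults the minimum applies until the memory setting exceeds roughly 3840 MiB.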

Runtime Environment

Property Name Default Since Version Gluten Status
spark.driver.extraClassPath (none) 1.0.0
spark.driver.defaultJavaOptions (none) 3.0.0
spark.driver.extraJavaOptions (none) 1.0.0
spark.driver.extraLibraryPath (none) 1.0.0
spark.driver.userClassPathFirst false 1.3.0
spark.executor.extraClassPath (none) 1.0.0
spark.executor.defaultJavaOptions (none) 3.0.0
spark.executor.extraJavaOptions (none) 1.0.0
spark.executor.extraLibraryPath (none) 1.0.0
spark.executor.logs.rolling.maxRetainedFiles -1 1.1.0
spark.executor.logs.rolling.enableCompression false 2.0.2
spark.executor.logs.rolling.maxSize 1024 * 1024 1.4.0
spark.executor.logs.rolling.strategy "" (disabled) 1.1.0
spark.executor.logs.rolling.time.interval daily 1.1.0
spark.executor.userClassPathFirst false 1.3.0
spark.executorEnv.[EnvironmentVariableName] (none) 0.9.0
spark.redaction.regex (?i)secret|password|token|access[.]?key 2.1.2
spark.redaction.string.regex (none) 2.2.0
spark.python.profile false 1.2.0
spark.python.profile.dump (none) 1.2.0
spark.python.worker.memory 512m 1.1.0
spark.python.worker.reuse true 1.2.0
spark.files 1.0.0
spark.submit.pyFiles 1.0.1
spark.jars 0.9.0
spark.jars.packages 1.5.0
spark.jars.excludes 1.5.0
spark.jars.ivy 1.3.0
spark.jars.ivySettings 2.2.0
spark.jars.repositories 2.3.0
spark.archives 3.1.0
spark.pyspark.driver.python 2.1.0
spark.pyspark.python 2.1.0

Shuffle Behavior

Property Name Default Since Version Gluten Status
spark.reducer.maxSizeInFlight 48m 1.4.0
spark.reducer.maxReqsInFlight Int.MaxValue 2.0.0
spark.reducer.maxBlocksInFlightPerAddress Int.MaxValue 2.2.1
spark.shuffle.compress true 0.6.0
spark.shuffle.file.buffer 32k 1.4.0
spark.shuffle.file.merge.buffer 32k 4.0.0
spark.shuffle.unsafe.file.output.buffer 32k 2.3.0
spark.shuffle.localDisk.file.output.buffer 32k 4.0.0
spark.shuffle.spill.diskWriteBufferSize 1024 * 1024 2.3.0
spark.shuffle.io.maxRetries 3 1.2.0
spark.shuffle.io.numConnectionsPerPeer 1 1.2.1
spark.shuffle.io.preferDirectBufs true 1.2.0
spark.shuffle.io.retryWait 5s 1.2.1
spark.shuffle.io.backLog -1 1.1.1
spark.shuffle.io.connectionTimeout value of spark.network.timeout 1.2.0
spark.shuffle.io.connectionCreationTimeout value of spark.shuffle.io.connectionTimeout 3.2.0
spark.shuffle.service.enabled false 1.2.0
spark.shuffle.service.port 7337 1.2.0
spark.shuffle.service.name spark_shuffle 3.2.0
spark.shuffle.service.index.cache.size 100m 2.3.0
spark.shuffle.service.removeShuffle true 3.3.0
spark.shuffle.maxChunksBeingTransferred Long.MAX_VALUE 2.3.0
spark.shuffle.sort.bypassMergeThreshold 200 1.1.1
spark.shuffle.sort.io.plugin.class org.apache.spark.shuffle.sort.io.LocalDiskShuffleDataIO 3.0.0
spark.shuffle.spill.compress true 0.9.0
spark.shuffle.accurateBlockThreshold 100 * 1024 * 1024 2.2.1
spark.shuffle.accurateBlockSkewedFactor -1.0 3.3.0
spark.shuffle.registration.timeout 5000 2.3.0
spark.shuffle.registration.maxAttempts 3 2.3.0
spark.shuffle.reduceLocality.enabled true 1.5.0
spark.shuffle.mapOutput.minSizeForBroadcast 512k 2.0.0
spark.shuffle.detectCorrupt true 2.2.0
spark.shuffle.detectCorrupt.useExtraMemory false 3.0.0
spark.shuffle.useOldFetchProtocol false 3.0.0
spark.shuffle.readHostLocalDisk true 3.0.0
spark.files.io.connectionTimeout value of spark.network.timeout 1.6.0
spark.files.io.connectionCreationTimeout value of spark.files.io.connectionTimeout 3.2.0
spark.shuffle.checksum.enabled true 3.2.0
spark.shuffle.checksum.algorithm ADLER32 3.2.0
spark.shuffle.service.fetch.rdd.enabled false 3.0.0
spark.shuffle.service.db.enabled true 3.0.0
spark.shuffle.service.db.backend ROCKSDB 3.4.0

Spark UI

Property Name Default Since Version Gluten Status
spark.eventLog.logBlockUpdates.enabled false 2.3.0
spark.eventLog.longForm.enabled false 2.4.0
spark.eventLog.compress true 1.0.0
spark.eventLog.compression.codec zstd 3.0.0
spark.eventLog.erasureCoding.enabled false 3.0.0
spark.eventLog.dir file:///tmp/spark-events 1.0.0
spark.eventLog.enabled false 1.0.0
spark.eventLog.overwrite false 1.0.0
spark.eventLog.buffer.kb 100k 1.0.0
spark.eventLog.rolling.enabled false 3.0.0
spark.eventLog.rolling.maxFileSize 128m 3.0.0
spark.ui.dagGraph.retainedRootRDDs Int.MaxValue 2.1.0
spark.ui.groupSQLSubExecutionEnabled true 3.4.0
spark.ui.enabled true 1.1.1
spark.ui.store.path None 3.4.0
spark.ui.killEnabled true 1.0.0
spark.ui.threadDumpsEnabled true 1.2.0
spark.ui.threadDump.flamegraphEnabled true 4.0.0
spark.ui.heapHistogramEnabled true 3.5.0
spark.ui.liveUpdate.period 100ms 2.3.0
spark.ui.liveUpdate.minFlushPeriod 1s 2.4.2
spark.ui.port 4040 0.7.0
spark.ui.retainedJobs 1000 1.2.0
spark.ui.retainedStages 1000 0.9.0
spark.ui.retainedTasks 100000 2.0.1
spark.ui.reverseProxy false 2.1.0
spark.ui.reverseProxyUrl 2.1.0
spark.ui.proxyRedirectUri 3.0.0
spark.ui.showConsoleProgress false 1.2.1
spark.ui.consoleProgress.update.interval 200 2.1.0
spark.ui.custom.executor.log.url (none) 3.0.0
spark.ui.prometheus.enabled true 3.0.0
spark.worker.ui.retainedExecutors 1000 1.5.0
spark.worker.ui.retainedDrivers 1000 1.5.0
spark.sql.ui.retainedExecutions 1000 1.5.0
spark.streaming.ui.retainedBatches 1000 1.0.0
spark.ui.retainedDeadExecutors 100 2.0.0
spark.ui.filters None 1.0.0
spark.ui.requestHeaderSize 8k 2.2.3
spark.ui.timelineEnabled true 3.4.0
spark.ui.timeline.executors.maximum 250 3.2.0
spark.ui.timeline.jobs.maximum 500 3.2.0
spark.ui.timeline.stages.maximum 500 3.2.0
spark.ui.timeline.tasks.maximum 1000 1.4.0
spark.appStatusStore.diskStoreDir None 3.4.0

Compression and Serialization

Property Name Default Since Version Gluten Status
spark.broadcast.compress true 0.6.0
spark.checkpoint.dir (none) 4.0.0
spark.checkpoint.compress false 2.2.0
spark.io.compression.codec lz4 0.8.0
spark.io.compression.lz4.blockSize 32k 1.4.0
spark.io.compression.snappy.blockSize 32k 1.4.0
spark.io.compression.zstd.level 1 2.3.0
spark.io.compression.zstd.bufferSize 32k 2.3.0
spark.io.compression.zstd.bufferPool.enabled true 3.2.0
spark.io.compression.zstd.workers 0 4.0.0
spark.io.compression.lzf.parallel.enabled false 4.0.0
spark.kryo.classesToRegister (none) 1.2.0
spark.kryo.referenceTracking true 0.8.0
spark.kryo.registrationRequired false 1.1.0
spark.kryo.registrator (none) 0.5.0
spark.kryo.unsafe true 2.1.0
spark.kryoserializer.buffer.max 64m 1.4.0
spark.kryoserializer.buffer 64k 1.4.0
spark.rdd.compress false 0.6.0
spark.serializer org.apache.spark.serializer.JavaSerializer 0.5.0
spark.serializer.objectStreamReset 100 1.0.0

Memory Management

Property Name Default Since Version Gluten Status
spark.memory.fraction 0.6 1.6.0
spark.memory.storageFraction 0.5 1.6.0
spark.memory.offHeap.enabled false 1.6.0
spark.memory.offHeap.size 0 1.6.0
spark.storage.unrollMemoryThreshold 1024 * 1024 1.1.0
spark.storage.replication.proactive true 2.2.0
spark.storage.localDiskByExecutors.cacheSize 1000 3.0.0
spark.cleaner.periodicGC.interval 30min 1.6.0
spark.cleaner.referenceTracking true 1.0.0
spark.cleaner.referenceTracking.blocking true 1.0.0
spark.cleaner.referenceTracking.blocking.shuffle false 1.1.1
spark.cleaner.referenceTracking.cleanCheckpoints false 1.4.0
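
As a worked example of the first two rows above: the Spark documentation describes `spark.memory.fraction` as a fraction of (heap space - 300MB), and `spark.memory.storageFraction` as a fraction of that resulting region. The sketch below applies the listed defaults to an on-heap size. It covers only this on-heap arithmetic and says nothing about how Gluten sizes off-heap memory; the function name is hypothetical.

```python
RESERVED_MIB = 300  # reserved amount stated in the Spark documentation


def unified_memory_mib(heap_mib, fraction=0.6, storage_fraction=0.5):
    """Return (unified region, storage share immune to eviction) in MiB."""
    unified = (heap_mib - RESERVED_MIB) * fraction
    storage = unified * storage_fraction
    return unified, storage


# A 4 GiB heap with the default fractions from the table.
unified, storage = unified_memory_mib(4096)
```

For a 4096 MiB heap this gives about 2277.6 MiB for execution and storage combined, of which about 1138.8 MiB is storage memory protected from eviction.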

Execution Behavior

Property Name Default Since Version Gluten Status
spark.broadcast.blockSize 4m 0.5.0
spark.broadcast.checksum true 2.1.1
spark.broadcast.UDFCompressionThreshold 1 * 1024 * 1024 3.0.0
spark.executor.cores 1 in YARN mode, all the available cores on the worker in standalone mode. 1.0.0
spark.default.parallelism For distributed shuffle operations like reduceByKey and join, the largest number of partitions in a parent RDD. For operations like parallelize with no parent RDDs, it depends on the cluster manager: in local mode, the number of cores on the local machine; otherwise, the total number of cores on all executor nodes or 2, whichever is larger. 0.5.0
spark.executor.heartbeatInterval 10s 1.1.0
spark.files.fetchTimeout 60s 1.0.0
spark.files.useFetchCache true 1.2.2
spark.files.overwrite false 1.0.0
spark.files.ignoreCorruptFiles false 2.1.0
spark.files.ignoreMissingFiles false 2.4.0
spark.files.maxPartitionBytes 134217728 (128 MiB) 2.1.0
spark.files.openCostInBytes 4194304 (4 MiB) 2.1.0
spark.hadoop.cloneConf false 1.0.3
spark.hadoop.validateOutputSpecs true 1.0.1
spark.storage.memoryMapThreshold 2m 0.9.2
spark.storage.decommission.enabled false 3.1.0
spark.storage.decommission.shuffleBlocks.enabled true 3.1.0
spark.storage.decommission.shuffleBlocks.maxThreads 8 3.1.0
spark.storage.decommission.rddBlocks.enabled true 3.1.0
spark.storage.decommission.fallbackStorage.path (none) 3.1.0
spark.storage.decommission.fallbackStorage.cleanUp false 3.2.0
spark.storage.decommission.shuffleBlocks.maxDiskSize (none) 3.2.0
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version 1 2.2.0

Executor Metrics

These configurations are handled by Spark and do not affect Gluten’s behavior.

Networking

These configurations are handled by Spark and do not affect Gluten’s behavior.

Scheduling

Property Name Default Since Version
spark.cores.max (not set) 0.6.0
spark.locality.wait 3s 0.5.0
spark.locality.wait.node spark.locality.wait 0.8.0
spark.locality.wait.process spark.locality.wait 0.8.0
spark.locality.wait.rack spark.locality.wait 0.8.0
spark.scheduler.maxRegisteredResourcesWaitingTime 30s 1.1.1
spark.scheduler.minRegisteredResourcesRatio 0.8 for KUBERNETES mode; 0.8 for YARN mode; 0.0 for standalone mode 1.1.1
spark.scheduler.mode FIFO 0.8.0
spark.scheduler.revive.interval 1s 0.8.1
spark.scheduler.listenerbus.eventqueue.capacity 10000 2.3.0
spark.scheduler.listenerbus.eventqueue.shared.capacity spark.scheduler.listenerbus.eventqueue.capacity 3.0.0
spark.scheduler.listenerbus.eventqueue.appStatus.capacity spark.scheduler.listenerbus.eventqueue.capacity 3.0.0
spark.scheduler.listenerbus.eventqueue.executorManagement.capacity spark.scheduler.listenerbus.eventqueue.capacity 3.0.0
spark.scheduler.listenerbus.eventqueue.eventLog.capacity spark.scheduler.listenerbus.eventqueue.capacity 3.0.0
spark.scheduler.listenerbus.eventqueue.streams.capacity spark.scheduler.listenerbus.eventqueue.capacity 3.0.0
spark.scheduler.resource.profileMergeConflicts false 3.1.0
spark.scheduler.excludeOnFailure.unschedulableTaskSetTimeout 120s 2.4.1
spark.standalone.submit.waitAppCompletion false 3.1.0
spark.excludeOnFailure.enabled false 2.1.0
spark.excludeOnFailure.application.enabled false 4.0.0
spark.excludeOnFailure.taskAndStage.enabled false 4.0.0
spark.excludeOnFailure.timeout 1h 2.1.0
spark.excludeOnFailure.task.maxTaskAttemptsPerExecutor 1 2.1.0
spark.excludeOnFailure.task.maxTaskAttemptsPerNode 2 2.1.0
spark.excludeOnFailure.stage.maxFailedTasksPerExecutor 2 2.1.0
spark.excludeOnFailure.stage.maxFailedExecutorsPerNode 2 2.1.0
spark.excludeOnFailure.application.maxFailedTasksPerExecutor 2 2.2.0
spark.excludeOnFailure.application.maxFailedExecutorsPerNode 2 2.2.0
spark.excludeOnFailure.killExcludedExecutors false 2.2.0
spark.excludeOnFailure.application.fetchFailure.enabled false 2.3.0
spark.speculation false 0.6.0
spark.speculation.interval 100ms 0.6.0
spark.speculation.multiplier 3 0.6.0
spark.speculation.quantile 0.9 0.6.0
spark.speculation.minTaskRuntime 100ms 3.2.0
spark.speculation.task.duration.threshold None 3.0.0
spark.speculation.efficiency.processRateMultiplier 0.75 3.4.0
spark.speculation.efficiency.longRunTaskFactor 2 3.4.0
spark.speculation.efficiency.enabled true 3.4.0
spark.task.cpus 1 0.5.0
spark.task.resource.{resourceName}.amount 1 3.0.0
spark.task.maxFailures 4 0.8.0
spark.task.reaper.enabled false 2.0.3
spark.task.reaper.pollingInterval 10s 2.0.3
spark.task.reaper.threadDump true 2.0.3
spark.task.reaper.killTimeout -1 2.0.3
spark.stage.maxConsecutiveAttempts 4 2.2.0
spark.stage.ignoreDecommissionFetchFailure true 3.4.0
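
To illustrate how the `spark.speculation.*` defaults in the table above interact, here is a simplified sketch. It assumes the commonly documented meaning of these settings: speculation is considered only once `quantile` of a stage's tasks have finished, and a running task becomes a candidate when it has run longer than `multiplier` times the median finished duration, and at least `minTaskRuntime`. This is not Spark's scheduler code, it ignores the `spark.speculation.efficiency.*` settings, and the function name is hypothetical.

```python
from statistics import median


def is_speculation_candidate(running_ms, finished_ms, total_tasks,
                             multiplier=3.0, quantile=0.9,
                             min_task_runtime_ms=100):
    """Simplified check using the defaults listed in the Scheduling table."""
    # Not enough finished tasks yet: speculation is not considered for the stage.
    if total_tasks == 0 or len(finished_ms) < quantile * total_tasks:
        return False
    # A task must exceed both the scaled median and the minimum runtime.
    threshold = max(multiplier * median(finished_ms), min_task_runtime_ms)
    return running_ms > threshold
```

With nine of ten tasks finished in 1000 ms each, a task that has been running for 3500 ms qualifies under this sketch, while one at 2500 ms does not.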

Barrier Execution Mode

These configurations are handled by Spark and do not affect Gluten’s behavior.

Dynamic Allocation

These configurations are handled by Spark and do not affect Gluten’s behavior.

Thread Configurations

These configurations are handled by Spark and do not affect Gluten’s behavior.

Spark Connect

Server Configuration

These configurations are handled by Spark and do not affect Gluten’s behavior.

Security

These configurations are handled by Spark and do not affect Gluten’s behavior.

Spark SQL

Runtime SQL Configuration

Property Name Default Since Version Gluten Status
spark.sql.adaptive.advisoryPartitionSizeInBytes (value of spark.sql.adaptive.shuffle.targetPostShuffleInputSize) 3.0.0
spark.sql.adaptive.autoBroadcastJoinThreshold (none) 3.2.0
spark.sql.adaptive.coalescePartitions.enabled true 3.0.0
spark.sql.adaptive.coalescePartitions.initialPartitionNum (none) 3.0.0
spark.sql.adaptive.coalescePartitions.minPartitionSize 1MB 3.2.0
spark.sql.adaptive.coalescePartitions.parallelismFirst true 3.2.0
spark.sql.adaptive.customCostEvaluatorClass (none) 3.2.0
spark.sql.adaptive.enabled true 1.6.0
spark.sql.adaptive.forceOptimizeSkewedJoin false 3.3.0
spark.sql.adaptive.localShuffleReader.enabled true 3.0.0
spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold 0b 3.2.0
spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled true 3.2.0
spark.sql.adaptive.optimizer.excludedRules (none) 3.1.0
spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor 0.2 3.3.0
spark.sql.adaptive.skewJoin.enabled true 3.0.0
spark.sql.adaptive.skewJoin.skewedPartitionFactor 5.0 3.0.0
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes 256MB 3.0.0
spark.sql.allowNamedFunctionArguments true 3.5.0
spark.sql.ansi.doubleQuotedIdentifiers false 3.4.0
spark.sql.ansi.enabled true 3.0.0
spark.sql.ansi.enforceReservedKeywords false 3.3.0
spark.sql.ansi.relationPrecedence false 3.4.0
spark.sql.autoBroadcastJoinThreshold 10MB 1.1.0
spark.sql.avro.compression.codec snappy 2.4.0
spark.sql.avro.deflate.level -1 2.4.0
spark.sql.avro.filterPushdown.enabled true 3.1.0
spark.sql.avro.xz.level 6 4.0.0
spark.sql.avro.zstandard.bufferPool.enabled false 4.0.0
spark.sql.avro.zstandard.level 3 4.0.0
spark.sql.binaryOutputStyle (none) 4.0.0
spark.sql.broadcastTimeout 300 1.3.0
spark.sql.bucketing.coalesceBucketsInJoin.enabled false 3.1.0
spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio 4 3.1.0
spark.sql.catalog.spark_catalog builtin 3.0.0
spark.sql.cbo.enabled false 2.2.0
spark.sql.cbo.joinReorder.dp.star.filter false 2.2.0
spark.sql.cbo.joinReorder.dp.threshold 12 2.2.0
spark.sql.cbo.joinReorder.enabled false 2.2.0
spark.sql.cbo.planStats.enabled false 3.0.0
spark.sql.cbo.starSchemaDetection false 2.2.0
spark.sql.charAsVarchar false 3.3.0
spark.sql.chunkBase64String.enabled true 3.5.2
spark.sql.cli.print.header false 3.2.0
spark.sql.columnNameOfCorruptRecord _corrupt_record 1.2.0
spark.sql.csv.filterPushdown.enabled true 3.0.0
spark.sql.datetime.java8API.enabled false 3.0.0
spark.sql.debug.maxToStringFields 25 3.0.0
spark.sql.defaultCacheStorageLevel MEMORY_AND_DISK 4.0.0
spark.sql.defaultCatalog spark_catalog 3.0.0
spark.sql.error.messageFormat PRETTY 3.4.0
spark.sql.execution.arrow.enabled false 2.3.0
spark.sql.execution.arrow.fallback.enabled true 2.4.0
spark.sql.execution.arrow.localRelationThreshold 48MB 3.4.0
spark.sql.execution.arrow.maxRecordsPerBatch 10000 2.3.0
spark.sql.execution.arrow.pyspark.enabled (value of spark.sql.execution.arrow.enabled) 3.0.0
spark.sql.execution.arrow.pyspark.fallback.enabled (value of spark.sql.execution.arrow.fallback.enabled) 3.0.0
spark.sql.execution.arrow.pyspark.selfDestruct.enabled false 3.2.0
spark.sql.execution.arrow.sparkr.enabled false 3.0.0
spark.sql.execution.arrow.transformWithStateInPandas.maxRecordsPerBatch 10000 4.0.0
spark.sql.execution.arrow.useLargeVarTypes false 3.5.0
spark.sql.execution.interruptOnCancel true 4.0.0
spark.sql.execution.pandas.inferPandasDictAsMap false 4.0.0
spark.sql.execution.pandas.structHandlingMode legacy 3.5.0
spark.sql.execution.pandas.udf.buffer.size (value of spark.buffer.size) 3.0.0
spark.sql.execution.pyspark.udf.faulthandler.enabled (value of spark.python.worker.faulthandler.enabled) 4.0.0
spark.sql.execution.pyspark.udf.hideTraceback.enabled false 4.0.0
spark.sql.execution.pyspark.udf.idleTimeoutSeconds (value of spark.python.worker.idleTimeoutSeconds) 4.0.0
spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled true 3.1.0
spark.sql.execution.python.udf.buffer.size (value of spark.buffer.size) 4.0.0
spark.sql.execution.python.udf.maxRecordsPerBatch 100 4.0.0
spark.sql.execution.pythonUDF.arrow.concurrency.level (none) 4.0.0
spark.sql.execution.pythonUDF.arrow.enabled false 3.4.0
spark.sql.execution.pythonUDTF.arrow.enabled false 3.5.0
spark.sql.execution.topKSortFallbackThreshold 2147483632 2.4.0
spark.sql.extendedExplainProviders (none) 4.0.0
spark.sql.files.ignoreCorruptFiles false 2.1.1
spark.sql.files.ignoreInvalidPartitionPaths false 4.0.0
spark.sql.files.ignoreMissingFiles false 2.3.0
spark.sql.files.maxPartitionBytes 128MB 2.0.0
spark.sql.files.maxPartitionNum (none) 3.5.0
spark.sql.files.maxRecordsPerFile 0 2.2.0
spark.sql.files.minPartitionNum (none) 3.1.0
spark.sql.function.concatBinaryAsString false 2.3.0
spark.sql.function.eltOutputAsString false 2.3.0
spark.sql.groupByAliases true 2.2.0
spark.sql.groupByOrdinal true 2.0.0
spark.sql.hive.convertInsertingPartitionedTable true 3.0.0
spark.sql.hive.convertInsertingUnpartitionedTable true 4.0.0
spark.sql.hive.convertMetastoreCtas true 3.0.0
spark.sql.hive.convertMetastoreInsertDir true 3.3.0
spark.sql.hive.convertMetastoreOrc true 2.0.0
spark.sql.hive.convertMetastoreParquet true 1.1.1
spark.sql.hive.convertMetastoreParquet.mergeSchema false 1.3.1
spark.sql.hive.dropPartitionByName.enabled false 3.4.0
spark.sql.hive.filesourcePartitionFileCacheSize 262144000 2.1.1
spark.sql.hive.manageFilesourcePartitions true 2.1.1
spark.sql.hive.metastorePartitionPruning true 1.5.0
spark.sql.hive.metastorePartitionPruningFallbackOnException false 3.3.0
spark.sql.hive.metastorePartitionPruningFastFallback false 3.3.0
spark.sql.hive.thriftServer.async true 1.5.0
spark.sql.icu.caseMappings.enabled true 4.0.0
spark.sql.inMemoryColumnarStorage.batchSize 10000 1.1.1
spark.sql.inMemoryColumnarStorage.compressed true 1.0.1
spark.sql.inMemoryColumnarStorage.enableVectorizedReader true 2.3.1
spark.sql.inMemoryColumnarStorage.hugeVectorReserveRatio 1.2 4.0.0
spark.sql.inMemoryColumnarStorage.hugeVectorThreshold -1b 4.0.0
spark.sql.json.filterPushdown.enabled true 3.1.0
spark.sql.json.useUnsafeRow false 4.0.0
spark.sql.jsonGenerator.ignoreNullFields true 3.0.0
spark.sql.leafNodeDefaultParallelism (none) 3.2.0
spark.sql.mapKeyDedupPolicy EXCEPTION 3.0.0
spark.sql.maven.additionalRemoteRepositories https://maven-central.storage-download.googleapis.com/maven2/ 3.0.0
spark.sql.maxMetadataStringLength 100 3.1.0
spark.sql.maxPlanStringLength 2147483632 3.0.0
spark.sql.maxSinglePartitionBytes 128m 3.4.0
spark.sql.operatorPipeSyntaxEnabled true 4.0.0
spark.sql.optimizer.avoidCollapseUDFWithExpensiveExpr true 4.0.0
spark.sql.optimizer.collapseProjectAlwaysInline false 3.3.0
spark.sql.optimizer.dynamicPartitionPruning.enabled true 3.0.0
spark.sql.optimizer.enableCsvExpressionOptimization true 3.2.0
spark.sql.optimizer.enableJsonExpressionOptimization true 3.1.0
spark.sql.optimizer.excludedRules (none) 2.4.0
spark.sql.optimizer.runtime.bloomFilter.applicationSideScanSizeThreshold 10GB 3.3.0 🚫
spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold 10MB 3.3.0 🚫
spark.sql.optimizer.runtime.bloomFilter.enabled true 3.3.0
spark.sql.optimizer.runtime.bloomFilter.expectedNumItems 1000000 3.3.0
spark.sql.optimizer.runtime.bloomFilter.maxNumBits 67108864 3.3.0
spark.sql.optimizer.runtime.bloomFilter.maxNumItems 4000000 3.3.0
spark.sql.optimizer.runtime.bloomFilter.numBits 8388608 3.3.0
spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled true 3.4.0
spark.sql.optimizer.runtimeFilter.number.threshold 10 3.3.0
spark.sql.orc.aggregatePushdown false 3.3.0
spark.sql.orc.columnarReaderBatchSize 4096 2.4.0
spark.sql.orc.columnarWriterBatchSize 1024 3.4.0
spark.sql.orc.compression.codec zstd 2.3.0
spark.sql.orc.enableNestedColumnVectorizedReader true 3.2.0
spark.sql.orc.enableVectorizedReader true 2.3.0
spark.sql.orc.filterPushdown true 1.4.0
spark.sql.orc.mergeSchema false 3.0.0
spark.sql.orderByOrdinal true 2.0.0
spark.sql.parquet.aggregatePushdown false 3.3.0
spark.sql.parquet.binaryAsString false 1.1.1
spark.sql.parquet.columnarReaderBatchSize 4096 2.4.0
spark.sql.parquet.compression.codec snappy 1.1.1
spark.sql.parquet.enableNestedColumnVectorizedReader true 3.3.0
spark.sql.parquet.enableVectorizedReader true 2.0.0
spark.sql.parquet.fieldId.read.enabled false 3.3.0
spark.sql.parquet.fieldId.read.ignoreMissing false 3.3.0
spark.sql.parquet.fieldId.write.enabled true 3.3.0
spark.sql.parquet.filterPushdown true 1.2.0
spark.sql.parquet.inferTimestampNTZ.enabled true 3.4.0
spark.sql.parquet.int96AsTimestamp true 1.3.0
spark.sql.parquet.int96TimestampConversion false 2.3.0
spark.sql.parquet.mergeSchema false 1.5.0
spark.sql.parquet.outputTimestampType INT96 2.3.0
spark.sql.parquet.recordLevelFilter.enabled false 2.3.0
spark.sql.parquet.respectSummaryFiles false 1.5.0
spark.sql.parquet.writeLegacyFormat false 1.6.0
spark.sql.parser.quotedRegexColumnNames false 2.3.0
spark.sql.pivotMaxValues 10000 1.6.0
spark.sql.planner.pythonExecution.memory (none) 4.0.0
spark.sql.preserveCharVarcharTypeInfo false 4.0.0
spark.sql.pyspark.inferNestedDictAsStruct.enabled false 3.3.0
spark.sql.pyspark.jvmStacktrace.enabled false 3.0.0
spark.sql.pyspark.plotting.max_rows 1000 4.0.0
spark.sql.pyspark.udf.profiler (none) 4.0.0
spark.sql.readSideCharPadding true 3.4.0
spark.sql.redaction.options.regex (?i)url 2.2.2
spark.sql.redaction.string.regex (value of spark.redaction.string.regex) 2.3.0
spark.sql.repl.eagerEval.enabled false 2.4.0
spark.sql.repl.eagerEval.maxNumRows 20 2.4.0
spark.sql.repl.eagerEval.truncate 20 2.4.0
spark.sql.scripting.enabled false 4.0.0
spark.sql.session.localRelationCacheThreshold 67108864 3.5.0
spark.sql.session.timeZone (value of local timezone) 2.2.0
spark.sql.shuffle.partitions 200 1.1.0
spark.sql.shuffleDependency.fileCleanup.enabled false 4.0.0
spark.sql.shuffleDependency.skipMigration.enabled false 4.0.0
spark.sql.shuffledHashJoinFactor 3 3.3.0
spark.sql.sources.bucketing.autoBucketedScan.enabled true 3.1.0
spark.sql.sources.bucketing.enabled true 2.0.0
spark.sql.sources.bucketing.maxBuckets 100000 2.4.0
spark.sql.sources.default parquet 1.3.0
spark.sql.sources.parallelPartitionDiscovery.threshold 32 1.5.0
spark.sql.sources.partitionColumnTypeInference.enabled true 1.5.0
spark.sql.sources.partitionOverwriteMode STATIC 2.3.0
spark.sql.sources.v2.bucketing.allowCompatibleTransforms.enabled false 4.0.0
spark.sql.sources.v2.bucketing.allowJoinKeysSubsetOfPartitionKeys.enabled false 4.0.0
spark.sql.sources.v2.bucketing.enabled false 3.3.0
spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled false 3.4.0
spark.sql.sources.v2.bucketing.partition.filter.enabled false 4.0.0
spark.sql.sources.v2.bucketing.pushPartValues.enabled true 3.4.0
spark.sql.sources.v2.bucketing.shuffle.enabled false 4.0.0
spark.sql.sources.v2.bucketing.sorting.enabled false 4.0.0
spark.sql.stackTracesInDataFrameContext 1 4.0.0
spark.sql.statistics.fallBackToHdfs false 2.0.0
spark.sql.statistics.histogram.enabled false 2.3.0
spark.sql.statistics.size.autoUpdate.enabled false 2.3.0
spark.sql.statistics.updatePartitionStatsInAnalyzeTable.enabled false 4.0.0
spark.sql.storeAssignmentPolicy ANSI 3.0.0
spark.sql.streaming.checkpointLocation (none) 2.0.0
spark.sql.streaming.continuous.epochBacklogQueueSize 10000 3.0.0
spark.sql.streaming.disabledV2Writers 2.3.1
spark.sql.streaming.fileSource.cleaner.numThreads 1 3.0.0
spark.sql.streaming.forceDeleteTempCheckpointLocation false 3.0.0
spark.sql.streaming.metricsEnabled false 2.0.2
spark.sql.streaming.multipleWatermarkPolicy min 2.4.0
spark.sql.streaming.noDataMicroBatches.enabled true 2.4.1
spark.sql.streaming.numRecentProgressUpdates 100 2.1.1
spark.sql.streaming.sessionWindow.merge.sessions.in.local.partition false 3.2.0
spark.sql.streaming.stateStore.encodingFormat unsaferow 4.0.0
spark.sql.streaming.stateStore.stateSchemaCheck true 3.1.0
spark.sql.streaming.stopActiveRunOnRestart true 3.0.0
spark.sql.streaming.stopTimeout 0 3.0.0
spark.sql.streaming.transformWithState.stateSchemaVersion 3 4.0.0
spark.sql.thriftServer.interruptOnCancel (value of spark.sql.execution.interruptOnCancel) 3.2.0
spark.sql.thriftServer.queryTimeout 0ms 3.1.0
spark.sql.thriftserver.scheduler.pool (none) 1.1.1
spark.sql.thriftserver.ui.retainedSessions 200 1.4.0
spark.sql.thriftserver.ui.retainedStatements 200 1.4.0
spark.sql.timeTravelTimestampKey timestampAsOf 4.0.0
spark.sql.timeTravelVersionKey versionAsOf 4.0.0
spark.sql.timestampType TIMESTAMP_LTZ 3.4.0
spark.sql.transposeMaxValues 500 4.0.0
spark.sql.tvf.allowMultipleTableArguments.enabled false 3.5.0
spark.sql.ui.explainMode formatted 3.1.0
spark.sql.variable.substitute true 2.0.0

Static SQL Configuration

Property Name Default Since Version Gluten Status
spark.sql.cache.serializer org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer 3.1.0
spark.sql.catalog.spark_catalog.defaultDatabase default 3.4.0
spark.sql.event.truncate.length 2147483647 3.0.0
spark.sql.extensions (none) 2.2.0
spark.sql.extensions.test.loadFromCp true
spark.sql.hive.metastore.barrierPrefixes 1.4.0
spark.sql.hive.metastore.jars builtin 1.4.0
spark.sql.hive.metastore.jars.path 3.1.0
spark.sql.hive.metastore.sharedPrefixes com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc 1.4.0
spark.sql.hive.metastore.version 2.3.10 1.4.0
spark.sql.hive.thriftServer.singleSession false 1.6.0
spark.sql.hive.version 2.3.10 1.1.1
spark.sql.metadataCacheTTLSeconds -1000ms 3.1.0
spark.sql.queryExecutionListeners (none) 2.3.0
spark.sql.sources.disabledJdbcConnProviderList 3.1.0
spark.sql.streaming.streamingQueryListeners (none) 2.4.0
spark.sql.streaming.ui.enabled true 3.0.0
spark.sql.streaming.ui.retainedProgressUpdates 100 3.0.0
spark.sql.streaming.ui.retainedQueries 100 3.0.0
spark.sql.ui.retainedExecutions 1000 1.5.0
spark.sql.warehouse.dir (value of $PWD/spark-warehouse) 2.0.0
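
Static SQL configurations are fixed once the application has started, whereas runtime SQL configurations can be changed per session with `SET`. The sketch below separates a set of settings accordingly. The key set contains only a few entries copied from the table above and is not exhaustive. Treating every other key as settable with `SET` is a simplification that holds only for keys from the Runtime SQL Configuration table. The function name and the example path are hypothetical.

```python
# A few keys copied from the Static SQL Configuration table (not exhaustive).
STATIC_SQL_KEYS = {
    "spark.sql.extensions",
    "spark.sql.warehouse.dir",
    "spark.sql.cache.serializer",
}


def split_settings(settings):
    """Split settings into spark-submit arguments (static) and SET statements (runtime)."""
    submit_args, set_statements = [], []
    for key in sorted(settings):
        value = settings[key]
        if key in STATIC_SQL_KEYS:
            # Static keys must be supplied at launch time.
            submit_args += ["--conf", f"{key}={value}"]
        else:
            set_statements.append(f"SET {key}={value}")
    return submit_args, set_statements


args, statements = split_settings({
    "spark.sql.warehouse.dir": "/tmp/warehouse",  # hypothetical path
    "spark.sql.shuffle.partitions": "400",
})
```

Here `args` holds the launch-time `--conf` pair for the warehouse directory and `statements` holds the single `SET` statement for the shuffle partition count.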

Cluster Managers

These configurations are handled by Spark and do not affect Gluten’s behavior.

Push-based shuffle overview

External Shuffle service(server) side configuration options

Property Name Default Since Version Gluten Status
spark.shuffle.push.server.mergedShuffleFileManagerImpl org.apache.spark.network.shuffle.NoOpMergedShuffleFileManager 3.2.0
spark.shuffle.push.server.minChunkSizeInMergedShuffleFile 2m 3.2.0
spark.shuffle.push.server.mergedIndexCacheSize 100m 3.2.0

Client side configuration options

Property Name Default Since Version Gluten Status
spark.shuffle.push.enabled false 3.2.0
spark.shuffle.push.finalize.timeout 10s 3.2.0
spark.shuffle.push.maxRetainedMergerLocations 500 3.2.0
spark.shuffle.push.mergersMinThresholdRatio 0.05 3.2.0
spark.shuffle.push.mergersMinStaticThreshold 5 3.2.0
spark.shuffle.push.numPushThreads (none) 3.2.0
spark.shuffle.push.maxBlockSizeToPush 1m 3.2.0
spark.shuffle.push.maxBlockBatchSize 3m 3.2.0
spark.shuffle.push.merge.finalizeThreads 8 3.3.0
spark.shuffle.push.minShuffleSizeToWait 500m 3.3.0
spark.shuffle.push.minCompletedPushRatio 1.0 3.3.0
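
The two `mergersMin*` settings above work together: per the Spark documentation, the number of merger locations needed to enable push-based shuffle for a stage is the larger of the static threshold and the ratio applied to the stage's partition count. A minimal sketch of that arithmetic with the listed defaults follows. Rounding the ratio term down is an assumption, and the function name is hypothetical.

```python
import math


def min_mergers_required(num_partitions, ratio=0.05, static_threshold=5):
    """Larger of mergersMinStaticThreshold and mergersMinThresholdRatio * partitions."""
    # Assumption: the ratio-based term is rounded down to a whole number.
    return max(static_threshold, math.floor(ratio * num_partitions))


# 1000 reducer partitions: the ratio term (50) dominates.
# 20 reducer partitions: the static threshold (5) dominates.
large_stage = min_mergers_required(1000)
small_stage = min_mergers_required(20)
```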