
Commit 032dcf8

[SPARK-53926][DOCS] Document newly added core module configurations
### What changes were proposed in this pull request?

This PR aims to document newly added `core` module configurations as a part of Apache Spark 4.1.0 preparation.

### Why are the changes needed?

To help the users use new features easily.

- apache#47856
- apache#51130
- apache#51163
- apache#51604
- apache#51630
- apache#51708
- apache#51885
- apache#52091
- apache#52382

### Does this PR introduce _any_ user-facing change?

No behavior change because this is a documentation update.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52626 from dongjoon-hyun/SPARK-53926.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
1 parent bc8020f commit 032dcf8

File tree

2 files changed: +117 −0 lines

docs/configuration.md

Lines changed: 109 additions & 0 deletions
@@ -523,6 +523,16 @@ of the most common options to set are:
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.driver.log.redirectConsoleOutputs</code></td>
+  <td>stdout,stderr</td>
+  <td>
+    Comma-separated list of the driver's console output kinds to redirect to the
+    logging system. Supported values are `stdout` and `stderr`. It only takes effect
+    when `spark.plugins` is configured with `org.apache.spark.deploy.RedirectConsolePlugin`.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.decommission.enabled</code></td>
   <td>false</td>
@@ -772,6 +782,16 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.1.0</td>
 </tr>
+<tr>
+  <td><code>spark.executor.logs.redirectConsoleOutputs</code></td>
+  <td>stdout,stderr</td>
+  <td>
+    Comma-separated list of the executors' console output kinds to redirect to the
+    logging system. Supported values are `stdout` and `stderr`. It only takes effect
+    when `spark.plugins` is configured with `org.apache.spark.deploy.RedirectConsolePlugin`.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.executor.userClassPathFirst</code></td>
   <td>false</td>
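
The two redirect settings above only do anything when the `RedirectConsolePlugin` is loaded via `spark.plugins`. A minimal wiring sketch (a config fragment, not a runnable standalone script; it assumes a PySpark installation and an environment that can start a Spark session):

```python
from pyspark.sql import SparkSession

# Sketch: route driver stdout/stderr and executor stderr into the logging
# system. The redirect configs take effect only with the plugin enabled.
spark = (
    SparkSession.builder
    .config("spark.plugins", "org.apache.spark.deploy.RedirectConsolePlugin")
    .config("spark.driver.log.redirectConsoleOutputs", "stdout,stderr")
    .config("spark.executor.logs.redirectConsoleOutputs", "stderr")
    .getOrCreate()
)
```

The same keys can equally be passed as `--conf` options to `spark-submit`.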
@@ -857,6 +877,47 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.2.0</td>
 </tr>
+<tr>
+  <td><code>spark.python.factory.idleWorkerMaxPoolSize</code></td>
+  <td>(none)</td>
+  <td>
+    Maximum number of idle Python workers to keep. If unset, the number is unbounded.
+    If set to a positive integer N, at most N idle workers are retained;
+    least-recently used workers are evicted first.
+  </td>
+  <td>4.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.python.worker.killOnIdleTimeout</code></td>
+  <td>false</td>
+  <td>
+    Whether Spark should terminate the Python worker process when the idle timeout
+    (as defined by <code>spark.python.worker.idleTimeoutSeconds</code>) is reached. If enabled,
+    Spark will terminate the Python worker process in addition to logging the status.
+  </td>
+  <td>4.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.python.worker.tracebackDumpIntervalSeconds</code></td>
+  <td>0</td>
+  <td>
+    The interval (in seconds) at which Python workers dump their tracebacks.
+    If positive, the Python worker periodically dumps its traceback into its
+    `stderr`. The default is `0`, which means it is disabled.
+  </td>
+  <td>4.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.python.unix.domain.socket.enabled</code></td>
+  <td>false</td>
+  <td>
+    When set to true, the Python driver uses a Unix domain socket for operations like
+    creating or collecting a DataFrame from local data, using accumulators, and executing
+    Python functions with PySpark such as Python UDFs. This configuration only applies
+    to Spark Classic and the Spark Connect server.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.files</code></td>
   <td></td>
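
The `spark.python.factory.idleWorkerMaxPoolSize` semantics described above (bounded pool, least-recently-used eviction) can be illustrated with a toy model. This is not Spark's actual implementation; the class, method names, and borrow order are invented for illustration:

```python
from collections import OrderedDict

class IdleWorkerPool:
    """Toy sketch: keep at most max_size idle workers, evicting LRU first."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._idle = OrderedDict()  # worker_id -> worker, least recent first

    def release(self, worker_id, worker):
        """Return a worker to the idle pool; report any evicted worker ids."""
        self._idle[worker_id] = worker
        self._idle.move_to_end(worker_id)  # mark as most recently used
        evicted = []
        while len(self._idle) > self.max_size:
            wid, _ = self._idle.popitem(last=False)  # drop least recently used
            evicted.append(wid)
        return evicted

    def borrow(self):
        """Reuse the most recently used idle worker, if any."""
        return self._idle.popitem(last=True) if self._idle else None

pool = IdleWorkerPool(max_size=2)
print(pool.release("w1", object()))  # []
print(pool.release("w2", object()))  # []
print(pool.release("w3", object()))  # ['w1'] -- LRU worker evicted
print(pool.borrow()[0])              # 'w3'
```

With the real config, leaving it unset keeps the pre-4.1.0 unbounded behavior.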
@@ -873,6 +934,16 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.0.1</td>
 </tr>
+<tr>
+  <td><code>spark.submit.callSystemExitOnMainExit</code></td>
+  <td>false</td>
+  <td>
+    If true, SparkSubmit will call System.exit() to initiate JVM shutdown once the
+    user's main method has exited. This can be useful in cases where non-daemon JVM
+    threads might otherwise prevent the JVM from shutting down on its own.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.jars</code></td>
   <td></td>
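
The failure mode `spark.submit.callSystemExitOnMainExit` addresses has a direct Python analogy: a lingering non-daemon thread keeps a process alive after "main" returns unless the process performs a hard exit (`os._exit`, analogous to `System.exit()`). A small self-contained demonstration (not Spark code):

```python
import subprocess
import sys
import time

# Child program: starts a non-daemon thread that would keep the process
# alive for 60s, then hard-exits anyway -- analogous to System.exit().
prog = """
import os, threading, time
threading.Thread(target=time.sleep, args=(60,)).start()  # non-daemon thread
os._exit(0)  # terminate immediately despite the lingering thread
"""

start = time.time()
rc = subprocess.run([sys.executable, "-c", prog]).returncode
elapsed = time.time() - start

# Without the hard exit, the child would block for ~60 seconds.
print(rc, elapsed < 10)
```

Enabling the Spark config opts in to the equivalent hard shutdown after the user's `main` returns.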
@@ -1431,6 +1502,14 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.eventLog.excludedPatterns</code></td>
+  <td>(none)</td>
+  <td>
+    Specifies comma-separated event names to be excluded from the event logs.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.eventLog.dir</code></td>
   <td>file:///tmp/spark-events</td>
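
A toy illustration (not Spark's code) of how a comma-separated exclusion list like `spark.eventLog.excludedPatterns` filters events out of the log; the event names are real Spark listener event names used here only as examples:

```python
# Parse the comma-separated config value into a set of excluded names.
excluded = set(
    "SparkListenerBlockUpdated,SparkListenerExecutorMetricsUpdate".split(",")
)

events = [
    "SparkListenerJobStart",
    "SparkListenerBlockUpdated",
    "SparkListenerJobEnd",
]

# Only events not in the exclusion set reach the event log.
logged = [e for e in events if e not in excluded]
print(logged)  # ['SparkListenerJobStart', 'SparkListenerJobEnd']
```

Excluding high-volume events can meaningfully shrink event logs consumed by the History Server.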
@@ -1905,6 +1984,15 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>3.2.0</td>
 </tr>
+<tr>
+  <td><code>spark.io.compression.zstd.strategy</code></td>
+  <td>(none)</td>
+  <td>
+    Compression strategy for the Zstd compression codec. The higher the value,
+    the more complex the strategy, usually resulting in stronger but slower
+    compression at a higher CPU cost.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.io.compression.zstd.workers</code></td>
   <td>0</td>
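
The new strategy knob sits alongside the existing Zstd codec settings. A hedged config fragment (values are illustrative, not recommendations; assumes a PySpark installation, not runnable standalone):

```python
from pyspark.sql import SparkSession

# Sketch: select the Zstd codec for internal data such as shuffle, and
# tune it. Higher strategy values trade CPU for compression ratio.
spark = (
    SparkSession.builder
    .config("spark.io.compression.codec", "zstd")
    .config("spark.io.compression.zstd.level", "3")
    .config("spark.io.compression.zstd.strategy", "7")
    .getOrCreate()
)
```

As with any codec tuning, the right values depend on workload; benchmark before changing defaults.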
@@ -2092,6 +2180,17 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.6.0</td>
 </tr>
+<tr>
+  <td><code>spark.memory.unmanagedMemoryPollingInterval</code></td>
+  <td>0s</td>
+  <td>
+    Interval for polling unmanaged memory users to track their memory usage.
+    Unmanaged memory users are components that manage their own memory outside of
+    Spark's core memory management, such as RocksDB for Streaming State Store.
+    Setting this to 0 disables unmanaged memory polling.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.storage.unrollMemoryThreshold</code></td>
   <td>1024 * 1024</td>
@@ -2543,6 +2642,16 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>0.7.0</td>
 </tr>
+<tr>
+  <td><code>spark.driver.metrics.pollingInterval</code></td>
+  <td>10s</td>
+  <td>
+    How often to collect driver metrics. If unset, the polling is done at the
+    executor heartbeat interval; if set, the polling is done at this interval.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.rpc.io.backLog</code></td>
   <td>64</td>

docs/monitoring.md

Lines changed: 8 additions & 0 deletions
@@ -401,6 +401,14 @@ Security options for the Spark History Server are covered more detail in the
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td>spark.history.fs.eventLog.rolling.onDemandLoadEnabled</td>
+  <td>true</td>
+  <td>
+    Whether to look up rolling event log locations in an on-demand manner before listing files.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td>spark.history.store.hybridStore.enabled</td>
   <td>false</td>
