
Commit 032dcf8

[SPARK-53926][DOCS] Document newly added core module configurations
### What changes were proposed in this pull request?

This PR aims to document newly added `core` module configurations as a part of Apache Spark 4.1.0 preparation.

### Why are the changes needed?

To help the users use new features easily.

- apache#47856
- apache#51130
- apache#51163
- apache#51604
- apache#51630
- apache#51708
- apache#51885
- apache#52091
- apache#52382

### Does this PR introduce _any_ user-facing change?

No behavior change because this is a documentation update.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52626 from dongjoon-hyun/SPARK-53926.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
1 parent bc8020f commit 032dcf8

File tree

2 files changed: +117 −0 lines

docs/configuration.md

Lines changed: 109 additions & 0 deletions
@@ -523,6 +523,16 @@ of the most common options to set are:
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.driver.log.redirectConsoleOutputs</code></td>
+  <td>stdout,stderr</td>
+  <td>
+    Comma-separated list of the driver's console output kinds to redirect to the
+    logging system. Supported values are `stdout` and `stderr`. It only takes effect
+    when `spark.plugins` is configured with `org.apache.spark.deploy.RedirectConsolePlugin`.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.decommission.enabled</code></td>
   <td>false</td>
@@ -772,6 +782,16 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.1.0</td>
 </tr>
+<tr>
+  <td><code>spark.executor.logs.redirectConsoleOutputs</code></td>
+  <td>stdout,stderr</td>
+  <td>
+    Comma-separated list of the executors' console output kinds to redirect to the
+    logging system. Supported values are `stdout` and `stderr`. It only takes effect
+    when `spark.plugins` is configured with `org.apache.spark.deploy.RedirectConsolePlugin`.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.executor.userClassPathFirst</code></td>
   <td>false</td>
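
The two redirect settings above only do anything when the `RedirectConsolePlugin` is loaded via `spark.plugins`. A minimal wiring sketch (a config fragment, not a runnable standalone script; it assumes a PySpark installation and an environment that can start a Spark session):

```python
from pyspark.sql import SparkSession

# Sketch: route driver stdout/stderr and executor stderr into the logging
# system. The redirect configs take effect only with the plugin enabled.
spark = (
    SparkSession.builder
    .config("spark.plugins", "org.apache.spark.deploy.RedirectConsolePlugin")
    .config("spark.driver.log.redirectConsoleOutputs", "stdout,stderr")
    .config("spark.executor.logs.redirectConsoleOutputs", "stderr")
    .getOrCreate()
)
```

The same keys can equally be passed as `--conf` options to `spark-submit`.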
@@ -857,6 +877,47 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.2.0</td>
 </tr>
+<tr>
+  <td><code>spark.python.factory.idleWorkerMaxPoolSize</code></td>
+  <td>(none)</td>
+  <td>
+    Maximum number of idle Python workers to keep. If unset, the number is unbounded.
+    If set to a positive integer N, at most N idle workers are retained;
+    least-recently used workers are evicted first.
+  </td>
+  <td>4.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.python.worker.killOnIdleTimeout</code></td>
+  <td>false</td>
+  <td>
+    Whether Spark should terminate the Python worker process when the idle timeout
+    (as defined by <code>spark.python.worker.idleTimeoutSeconds</code>) is reached. If enabled,
+    Spark will terminate the Python worker process in addition to logging the status.
+  </td>
+  <td>4.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.python.worker.tracebackDumpIntervalSeconds</code></td>
+  <td>0</td>
+  <td>
+    The interval (in seconds) at which Python workers dump their tracebacks.
+    If positive, the Python worker periodically dumps its traceback into its
+    `stderr`. The default is `0`, which means it is disabled.
+  </td>
+  <td>4.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.python.unix.domain.socket.enabled</code></td>
+  <td>false</td>
+  <td>
+    When set to true, the Python driver uses a Unix domain socket for operations like
+    creating or collecting a DataFrame from local data, using accumulators, and executing
+    Python functions with PySpark such as Python UDFs. This configuration only applies
+    to Spark Classic and the Spark Connect server.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.files</code></td>
   <td></td>
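
The `spark.python.factory.idleWorkerMaxPoolSize` semantics described above (bounded pool, least-recently-used eviction) can be illustrated with a toy model. This is not Spark's actual implementation; the class, method names, and borrow order are invented for illustration:

```python
from collections import OrderedDict

class IdleWorkerPool:
    """Toy sketch: keep at most max_size idle workers, evicting LRU first."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._idle = OrderedDict()  # worker_id -> worker, least recent first

    def release(self, worker_id, worker):
        """Return a worker to the idle pool; report any evicted worker ids."""
        self._idle[worker_id] = worker
        self._idle.move_to_end(worker_id)  # mark as most recently used
        evicted = []
        while len(self._idle) > self.max_size:
            wid, _ = self._idle.popitem(last=False)  # drop least recently used
            evicted.append(wid)
        return evicted

    def borrow(self):
        """Reuse the most recently used idle worker, if any."""
        return self._idle.popitem(last=True) if self._idle else None

pool = IdleWorkerPool(max_size=2)
print(pool.release("w1", object()))  # []
print(pool.release("w2", object()))  # []
print(pool.release("w3", object()))  # ['w1'] -- LRU worker evicted
print(pool.borrow()[0])              # 'w3'
```

With the real config, leaving it unset keeps the pre-4.1.0 unbounded behavior.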
@@ -873,6 +934,16 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.0.1</td>
 </tr>
+<tr>
+  <td><code>spark.submit.callSystemExitOnMainExit</code></td>
+  <td>false</td>
+  <td>
+    If true, SparkSubmit will call System.exit() to initiate JVM shutdown once the
+    user's main method has exited. This can be useful in cases where non-daemon JVM
+    threads might otherwise prevent the JVM from shutting down on its own.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.jars</code></td>
   <td></td>
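
The failure mode `spark.submit.callSystemExitOnMainExit` addresses has a direct Python analogy: a lingering non-daemon thread keeps a process alive after "main" returns unless the process performs a hard exit (`os._exit`, analogous to `System.exit()`). A small self-contained demonstration (not Spark code):

```python
import subprocess
import sys
import time

# Child program: starts a non-daemon thread that would keep the process
# alive for 60s, then hard-exits anyway -- analogous to System.exit().
prog = """
import os, threading, time
threading.Thread(target=time.sleep, args=(60,)).start()  # non-daemon thread
os._exit(0)  # terminate immediately despite the lingering thread
"""

start = time.time()
rc = subprocess.run([sys.executable, "-c", prog]).returncode
elapsed = time.time() - start

# Without the hard exit, the child would block for ~60 seconds.
print(rc, elapsed < 10)
```

Enabling the Spark config opts in to the equivalent hard shutdown after the user's `main` returns.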
@@ -1431,6 +1502,14 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.eventLog.excludedPatterns</code></td>
+  <td>(none)</td>
+  <td>
+    Specifies comma-separated event names to be excluded from the event logs.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.eventLog.dir</code></td>
   <td>file:///tmp/spark-events</td>
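
A toy illustration (not Spark's code) of how a comma-separated exclusion list like `spark.eventLog.excludedPatterns` filters events out of the log; the event names are real Spark listener event names used here only as examples:

```python
# Parse the comma-separated config value into a set of excluded names.
excluded = set(
    "SparkListenerBlockUpdated,SparkListenerExecutorMetricsUpdate".split(",")
)

events = [
    "SparkListenerJobStart",
    "SparkListenerBlockUpdated",
    "SparkListenerJobEnd",
]

# Only events not in the exclusion set reach the event log.
logged = [e for e in events if e not in excluded]
print(logged)  # ['SparkListenerJobStart', 'SparkListenerJobEnd']
```

Excluding high-volume events can meaningfully shrink event logs consumed by the History Server.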
@@ -1905,6 +1984,15 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>3.2.0</td>
 </tr>
+<tr>
+  <td><code>spark.io.compression.zstd.strategy</code></td>
+  <td>(none)</td>
+  <td>
+    Compression strategy for the Zstd compression codec. The higher the value,
+    the more complex the strategy, usually resulting in stronger but slower
+    compression at a higher CPU cost.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.io.compression.zstd.workers</code></td>
   <td>0</td>
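
The new strategy knob sits alongside the existing Zstd codec settings. A hedged config fragment (values are illustrative, not recommendations; assumes a PySpark installation, not runnable standalone):

```python
from pyspark.sql import SparkSession

# Sketch: select the Zstd codec for internal data such as shuffle, and
# tune it. Higher strategy values trade CPU for compression ratio.
spark = (
    SparkSession.builder
    .config("spark.io.compression.codec", "zstd")
    .config("spark.io.compression.zstd.level", "3")
    .config("spark.io.compression.zstd.strategy", "7")
    .getOrCreate()
)
```

As with any codec tuning, the right values depend on workload; benchmark before changing defaults.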
@@ -2092,6 +2180,17 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.6.0</td>
 </tr>
+<tr>
+  <td><code>spark.memory.unmanagedMemoryPollingInterval</code></td>
+  <td>0s</td>
+  <td>
+    Interval for polling unmanaged memory users to track their memory usage.
+    Unmanaged memory users are components that manage their own memory outside of
+    Spark's core memory management, such as RocksDB for Streaming State Store.
+    Setting this to 0 disables unmanaged memory polling.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.storage.unrollMemoryThreshold</code></td>
   <td>1024 * 1024</td>
@@ -2543,6 +2642,16 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>0.7.0</td>
 </tr>
+<tr>
+  <td><code>spark.driver.metrics.pollingInterval</code></td>
+  <td>10s</td>
+  <td>
+    How often to collect driver metrics. If unset, the polling is done at the
+    executor heartbeat interval; if set, the polling is done at this interval.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td><code>spark.rpc.io.backLog</code></td>
   <td>64</td>

docs/monitoring.md

Lines changed: 8 additions & 0 deletions
@@ -401,6 +401,14 @@ Security options for the Spark History Server are covered more detail in the
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td>spark.history.fs.eventLog.rolling.onDemandLoadEnabled</td>
+  <td>true</td>
+  <td>
+    Whether to look up rolling event log locations in an on-demand manner before listing files.
+  </td>
+  <td>4.1.0</td>
+</tr>
 <tr>
   <td>spark.history.store.hybridStore.enabled</td>
   <td>false</td>
