docs

andygrove · andygrove · commit cc7527f13ea3 · 2025-10-12T10:48:01.000-06:00
diff --git a/common/src/main/scala/org/apache/comet/CometConf.scala b/common/src/main/scala/org/apache/comet/CometConf.scala
@@ -514,8 +514,7 @@ object CometConf extends ShimCometConf {
     conf("spark.comet.exec.onheap.enabled")
       .doc("Whether to allow Comet to run in on-heap mode. Required for running Spark SQL tests.")
       .booleanConf
-      .createWithDefault(
-        sys.env.getOrElse("ENABLE_COMET_ONHEAP", "false").toBoolean)
+      .createWithDefault(sys.env.getOrElse("ENABLE_COMET_ONHEAP", "false").toBoolean)
 
   val COMET_EXEC_MEMORY_POOL_TYPE: ConfigEntry[String] = conf("spark.comet.exec.memoryPool")
     .doc(
diff --git a/docs/source/user-guide/latest/tuning.md b/docs/source/user-guide/latest/tuning.md
@@ -45,10 +45,19 @@ and requiring shuffle memory to be separately configured.
 The recommended way to allocate memory for Comet is to set `spark.memory.offHeap.enabled=true`. This allows
 Comet to share an off-heap memory pool with Spark, reducing the overall memory overhead. The size of the pool is
 specified by `spark.memory.offHeap.size`. For more details about Spark off-heap memory mode, please refer to
-Spark documentation: https://spark.apache.org/docs/latest/configuration.html.
+the [Spark documentation]. For full details on configuring Comet memory in off-heap mode, see the [Advanced Memory Tuning]
+section of this guide.
+
+[Spark documentation]: https://spark.apache.org/docs/latest/configuration.html
 
 ### Configuring Comet Memory in On-Heap Mode
 
+```{warning}
+Support for on-heap memory pools is deprecated and will be removed in a future release.
+```
+
+Comet is disabled by default in on-heap mode, but can be enabled by setting `spark.comet.exec.onheap.enabled=true`.
+
 When running in on-heap mode, Comet memory can be allocated by setting `spark.comet.memoryOverhead`. If this setting
 is not provided, it will be calculated by multiplying the current Spark executor memory by
 `spark.comet.memory.overhead.factor` (default value is `0.2`) which may or may not result in enough memory for
@@ -59,10 +68,13 @@ Comet supports native shuffle and columnar shuffle (these terms are explained in
 In on-heap mode, columnar shuffle memory must be separately allocated using `spark.comet.columnar.shuffle.memorySize`.
 If this setting is not provided, it will be calculated by multiplying `spark.comet.memoryOverhead` by
 `spark.comet.columnar.shuffle.memory.factor` (default value is `1.0`). If a shuffle exceeds this amount of memory
-then the query will fail.
+then the query will fail. For full details on configuring Comet memory in on-heap mode, see the [Advanced Memory Tuning]
+section of this guide.
 
 [shuffle]: #shuffle
 
+[Advanced Memory Tuning]: #advanced-memory-tuning
+
 ### Determining How Much Memory to Allocate
 
 Generally, increasing the amount of memory allocated to Comet will improve query performance by reducing the
@@ -102,14 +114,6 @@ Workarounds for this problem include:
 
 ## Advanced Memory Tuning
 
-### Configuring spark.executor.memoryOverhead in On-Heap Mode
-
-In some environments, such as Kubernetes and YARN, it is important to correctly set `spark.executor.memoryOverhead` so
-that it is possible to allocate off-heap memory when running in on-heap mode.
-
-Comet will automatically set `spark.executor.memoryOverhead` based on the `spark.comet.memory*` settings so that
-resource managers respect Apache Spark memory configuration before starting the containers.
-
 ### Configuring Off-Heap Memory Pools
 
 Comet implements multiple memory pool implementations. The type of pool can be specified with `spark.comet.exec.memoryPool`.
@@ -132,6 +136,10 @@ when there is sufficient memory in order to leave enough memory for other operat
 
 ### Configuring On-Heap Memory Pools
 
+```{warning}
+Support for on-heap memory pools is deprecated and will be removed in a future release.
+```
+
 When running in on-heap mode, Comet will use its own dedicated memory pools that are not shared with Spark.
 
 The type of pool can be specified with `spark.comet.exec.memoryPool`. The default setting is `greedy_task_shared`.
@@ -172,6 +180,14 @@ adjusting how much memory to allocate.
 [FairSpillPool]: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.FairSpillPool.html
 [UnboundedMemoryPool]: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.UnboundedMemoryPool.html
 
+### Configuring spark.executor.memoryOverhead in On-Heap Mode
+
+In some environments, such as Kubernetes and YARN, it is important to correctly set `spark.executor.memoryOverhead` so
+that it is possible to allocate off-heap memory when running in on-heap mode.
+
+Comet will automatically set `spark.executor.memoryOverhead` based on the `spark.comet.memory*` settings so that
+resource managers respect Apache Spark memory configuration before starting the containers.
+
 ## Optimizing Joins
 
 Spark often chooses `SortMergeJoin` over `ShuffledHashJoin` for stability reasons. If the build-side of a
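
The on-heap sizing rules described in the `tuning.md` hunks above (Comet memory defaults to executor memory times `spark.comet.memory.overhead.factor`, and columnar shuffle memory defaults to Comet memory times `spark.comet.columnar.shuffle.memory.factor`) can be illustrated with a small arithmetic sketch. This is not Comet's implementation: the object and method names are hypothetical, the 10 GiB executor size is an example value, and any minimum-overhead floor or rounding that Comet may apply is deliberately ignored.

```scala
// Sketch of the default on-heap sizing described in tuning.md.
// Assumption: names are hypothetical, and any floor or rounding Comet applies is ignored.
object CometOnHeapMemorySketch {
  // Default factors quoted in the documentation text.
  val DefaultOverheadFactor: Double = 0.2 // spark.comet.memory.overhead.factor
  val DefaultShuffleFactor: Double = 1.0  // spark.comet.columnar.shuffle.memory.factor

  /** Comet memory (MiB) used when spark.comet.memoryOverhead is not set. */
  def cometOverheadMiB(executorMemoryMiB: Long, factor: Double = DefaultOverheadFactor): Long =
    (executorMemoryMiB * factor).toLong

  /** Columnar shuffle memory (MiB) used when spark.comet.columnar.shuffle.memorySize is not set. */
  def columnarShuffleMiB(overheadMiB: Long, factor: Double = DefaultShuffleFactor): Long =
    (overheadMiB * factor).toLong

  def main(args: Array[String]): Unit = {
    val executorMemoryMiB = 10L * 1024 // hypothetical 10 GiB executor
    val overhead = cometOverheadMiB(executorMemoryMiB)
    val shuffle = columnarShuffleMiB(overhead)
    println(s"Comet memory: $overhead MiB, columnar shuffle memory: $shuffle MiB")
  }
}
```

With the default factors, the shuffle allocation equals the Comet allocation, so setting `spark.comet.memoryOverhead` explicitly changes both values unless `spark.comet.columnar.shuffle.memorySize` is also set.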