Commit cc7527f

docs
1 parent 8131b4e commit cc7527f

2 files changed: 27 additions and 12 deletions

common/src/main/scala/org/apache/comet/CometConf.scala

Lines changed: 1 addition & 2 deletions
@@ -514,8 +514,7 @@ object CometConf extends ShimCometConf {
     conf("spark.comet.exec.onheap.enabled")
       .doc("Whether to allow Comet to run in on-heap mode. Required for running Spark SQL tests.")
       .booleanConf
-      .createWithDefault(
-        sys.env.getOrElse("ENABLE_COMET_ONHEAP", "false").toBoolean)
+      .createWithDefault(sys.env.getOrElse("ENABLE_COMET_ONHEAP", "false").toBoolean)
 
   val COMET_EXEC_MEMORY_POOL_TYPE: ConfigEntry[String] = conf("spark.comet.exec.memoryPool")
     .doc(
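
Reviewer note: the reformatted default still reads the `ENABLE_COMET_ONHEAP` environment variable and falls back to `false` when it is unset. The sketch below isolates that expression in a hypothetical helper (`onHeapDefault` is an illustration, not part of Comet) that takes the environment as a `Map`, so the behaviour can be checked without setting real variables. Scala's `toBoolean` accepts only `true` or `false` (case-insensitive) and throws `IllegalArgumentException` for anything else, so a value such as `1` would fail rather than quietly leave on-heap mode disabled.

```scala
object OnHeapDefault {
  // Hypothetical helper mirroring the expression in the diff above.
  // The environment is passed in as a Map so it can be tested in isolation.
  def onHeapDefault(env: Map[String, String]): Boolean =
    env.getOrElse("ENABLE_COMET_ONHEAP", "false").toBoolean

  def main(args: Array[String]): Unit = {
    // Variable unset: falls back to "false".
    println(onHeapDefault(Map.empty))
    // Variable set: parsed case-insensitively.
    println(onHeapDefault(Map("ENABLE_COMET_ONHEAP" -> "TRUE")))
    // Real process environment, as used by the expression in CometConf.
    println(onHeapDefault(sys.env.filter { case (k, _) => k == "ENABLE_COMET_ONHEAP" }
      .filter { case (_, v) => v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false") }))
  }
}
```

The last call filters out unparsable values only so that the demo itself never throws; the expression in the commit does no such filtering.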

docs/source/user-guide/latest/tuning.md

Lines changed: 26 additions & 10 deletions
@@ -45,10 +45,19 @@ and requiring shuffle memory to be separately configured.
 The recommended way to allocate memory for Comet is to set `spark.memory.offHeap.enabled=true`. This allows
 Comet to share an off-heap memory pool with Spark, reducing the overall memory overhead. The size of the pool is
 specified by `spark.memory.offHeap.size`. For more details about Spark off-heap memory mode, please refer to
-Spark documentation: https://spark.apache.org/docs/latest/configuration.html.
+[Spark documentation]. For full details on configuring Comet memory in off-heap mode, see the [Advanced Memory Tuning]
+section of this guide.
+
+[Spark documentation]: https://spark.apache.org/docs/latest/configuration.html
 
 ### Configuring Comet Memory in On-Heap Mode
 
+```{warning}
+Support for on-heap memory pools is deprecated and will be removed from a future release.
+```
+
+Comet is disabled by default in on-heap mode, but can be enabled by setting `spark.comet.exec.onheap.enabled=true`.
+
 When running in on-heap mode, Comet memory can be allocated by setting `spark.comet.memoryOverhead`. If this setting
 is not provided, it will be calculated by multiplying the current Spark executor memory by
 `spark.comet.memory.overhead.factor` (default value is `0.2`) which may or may not result in enough memory for
@@ -59,10 +68,13 @@ Comet supports native shuffle and columnar shuffle (these terms are explained in
 In on-heap mode, columnar shuffle memory must be separately allocated using `spark.comet.columnar.shuffle.memorySize`.
 If this setting is not provided, it will be calculated by multiplying `spark.comet.memoryOverhead` by
 `spark.comet.columnar.shuffle.memory.factor` (default value is `1.0`). If a shuffle exceeds this amount of memory
-then the query will fail.
+then the query will fail. For full details on configuring Comet memory in on-heap mode, see the [Advanced Memory Tuning]
+section of this guide.
 
 [shuffle]: #shuffle
 
+[Advanced Memory Tuning]: #advanced-memory-tuning
+
 ### Determining How Much Memory to Allocate
 
 Generally, increasing the amount of memory allocated to Comet will improve query performance by reducing the
@@ -102,14 +114,6 @@ Workarounds for this problem include:
 
 ## Advanced Memory Tuning
 
-### Configuring spark.executor.memoryOverhead in On-Heap Mode
-
-In some environments, such as Kubernetes and YARN, it is important to correctly set `spark.executor.memoryOverhead` so
-that it is possible to allocate off-heap memory when running in on-heap mode.
-
-Comet will automatically set `spark.executor.memoryOverhead` based on the `spark.comet.memory*` settings so that
-resource managers respect Apache Spark memory configuration before starting the containers.
-
 ### Configuring Off-Heap Memory Pools
 
 Comet implements multiple memory pool implementations. The type of pool can be specified with `spark.comet.exec.memoryPool`.
@@ -132,6 +136,10 @@ when there is sufficient memory in order to leave enough memory for other operat
 
 ### Configuring On-Heap Memory Pools
 
+```{warning}
+Support for on-heap memory pools is deprecated and will be removed from a future release.
+```
+
 When running in on-heap mode, Comet will use its own dedicated memory pools that are not shared with Spark.
 
 The type of pool can be specified with `spark.comet.exec.memoryPool`. The default setting is `greedy_task_shared`.
@@ -172,6 +180,14 @@ adjusting how much memory to allocate.
 [FairSpillPool]: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.FairSpillPool.html
 [UnboundedMemoryPool]: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.UnboundedMemoryPool.html
 
+### Configuring spark.executor.memoryOverhead in On-Heap Mode
+
+In some environments, such as Kubernetes and YARN, it is important to correctly set `spark.executor.memoryOverhead` so
+that it is possible to allocate off-heap memory when running in on-heap mode.
+
+Comet will automatically set `spark.executor.memoryOverhead` based on the `spark.comet.memory*` settings so that
+resource managers respect Apache Spark memory configuration before starting the containers.
+
 ## Optimizing Joins
 
 Spark often chooses `SortMergeJoin` over `ShuffledHashJoin` for stability reasons. If the build-side of a
