Commit 4102fb8

feat: Change default off-heap memory pool from greedy_unified to fair_unified (#2526)
1 parent 8d4cb86 commit 4102fb8

5 files changed, 18 additions and 15 deletions

common/src/main/scala/org/apache/comet/CometConf.scala

Lines changed: 7 additions & 6 deletions
@@ -509,12 +509,13 @@ object CometConf extends ShimCometConf {
     .createWithDefault(false)
 
   val COMET_EXEC_MEMORY_POOL_TYPE: ConfigEntry[String] = conf("spark.comet.exec.memoryPool")
-    .doc("The type of memory pool to be used for Comet native execution. " +
-      "When running Spark in on-heap mode, available pool types are 'greedy', 'fair_spill', " +
-      "'greedy_task_shared', 'fair_spill_task_shared', 'greedy_global', 'fair_spill_global', " +
-      "and `unbounded`. When running Spark in off-heap mode, available pool types are " +
-      "'unified' and `fair_unified`. The default pool type is `greedy_task_shared` for on-heap " +
-      s"mode and `unified` for off-heap mode. $TUNING_GUIDE.")
+    .doc(
+      "The type of memory pool to be used for Comet native execution. " +
+        "When running Spark in on-heap mode, available pool types are 'greedy', 'fair_spill', " +
+        "'greedy_task_shared', 'fair_spill_task_shared', 'greedy_global', 'fair_spill_global', " +
+        "and `unbounded`. When running Spark in off-heap mode, available pool types are " +
+        "'greedy_unified' and `fair_unified`. The default pool type is `greedy_task_shared` " +
+        s"for on-heap mode and `unified` for off-heap mode. $TUNING_GUIDE.")
     .stringConf
     .createWithDefault("default")
 

docs/source/user-guide/latest/configs.md

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ Comet provides the following configuration settings.
 | spark.comet.exec.globalLimit.enabled | Whether to enable globalLimit by default. | true |
 | spark.comet.exec.hashJoin.enabled | Whether to enable hashJoin by default. | true |
 | spark.comet.exec.localLimit.enabled | Whether to enable localLimit by default. | true |
-| spark.comet.exec.memoryPool | The type of memory pool to be used for Comet native execution. When running Spark in on-heap mode, available pool types are 'greedy', 'fair_spill', 'greedy_task_shared', 'fair_spill_task_shared', 'greedy_global', 'fair_spill_global', and `unbounded`. When running Spark in off-heap mode, available pool types are 'unified' and `fair_unified`. The default pool type is `greedy_task_shared` for on-heap mode and `unified` for off-heap mode. For more information, refer to the Comet Tuning Guide (https://datafusion.apache.org/comet/user-guide/tuning.html). | default |
+| spark.comet.exec.memoryPool | The type of memory pool to be used for Comet native execution. When running Spark in on-heap mode, available pool types are 'greedy', 'fair_spill', 'greedy_task_shared', 'fair_spill_task_shared', 'greedy_global', 'fair_spill_global', and `unbounded`. When running Spark in off-heap mode, available pool types are 'greedy_unified' and `fair_unified`. The default pool type is `greedy_task_shared` for on-heap mode and `unified` for off-heap mode. For more information, refer to the Comet Tuning Guide (https://datafusion.apache.org/comet/user-guide/tuning.html). | default |
 | spark.comet.exec.project.enabled | Whether to enable project by default. | true |
 | spark.comet.exec.replaceSortMergeJoin | Experimental feature to force Spark to replace SortMergeJoin with ShuffledHashJoin for improved performance. This feature is not stable yet. For more information, refer to the Comet Tuning Guide (https://datafusion.apache.org/comet/user-guide/tuning.html). | false |
 | spark.comet.exec.shuffle.compression.codec | The codec of Comet native shuffle used to compress shuffle data. lz4, zstd, and snappy are supported. Compression can be disabled by setting spark.shuffle.compress=false. | lz4 |

docs/source/user-guide/latest/tuning.md

Lines changed: 3 additions & 3 deletions
@@ -116,13 +116,13 @@ Comet implements multiple memory pool implementations. The type of pool can be s
 
 The valid pool types for off-heap mode are:
 
-- `unified` (default when `spark.memory.offHeap.enabled=true` is set)
-- `fair_unified`
+- `fair_unified` (default when `spark.memory.offHeap.enabled=true` is set)
+- `greedy_unified`
 
 Both of these pools share off-heap memory between Spark and Comet. This approach is referred to as
 unified memory management. The size of the pool is specified by `spark.memory.offHeap.size`.
 
-The `unified` pool type implements a greedy first-come first-serve limit. This pool works well for queries that do not
+The `greedy_unified` pool type implements a greedy first-come first-serve limit. This pool works well for queries that do not
 need to spill or have a single spillable operator.
 
 The `fair_unified` pool type prevents operators from using more than an even fraction of the available memory
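
The difference between the two off-heap policies described in this hunk can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not Comet's or DataFusion's implementation: the function names `greedy_can_grow` and `fair_can_grow` are hypothetical, memory is tracked as plain byte counts, and the fair policy is reduced to a fixed even share per operator.

```rust
/// Greedy, first-come first-serve (simplified model): a request succeeds
/// whenever the pool as a whole still has room, regardless of who asks.
pub fn greedy_can_grow(pool_size: usize, total_used: usize, request: usize) -> bool {
    total_used.saturating_add(request) <= pool_size
}

/// Fair (simplified model): each of `num_consumers` operators is capped at
/// an even fraction of the pool, so one operator cannot starve the others.
pub fn fair_can_grow(
    pool_size: usize,
    num_consumers: usize,
    consumer_used: usize,
    request: usize,
) -> bool {
    // Guard against division by zero when no consumer is registered yet.
    let share = pool_size / num_consumers.max(1);
    consumer_used.saturating_add(request) <= share
}
```

In this model, a single operator can take the entire pool under the greedy policy, which matches the note that it suits queries with at most one spillable operator. Under the fair policy, an operator that reaches its share has to spill even if the pool is not full.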

native/core/src/execution/memory_pools/config.rs

Lines changed: 6 additions & 4 deletions
@@ -19,7 +19,7 @@ use crate::errors::{CometError, CometResult};
 
 #[derive(Copy, Clone, PartialEq, Eq)]
 pub(crate) enum MemoryPoolType {
-    Unified,
+    GreedyUnified,
     FairUnified,
     Greedy,
     FairSpill,
@@ -62,12 +62,14 @@ pub(crate) fn parse_memory_pool_config(
     let pool_size = memory_limit as usize;
     let memory_pool_config = if off_heap_mode {
         match memory_pool_type.as_str() {
-            "fair_unified" => MemoryPoolConfig::new(MemoryPoolType::FairUnified, pool_size),
-            "default" | "unified" => {
+            "default" | "fair_unified" => {
+                MemoryPoolConfig::new(MemoryPoolType::FairUnified, pool_size)
+            }
+            "greedy_unified" => {
                 // the `unified` memory pool interacts with Spark's memory pool to allocate
                 // memory therefore does not need a size to be explicitly set. The pool size
                 // shared with Spark is set by `spark.memory.offHeap.size`.
-                MemoryPoolConfig::new(MemoryPoolType::Unified, 0)
+                MemoryPoolConfig::new(MemoryPoolType::GreedyUnified, 0)
             }
             _ => {
                 return Err(CometError::Config(format!(
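
The off-heap branch of this hunk can be read as a small name-to-pool mapping. The following is a minimal, self-contained sketch of that mapping as it appears in the visible lines. The types are simplified stand-ins, the function name `resolve_off_heap_pool` is hypothetical, and the error text is invented because the real message is cut off in the hunk.

```rust
// Simplified stand-ins for the types in config.rs (only the off-heap variants).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum MemoryPoolType {
    GreedyUnified,
    FairUnified,
}

#[derive(Debug, PartialEq, Eq)]
pub struct MemoryPoolConfig {
    pub pool_type: MemoryPoolType,
    pub pool_size: usize,
}

/// Hypothetical helper: resolves the value of `spark.comet.exec.memoryPool`
/// in off-heap mode, following the match arms shown in the hunk.
pub fn resolve_off_heap_pool(name: &str, pool_size: usize) -> Result<MemoryPoolConfig, String> {
    match name {
        // After this commit, "default" selects the fair pool.
        "default" | "fair_unified" => Ok(MemoryPoolConfig {
            pool_type: MemoryPoolType::FairUnified,
            pool_size,
        }),
        // The greedy pool takes its size from Spark, so the hunk passes 0 here.
        "greedy_unified" => Ok(MemoryPoolConfig {
            pool_type: MemoryPoolType::GreedyUnified,
            pool_size: 0,
        }),
        // Placeholder error text; the real message is truncated in the hunk.
        other => Err(format!("unsupported off-heap memory pool type: {}", other)),
    }
}
```

Judging only by the arms visible in this hunk, the old name `unified` is no longer matched and would fall through to the error case, so configurations that set it explicitly may need to switch to `greedy_unified`.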

native/core/src/execution/memory_pools/mod.rs

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ pub(crate) fn create_memory_pool(
 ) -> Arc<dyn MemoryPool> {
     const NUM_TRACKED_CONSUMERS: usize = 10;
     match memory_pool_config.pool_type {
-        MemoryPoolType::Unified => {
+        MemoryPoolType::GreedyUnified => {
             // Set Comet memory pool for native
             let memory_pool =
                 CometUnifiedMemoryPool::new(comet_task_memory_manager, task_attempt_id);
