Commit 4cfceb7

docs: Categorize some configs as testing and add notes about known time zone issues (#2740)
1 parent 4df3b6e commit 4cfceb7

File tree: 2 files changed, +18 −18 lines

common/src/main/scala/org/apache/comet/CometConf.scala

Lines changed: 13 additions & 13 deletions
@@ -181,31 +181,31 @@ object CometConf extends ShimCometConf {
 
   val COMET_CONVERT_FROM_PARQUET_ENABLED: ConfigEntry[Boolean] =
     conf("spark.comet.convert.parquet.enabled")
-      .category(CATEGORY_SCAN)
+      .category(CATEGORY_TESTING)
       .doc(
         "When enabled, data from Spark (non-native) Parquet v1 and v2 scans will be converted to " +
-          "Arrow format. Note that to enable native vectorized execution, both this config and " +
-          "`spark.comet.exec.enabled` need to be enabled.")
+          "Arrow format. This is an experimental feature and has known issues with " +
+          "non-UTC timezones.")
       .booleanConf
       .createWithDefault(false)
 
   val COMET_CONVERT_FROM_JSON_ENABLED: ConfigEntry[Boolean] =
     conf("spark.comet.convert.json.enabled")
-      .category(CATEGORY_SCAN)
+      .category(CATEGORY_TESTING)
       .doc(
         "When enabled, data from Spark (non-native) JSON v1 and v2 scans will be converted to " +
-          "Arrow format. Note that to enable native vectorized execution, both this config and " +
-          "`spark.comet.exec.enabled` need to be enabled.")
+          "Arrow format. This is an experimental feature and has known issues with " +
+          "non-UTC timezones.")
       .booleanConf
       .createWithDefault(false)
 
   val COMET_CONVERT_FROM_CSV_ENABLED: ConfigEntry[Boolean] =
     conf("spark.comet.convert.csv.enabled")
-      .category(CATEGORY_SCAN)
+      .category(CATEGORY_TESTING)
       .doc(
         "When enabled, data from Spark (non-native) CSV v1 and v2 scans will be converted to " +
-          "Arrow format. Note that to enable native vectorized execution, both this config and " +
-          "`spark.comet.exec.enabled` need to be enabled.")
+          "Arrow format. This is an experimental feature and has known issues with " +
+          "non-UTC timezones.")
       .booleanConf
       .createWithDefault(false)
 
@@ -633,19 +633,19 @@ object CometConf extends ShimCometConf {
 
   val COMET_SPARK_TO_ARROW_ENABLED: ConfigEntry[Boolean] =
     conf("spark.comet.sparkToColumnar.enabled")
-      .category(CATEGORY_SCAN)
+      .category(CATEGORY_TESTING)
       .doc("Whether to enable Spark to Arrow columnar conversion. When this is turned on, " +
         "Comet will convert operators in " +
         "`spark.comet.sparkToColumnar.supportedOperatorList` into Arrow columnar format before " +
-        "processing.")
+        "processing. This is an experimental feature and has known issues with non-UTC timezones.")
       .booleanConf
       .createWithDefault(false)
 
   val COMET_SPARK_TO_ARROW_SUPPORTED_OPERATOR_LIST: ConfigEntry[Seq[String]] =
     conf("spark.comet.sparkToColumnar.supportedOperatorList")
-      .category(CATEGORY_SCAN)
+      .category(CATEGORY_TESTING)
       .doc("A comma-separated list of operators that will be converted to Arrow columnar " +
-        "format when `spark.comet.sparkToColumnar.enabled` is true")
+        s"format when `${COMET_SPARK_TO_ARROW_ENABLED.key}` is true.")
       .stringConf
       .toSequence
      .createWithDefault(Seq("Range,InMemoryTableScan,RDDScan"))
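
These entries are ordinary Spark SQL configs, so the reclassified settings can still be enabled per session. A minimal sketch of doing so from application code (the plugin class is the one Comet's user guide documents for `spark.plugins`; the app name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative session setup: enables one of the configs this commit
// reclassifies from CATEGORY_SCAN to CATEGORY_TESTING.
val spark = SparkSession
  .builder()
  .appName("comet-conversion-sketch") // illustrative name
  .config("spark.plugins", "org.apache.spark.CometPlugin")
  // Per the updated doc string: experimental, with known issues with
  // non-UTC timezones.
  .config("spark.comet.convert.parquet.enabled", "true")
  .getOrCreate()
```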

docs/source/user-guide/latest/configs.md

Lines changed: 5 additions & 5 deletions
@@ -27,15 +27,10 @@ Comet provides the following configuration settings.
 <!--BEGIN:CONFIG_TABLE[scan]-->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
-| `spark.comet.convert.csv.enabled` | When enabled, data from Spark (non-native) CSV v1 and v2 scans will be converted to Arrow format. Note that to enable native vectorized execution, both this config and `spark.comet.exec.enabled` need to be enabled. | false |
-| `spark.comet.convert.json.enabled` | When enabled, data from Spark (non-native) JSON v1 and v2 scans will be converted to Arrow format. Note that to enable native vectorized execution, both this config and `spark.comet.exec.enabled` need to be enabled. | false |
-| `spark.comet.convert.parquet.enabled` | When enabled, data from Spark (non-native) Parquet v1 and v2 scans will be converted to Arrow format. Note that to enable native vectorized execution, both this config and `spark.comet.exec.enabled` need to be enabled. | false |
 | `spark.comet.scan.allowIncompatible` | Some Comet scan implementations are not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the [Comet Compatibility Guide](https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
 | `spark.comet.scan.enabled` | Whether to enable native scans. When this is turned on, Spark will use Comet to read supported data sources (currently only Parquet is supported natively). Note that to enable native vectorized execution, both this config and `spark.comet.exec.enabled` need to be enabled. | true |
 | `spark.comet.scan.preFetch.enabled` | Whether to enable pre-fetching feature of CometScan. | false |
 | `spark.comet.scan.preFetch.threadNum` | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
-| `spark.comet.sparkToColumnar.enabled` | Whether to enable Spark to Arrow columnar conversion. When this is turned on, Comet will convert operators in `spark.comet.sparkToColumnar.supportedOperatorList` into Arrow columnar format before processing. | false |
-| `spark.comet.sparkToColumnar.supportedOperatorList` | A comma-separated list of operators that will be converted to Arrow columnar format when `spark.comet.sparkToColumnar.enabled` is true | Range,InMemoryTableScan,RDDScan |
 | `spark.hadoop.fs.comet.libhdfs.schemes` | Defines filesystem schemes (e.g., hdfs, webhdfs) that the native side accesses via libhdfs, separated by commas. Valid only when built with hdfs feature enabled. | |
 <!--END:CONFIG_TABLE-->
 
@@ -127,9 +122,14 @@ These settings can be used to determine which parts of the plan are accelerated
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 | `spark.comet.columnar.shuffle.memory.factor` | Fraction of Comet memory to be allocated per executor process for columnar shuffle when running in on-heap mode. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | 1.0 |
+| `spark.comet.convert.csv.enabled` | When enabled, data from Spark (non-native) CSV v1 and v2 scans will be converted to Arrow format. This is an experimental feature and has known issues with non-UTC timezones. | false |
+| `spark.comet.convert.json.enabled` | When enabled, data from Spark (non-native) JSON v1 and v2 scans will be converted to Arrow format. This is an experimental feature and has known issues with non-UTC timezones. | false |
+| `spark.comet.convert.parquet.enabled` | When enabled, data from Spark (non-native) Parquet v1 and v2 scans will be converted to Arrow format. This is an experimental feature and has known issues with non-UTC timezones. | false |
 | `spark.comet.exec.onHeap.enabled` | Whether to allow Comet to run in on-heap mode. Required for running Spark SQL tests. Can be overridden by environment variable `ENABLE_COMET_ONHEAP`. | false |
 | `spark.comet.exec.onHeap.memoryPool` | The type of memory pool to be used for Comet native execution when running Spark in on-heap mode. Available pool types are `greedy`, `fair_spill`, `greedy_task_shared`, `fair_spill_task_shared`, `greedy_global`, `fair_spill_global`, and `unbounded`. | greedy_task_shared |
 | `spark.comet.memoryOverhead` | The amount of additional memory to be allocated per executor process for Comet, in MiB, when running Spark in on-heap mode. | 1024 MiB |
+| `spark.comet.sparkToColumnar.enabled` | Whether to enable Spark to Arrow columnar conversion. When this is turned on, Comet will convert operators in `spark.comet.sparkToColumnar.supportedOperatorList` into Arrow columnar format before processing. This is an experimental feature and has known issues with non-UTC timezones. | false |
+| `spark.comet.sparkToColumnar.supportedOperatorList` | A comma-separated list of operators that will be converted to Arrow columnar format when `spark.comet.sparkToColumnar.enabled` is true. | Range,InMemoryTableScan,RDDScan |
 | `spark.comet.testing.strict` | Experimental option to enable strict testing, which will fail tests that could be more comprehensive, such as checking for a specific fallback reason. Can be overridden by environment variable `ENABLE_COMET_STRICT_TESTING`. | false |
 <!--END:CONFIG_TABLE-->
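
The moved rows are unchanged in behavior; only their table placement and doc text changed. For completeness, a hedged sketch of the `sparkToColumnar` pair in use (`spark` is an existing Comet-enabled session; whether a given Comet config takes effect when changed mid-session is not covered by this commit, so treat this as illustrative):

```scala
// Illustrative: convert the output of the default operator set to Arrow
// columnar format before Comet processing.
spark.conf.set("spark.comet.sparkToColumnar.enabled", "true")
spark.conf.set(
  "spark.comet.sparkToColumnar.supportedOperatorList",
  "Range,InMemoryTableScan,RDDScan") // documented default, restated explicitly
```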
