docs/source/user-guide/latest/compatibility.md (4 additions, 0 deletions)
@@ -89,6 +89,7 @@ The following cast operations are generally compatible with Spark except for the
 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->

 <!--BEGIN:COMPAT_CAST_TABLE-->
+<!-- prettier-ignore-start -->
 | From Type | To Type | Notes |
 |-|-|-|
 | boolean | byte ||
@@ -165,6 +166,7 @@ The following cast operations are generally compatible with Spark except for the
 | timestamp | long ||
 | timestamp | string ||
 | timestamp | date ||
+<!-- prettier-ignore-end -->
 <!--END:COMPAT_CAST_TABLE-->

 ### Incompatible Casts
@@ -174,6 +176,7 @@ The following cast operations are not compatible with Spark for all inputs and a
 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->

 <!--BEGIN:INCOMPAT_CAST_TABLE-->
+<!-- prettier-ignore-start -->
 | From Type | To Type | Notes |
 |-|-|-|
 | float | decimal | There can be rounding differences |
@@ -182,6 +185,7 @@ The following cast operations are not compatible with Spark for all inputs and a
 | string | double | Does not support inputs ending with 'd' or 'f'. Does not support 'inf'. Does not support ANSI mode. |
 | string | decimal | Does not support inputs ending with 'd' or 'f'. Does not support 'inf'. Does not support ANSI mode. Returns 0.0 instead of null if the input contains no digits |
 | string | timestamp | Not all valid formats are supported |
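To make the string-to-decimal divergence concrete, here is a minimal Python sketch of the behavior the incompatible-casts table describes. It models Spark's null result with `None`; the function names and the digit check are illustrative assumptions, not Comet's actual implementation.

```python
from decimal import Decimal, InvalidOperation


def spark_string_to_decimal(s: str):
    # Spark returns null (modeled here as None) when the string
    # is not a parsable decimal number.
    try:
        return Decimal(s.strip())
    except InvalidOperation:
        return None


def comet_string_to_decimal(s: str):
    # Per the table above, Comet returns 0.0 instead of null when the
    # input contains no digits at all (hypothetical model of that note).
    value = spark_string_to_decimal(s)
    if value is None and not any(c.isdigit() for c in s):
        return Decimal("0.0")
    return value


print(spark_string_to_decimal("abc"))   # None
print(comet_string_to_decimal("abc"))   # 0.0
```

Inputs such as `"1.5d"` also raise `InvalidOperation` here, loosely mirroring the "does not support inputs ending with 'd' or 'f'" note, though Comet's real handling of those inputs is not modeled.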
docs/source/user-guide/latest/configs.md (20 additions, 0 deletions)
@@ -25,19 +25,22 @@ Comet provides the following configuration settings.

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[scan]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.scan.allowIncompatible`| Some Comet scan implementations are not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the [Comet Compatibility Guide](https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
 |`spark.comet.scan.enabled`| Whether to enable native scans. When this is turned on, Spark will use Comet to read supported data sources (currently only Parquet is supported natively). Note that to enable native vectorized execution, both this config and `spark.comet.exec.enabled` need to be enabled. | true |
 |`spark.comet.scan.preFetch.enabled`| Whether to enable the pre-fetching feature of CometScan. | false |
 |`spark.comet.scan.preFetch.threadNum`| The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. Note that more pre-fetching threads means more memory is required to store pre-fetched row groups. | 2 |
 |`spark.hadoop.fs.comet.libhdfs.schemes`| Defines filesystem schemes (e.g., hdfs, webhdfs) that the native side accesses via libhdfs, separated by commas. Valid only when built with the hdfs feature enabled. ||
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Parquet Reader Configuration Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[parquet]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.parquet.enable.directBuffer`| Whether to use Java direct byte buffer when reading Parquet. | false |
@@ -47,12 +50,14 @@ Comet provides the following configuration settings.
 |`spark.comet.parquet.read.parallel.io.enabled`| Whether to enable Comet's parallel reader for Parquet files. The parallel reader reads ranges of consecutive data in a file in parallel. It is faster for large files and row groups but uses more resources. | true |
 |`spark.comet.parquet.read.parallel.io.thread-pool.size`| The maximum number of parallel threads the parallel reader will use in a single executor. For executors configured with a smaller number of cores, use a smaller number. | 16 |
 |`spark.comet.parquet.respectFilterPushdown`| Whether to respect Spark's PARQUET_FILTER_PUSHDOWN_ENABLED config. This needs to be respected when running the Spark SQL test suite, but it results in poor performance in Comet when using the new native scans, so it is disabled by default. | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Query Execution Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[exec]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.caseConversion.enabled`| Java uses locale-specific rules when converting strings to upper or lower case and Rust does not, so `upper` and `lower` are disabled by default. | false |
@@ -67,6 +72,7 @@ Comet provides the following configuration settings.
 |`spark.comet.metrics.updateInterval`| The interval in milliseconds to update metrics. If the interval is negative, metrics will be updated upon task completion. | 3000 |
 |`spark.comet.nativeLoadRequired`| Whether to require the Comet native library to load successfully when Comet is enabled. If not, Comet will silently fall back to Spark when it fails to load the native lib. Otherwise, an error will be thrown and the Spark job will be aborted. | false |
 |`spark.comet.regexp.allowIncompatible`| Comet is not currently fully compatible with Spark for all regular expressions. Set this config to true to allow them anyway. For more information, refer to the [Comet Compatibility Guide](https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Viewing Explain Plan & Fallback Reasons
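The note on `spark.comet.scan.enabled` above states that native vectorized execution requires both that config and `spark.comet.exec.enabled` to be enabled. A minimal sketch of that rule over string-valued conf entries; the helper name and the `"false"` fallback are illustrative assumptions, not Comet's real defaults or implementation.

```python
def native_vectorized_execution_enabled(conf: dict) -> bool:
    # Model of the documented rule: both scan and exec must be enabled.
    # Missing keys are treated as "false" here for simplicity; the real
    # defaults live in Comet's config handling, not this sketch.
    def is_true(key: str) -> bool:
        return str(conf.get(key, "false")).lower() == "true"

    return is_true("spark.comet.scan.enabled") and is_true("spark.comet.exec.enabled")


conf = {
    "spark.comet.scan.enabled": "true",
    "spark.comet.exec.enabled": "true",
}
print(native_vectorized_execution_enabled(conf))  # True
```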
@@ -75,19 +81,22 @@ These settings can be used to determine which parts of the plan are accelerated

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[exec_explain]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.explain.format`| Choose the extended explain output. The default format of 'verbose' will provide the full query plan annotated with fallback reasons as well as a summary of how much of the plan was accelerated by Comet. The format 'fallback' provides a list of fallback reasons instead. | verbose |
 |`spark.comet.explain.native.enabled`| When this setting is enabled, Comet will provide a tree representation of the native query plan before execution and again after execution, with metrics. | false |
 |`spark.comet.explain.rules`| When this setting is enabled, Comet will log all plan transformations performed in physical optimizer rules. | false |
 |`spark.comet.explainFallback.enabled`| When this setting is enabled, Comet will provide logging explaining the reason(s) why a query stage cannot be executed natively. Set this to false to reduce the amount of logging. | false |
 |`spark.comet.logFallbackReasons.enabled`| When this setting is enabled, Comet will log warnings for all fallback reasons. It can be overridden by the environment variable `ENABLE_COMET_LOG_FALLBACK_REASONS`. | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Shuffle Configuration Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[shuffle]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.columnar.shuffle.async.enabled`| Whether to enable asynchronous shuffle for Arrow-based shuffle. | false |
@@ -101,24 +110,28 @@ These settings can be used to determine which parts of the plan are accelerated
 |`spark.comet.native.shuffle.partitioning.range.enabled`| Whether to enable range partitioning for Comet native shuffle. | true |
 |`spark.comet.shuffle.preferDictionary.ratio`| The ratio of total values to distinct values in a string column used to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used when shuffling the string column. This config is effective only if it is higher than 1.0. Note that this config is only used when `spark.comet.exec.shuffle.mode` is `jvm`. | 10.0 |
 |`spark.comet.shuffle.sizeInBytesMultiplier`| Comet reports smaller sizes for shuffle due to using Arrow's columnar memory format, and this can result in Spark choosing a different join strategy due to the estimated size of the exchange being smaller. Comet will multiply sizeInBytes by this amount to avoid regressions in join strategy. | 1.0 |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Memory & Tuning Configuration Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[tuning]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.batchSize`| The columnar batch size, i.e., the maximum number of rows that a batch can contain. | 8192 |
 |`spark.comet.exec.memoryPool`| The type of memory pool to be used for Comet native execution when running Spark in off-heap mode. Available pool types are `greedy_unified` and `fair_unified`. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | fair_unified |
 |`spark.comet.exec.memoryPool.fraction`| Fraction of the off-heap memory pool that is available to Comet. Only applies to off-heap mode. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | 1.0 |
 |`spark.comet.tracing.enabled`| Enable fine-grained tracing of events and memory usage. For more information, refer to the [Comet Tracing Guide](https://datafusion.apache.org/comet/user-guide/tracing.html). | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Development & Testing Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[testing]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.columnar.shuffle.memory.factor`| Fraction of Comet memory to be allocated per executor process for columnar shuffle when running in on-heap mode. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | 1.0 |
@@ -131,12 +144,14 @@ These settings can be used to determine which parts of the plan are accelerated
 |`spark.comet.sparkToColumnar.enabled`| Whether to enable Spark to Arrow columnar conversion. When this is turned on, Comet will convert operators in `spark.comet.sparkToColumnar.supportedOperatorList` into Arrow columnar format before processing. This is an experimental feature and has known issues with non-UTC timezones. | false |
 |`spark.comet.sparkToColumnar.supportedOperatorList`| A comma-separated list of operators that will be converted to Arrow columnar format when `spark.comet.sparkToColumnar.enabled` is true. | Range,InMemoryTableScan,RDDScan |
 |`spark.comet.testing.strict`| Experimental option to enable strict testing, which will fail tests that could be more comprehensive, such as checking for a specific fallback reason. It can be overridden by the environment variable `ENABLE_COMET_STRICT_TESTING`. | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Enabling or Disabling Individual Operators

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[enable_exec]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.exec.aggregate.enabled`| Whether to enable aggregate by default. | true |
@@ -157,12 +172,14 @@ These settings can be used to determine which parts of the plan are accelerated
 |`spark.comet.exec.takeOrderedAndProject.enabled`| Whether to enable takeOrderedAndProject by default. | true |
 |`spark.comet.exec.union.enabled`| Whether to enable union by default. | true |
 |`spark.comet.exec.window.enabled`| Whether to enable window by default. | true |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Enabling or Disabling Individual Scalar Expressions

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[enable_expr]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.expression.Abs.enabled`| Enable Comet acceleration for `Abs` | true |
@@ -308,12 +325,14 @@ These settings can be used to determine which parts of the plan are accelerated
 |`spark.comet.expression.WeekOfYear.enabled`| Enable Comet acceleration for `WeekOfYear` | true |
 |`spark.comet.expression.XxHash64.enabled`| Enable Comet acceleration for `XxHash64` | true |
 |`spark.comet.expression.Year.enabled`| Enable Comet acceleration for `Year` | true |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Enabling or Disabling Individual Aggregate Expressions

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[enable_agg_expr]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.expression.Average.enabled`| Enable Comet acceleration for `Average` | true |
@@ -334,4 +353,5 @@ These settings can be used to determine which parts of the plan are accelerated
 |`spark.comet.expression.Sum.enabled`| Enable Comet acceleration for `Sum` | true |
 |`spark.comet.expression.VariancePop.enabled`| Enable Comet acceleration for `VariancePop` | true |
 |`spark.comet.expression.VarianceSamp.enabled`| Enable Comet acceleration for `VarianceSamp` | true |
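The per-expression flags in the tables above all follow the pattern `spark.comet.expression.<Name>.enabled`. A small sketch of building these keys programmatically; the helper names are hypothetical and not part of Comet itself.

```python
def expression_conf_key(expr_name: str) -> str:
    # Key pattern used by the per-expression tables above.
    return f"spark.comet.expression.{expr_name}.enabled"


def disable_expressions(*names: str) -> dict:
    # Conf entries that turn off Comet acceleration for specific expressions,
    # e.g. to pass with --conf flags or spark.conf.set.
    return {expression_conf_key(n): "false" for n in names}


print(expression_conf_key("Abs"))  # spark.comet.expression.Abs.enabled
print(disable_expressions("Average", "Sum"))
```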