| `native_comet` | This is the default implementation. It provides strong compatibility with Spark but does not support complex types. |
| `native_datafusion` | This implementation delegates to DataFusion's `ParquetExec`. |
| `native_iceberg_compat` | This implementation also delegates to DataFusion's `ParquetExec` but uses a hybrid approach of JVM and native code. This scan is designed to be integrated with Iceberg in the future. |

The new (and currently experimental) `native_datafusion` and `native_iceberg_compat` scans are being added to
provide the following benefits over the `native_comet` implementation:

- Leverage the DataFusion community's ongoing improvements to `ParquetExec`
- Provide support for reading complex types (structs, arrays, and maps)
- Remove the use of reusable mutable buffers in Comet, which are complex to maintain

These new implementations are not yet complete. Some of the current limitations are:

- Scanning Parquet files containing unsigned 8-bit or 16-bit integers can produce results that do not match Spark. By
  default, Comet will fall back to Spark when using these scan implementations to read Parquet files containing 8-bit
  or 16-bit integers. This behavior can be disabled by setting `spark.comet.scan.allowIncompatible=true`.
- These implementations do not yet fully support timestamps, decimals, or complex types.
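As a sketch, one of the experimental scans could be selected at session startup. The jar path below is a placeholder, and the `spark.comet.scan.impl` key and plugin class should be verified against the installation guide for your Comet version:

```shell
# Sketch: launch spark-shell with the experimental native_datafusion scan.
# The jar path is a placeholder; verify spark.comet.scan.impl and the
# plugin class against the Comet installation guide for your version.
spark-shell \
  --jars /path/to/comet-spark.jar \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.scan.impl=native_datafusion \
  --conf spark.comet.scan.allowIncompatible=true
```

Note that setting `spark.comet.scan.allowIncompatible=true` here disables the fallback to Spark for the unsigned 8-bit and 16-bit integer columns described above, so results may differ from Spark for those columns.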
## ANSI mode

Comet currently ignores ANSI mode in most cases, and therefore can produce different results than Spark. By default,
**docs/source/user-guide/configs.md** (7 additions, 1 deletion)

<!--
TO MODIFY THIS CONTENT MAKE SURE THAT YOU MAKE YOUR CHANGES TO THE TEMPLATE FILE
(docs/templates/configs-template.md) AND NOT THE GENERATED FILE
(docs/source/user-guide/configs.md) OTHERWISE YOUR CHANGES MAY BE LOST
-->

# Comet Configuration Settings

Comet provides the following configuration settings.
| spark.comet.parquet.read.parallel.io.enabled | Whether to enable Comet's parallel reader for Parquet files. The parallel reader reads ranges of consecutive data in a file in parallel. It is faster for large files and row groups but uses more resources. | true |
| spark.comet.parquet.read.parallel.io.thread-pool.size | The maximum number of parallel threads the parallel reader will use in a single executor. For executors configured with a smaller number of cores, use a smaller number. | 16 |
| spark.comet.regexp.allowIncompatible | Comet is not currently fully compatible with Spark for all regular expressions. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
| spark.comet.scan.allowIncompatible | Comet is not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
| spark.comet.scan.enabled | Whether to enable native scans. When this is turned on, Spark will use Comet to read supported data sources (currently only Parquet is supported natively). Note that to enable native vectorized execution, both this config and 'spark.comet.exec.enabled' need to be enabled. | true |
| spark.comet.scan.preFetch.enabled | Whether to enable pre-fetching feature of CometScan. | false |
| spark.comet.scan.preFetch.threadNum | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
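Settings from the table above can be combined on the command line; a minimal sketch of a hypothetical `spark-submit` invocation (the application file is a placeholder, and the thread-pool value of 8 is just an illustrative choice for an executor with fewer cores):

```shell
# Hypothetical invocation combining settings from the table above.
# my_app.py is a placeholder application.
spark-submit \
  --conf spark.comet.scan.enabled=true \
  --conf spark.comet.parquet.read.parallel.io.enabled=true \
  --conf spark.comet.parquet.read.parallel.io.thread-pool.size=8 \
  --conf spark.comet.scan.preFetch.enabled=false \
  my_app.py
```

The same keys can also be set cluster-wide in `spark-defaults.conf`.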
will fall back to Spark but can be enabled by setting `spark.comet.expression.allowIncompatible=true`.

## Array Expressions

Comet has experimental support for a number of array expressions. These are experimental and currently marked
as incompatible and can be enabled by setting `spark.comet.expression.allowIncompatible=true`.
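For example, the flag could be set for a single `spark-sql` run; whether a given array function is actually executed natively by Comet depends on the Comet version, so treat this as an illustrative sketch:

```shell
# Opt in to Comet's incompatible-marked expressions for one spark-sql run.
# array_remove is a standard Spark SQL function used here for illustration.
spark-sql \
  --conf spark.comet.expression.allowIncompatible=true \
  -e "SELECT array_remove(array(1, 2, 2, 3), 2)"
```

Without the flag, Comet falls back to Spark for these expressions rather than executing them natively.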
## Regular Expressions
### Unsupported Casts

Any cast not listed in the previous tables is currently unsupported. We are working on adding more. See the
[tracking issue](https://github.com/apache/datafusion-comet/issues/286) for more details.