
Commit cded0ad

CometNativeIcebergScan with iceberg-rust using FileScanTasks.
1 parent 685dda9 commit cded0ad


17 files changed (+4002, -88 lines changed)


common/src/main/scala/org/apache/comet/CometConf.scala

Lines changed: 10 additions & 0 deletions
@@ -108,6 +108,16 @@ object CometConf extends ShimCometConf {
       .getOrElse("COMET_PARQUET_SCAN_IMPL", SCAN_AUTO)
       .toLowerCase(Locale.ROOT))

+  val COMET_ICEBERG_NATIVE_ENABLED: ConfigEntry[Boolean] =
+    conf("spark.comet.scan.icebergNative.enabled")
+      .doc(
+        "Whether to enable native Iceberg scan using iceberg-rust. When enabled, Comet will " +
+          "replace Spark's Iceberg BatchScanExec with CometIcebergNativeScanExec. Iceberg " +
+          "planning is performed by Spark, and the resulting FileScanTasks are serialized " +
+          "and passed to the native execution layer for reading data files.")
+      .booleanConf
+      .createWithDefault(false)
+
   val COMET_RESPECT_PARQUET_FILTER_PUSHDOWN: ConfigEntry[Boolean] =
     conf("spark.comet.parquet.respectFilterPushdown")
       .doc(
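The new entry follows CometConf's fluent builder pattern (key, doc string, type, default), and because the default is `false` the native Iceberg scan stays off unless a user sets the key explicitly. The self-contained sketch below models that pattern with hypothetical, simplified `ConfigBuilder` and `ConfigEntry` classes; they are not Comet's real implementations and only mirror the `conf(key).doc(...).booleanConf.createWithDefault(...)` shape to show how an unset key falls back to the default.

```scala
// Hypothetical, simplified model of the fluent config-builder pattern seen in CometConf.
// These classes are NOT Comet's real ConfigBuilder/ConfigEntry; they only mirror the shape
// conf(key).doc(text).booleanConf.createWithDefault(default).
object ConfSketch {

  // A finished entry: key, documentation, default value, and a String => T parser.
  final case class ConfigEntry[T](key: String, doc: String, default: T, parse: String => T) {
    // Look the key up in the supplied settings; use the default when the key is unset.
    def get(settings: Map[String, String]): T =
      settings.get(key).map(parse).getOrElse(default)
  }

  // Typed builder returned by .booleanConf; its only job is to attach the default.
  final class TypedBuilder[T](key: String, doc: String, parse: String => T) {
    def createWithDefault(default: T): ConfigEntry[T] = ConfigEntry(key, doc, default, parse)
  }

  // Untyped builder returned by conf(key); collects the doc string before a type is chosen.
  final case class ConfigBuilder(key: String, docText: String = "") {
    def doc(text: String): ConfigBuilder = copy(docText = text)
    def booleanConf: TypedBuilder[Boolean] =
      new TypedBuilder[Boolean](key, docText, s => s.trim.toBoolean)
  }

  def conf(key: String): ConfigBuilder = ConfigBuilder(key)

  // Same shape as the entry added in the diff (doc text shortened).
  val COMET_ICEBERG_NATIVE_ENABLED: ConfigEntry[Boolean] =
    conf("spark.comet.scan.icebergNative.enabled")
      .doc("Whether to enable native Iceberg scan using iceberg-rust.")
      .booleanConf
      .createWithDefault(false)

  def main(args: Array[String]): Unit = {
    // Key not set: the default (false) applies.
    println(COMET_ICEBERG_NATIVE_ENABLED.get(Map.empty)) // false
    // Key set explicitly: the parsed value wins.
    println(COMET_ICEBERG_NATIVE_ENABLED.get(
      Map("spark.comet.scan.icebergNative.enabled" -> "true"))) // true
  }
}
```

In a real Spark application the key would be set like any other Spark property, for example with `--conf spark.comet.scan.icebergNative.enabled=true` on `spark-submit`.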

docs/source/user-guide/latest/configs.md

Lines changed: 1 addition & 0 deletions
@@ -84,6 +84,7 @@ Comet provides the following configuration settings.
 | spark.comet.regexp.allowIncompatible | Comet is not currently fully compatible with Spark for all regular expressions. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
 | spark.comet.scan.allowIncompatible | Some Comet scan implementations are not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
 | spark.comet.scan.enabled | Whether to enable native scans. When this is turned on, Spark will use Comet to read supported data sources (currently only Parquet is supported natively). Note that to enable native vectorized execution, both this config and 'spark.comet.exec.enabled' need to be enabled. | true |
+| spark.comet.scan.icebergNative.enabled | Whether to enable native Iceberg scan using iceberg-rust. When enabled, Comet will replace Spark's Iceberg BatchScanExec with CometIcebergNativeScanExec. Iceberg planning is performed by Spark, and the resulting FileScanTasks are serialized and passed to the native execution layer for reading data files. | false |
 | spark.comet.scan.preFetch.enabled | Whether to enable pre-fetching feature of CometScan. | false |
 | spark.comet.scan.preFetch.threadNum | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
 | spark.comet.shuffle.preferDictionary.ratio | The ratio of total values to distinct values in a string column to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used on shuffling string column. This config is effective if it is higher than 1.0. Note that this config is only used when `spark.comet.exec.shuffle.mode` is `jvm`. | 10.0 |
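The config description implies a plan substitution: Spark still plans the Iceberg scan, and only when the flag is on does Comet swap the resulting `BatchScanExec` for `CometIcebergNativeScanExec`. The toy rewrite below illustrates that gating logic alone. Its case classes are hypothetical stand-ins, not Spark's or Comet's real operator classes, and it does not model how the FileScanTasks are serialized for the native layer.

```scala
// Toy model of a physical-plan rewrite gated by the new flag.
// BatchScanExec, CometIcebergNativeScanExec and FilterExec here are hypothetical
// stand-ins, NOT Spark's or Comet's real operator classes.
object ScanRewriteSketch {
  sealed trait PlanNode

  // Stand-in for Spark's BatchScanExec; `source` names the data source format.
  final case class BatchScanExec(source: String, table: String) extends PlanNode

  // Stand-in for the Comet operator that would hand planned file tasks to native code.
  final case class CometIcebergNativeScanExec(table: String) extends PlanNode

  // A parent operator, so the rewrite has to recurse into children.
  final case class FilterExec(condition: String, child: PlanNode) extends PlanNode

  // Replace Iceberg batch scans only when the flag is on; leave everything else as-is.
  def rewrite(plan: PlanNode, icebergNativeEnabled: Boolean): PlanNode = plan match {
    case BatchScanExec("iceberg", table) if icebergNativeEnabled =>
      CometIcebergNativeScanExec(table)
    case FilterExec(cond, child) =>
      FilterExec(cond, rewrite(child, icebergNativeEnabled))
    case other =>
      other
  }

  def main(args: Array[String]): Unit = {
    val plan = FilterExec("id > 10", BatchScanExec("iceberg", "db.events"))
    // Flag off (the default): the plan is returned unchanged.
    println(rewrite(plan, icebergNativeEnabled = false))
    // Flag on: the Iceberg scan under the filter is replaced.
    println(rewrite(plan, icebergNativeEnabled = true))
  }
}
```

Keeping the check inside the rewrite, rather than at each call site, means a disabled flag leaves the Spark plan byte-for-byte identical, which matches the `false` default chosen in the diff.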
