Commit 7b85d03

docs: Update confs to bypass Iceberg Spark issues (#2166)

* Update confs to bypass Iceberg Spark issues - document current limitation
* Update iceberg.md
* Users can disable Spark's AQE as well
* Let users turn off AQE or Comet's broadcastExchange

1 parent eb197ca commit 7b85d03

File tree

1 file changed: +12 −3 lines changed


docs/source/user-guide/iceberg.md

Lines changed: 12 additions & 3 deletions
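The "12 additions & 3 deletions" figure is simply a count of `+` and `-` lines across the hunks below. As a hedged aside, such a summary can be derived from unified-diff text like this (an illustrative sketch, not GitHub's actual implementation; `diffStats` is a made-up name):

```scala
// Count additions and deletions in unified-diff text.
// Lines starting with '+'/'-' are counted; the '+++'/'---'
// file headers are excluded.
def diffStats(diff: String): (Int, Int) = {
  val lines = diff.split("\n").toList
  val additions = lines.count(l => l.startsWith("+") && !l.startsWith("+++"))
  val deletions = lines.count(l => l.startsWith("-") && !l.startsWith("---"))
  (additions, deletions)
}
```

Applied to the two hunks below, which contain 12 `+` lines and 3 `-` lines, this would reproduce the summary shown above.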
````diff
@@ -80,11 +80,13 @@ $SPARK_HOME/bin/spark-shell \
     --conf spark.sql.catalog.spark_catalog.type=hadoop \
     --conf spark.sql.catalog.spark_catalog.warehouse=/tmp/warehouse \
     --conf spark.plugins=org.apache.spark.CometPlugin \
-    --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
+    --conf spark.comet.exec.shuffle.enabled=false \
     --conf spark.sql.iceberg.parquet.reader-type=COMET \
     --conf spark.comet.explainFallback.enabled=true \
     --conf spark.memory.offHeap.enabled=true \
-    --conf spark.memory.offHeap.size=2g
+    --conf spark.memory.offHeap.size=2g \
+    --conf spark.comet.use.lazyMaterialization=false \
+    --conf spark.comet.schemaEvolution.enabled=true
 ```

 Create an Iceberg table. Note that Comet will not accelerate this part.
````
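Once spark-shell is up with the flags above, the new settings can be sanity-checked from the prompt. A hedged sketch (assumes the session launched by the command in this diff; `spark` is the session spark-shell provides):

```scala
// Inside spark-shell: read back the confs passed on the command line above.
// The conf names are taken from the diff; the values mirror the --conf flags.
println(spark.conf.get("spark.comet.exec.shuffle.enabled"))    // "false", as passed above
println(spark.conf.get("spark.comet.use.lazyMaterialization")) // "false"
println(spark.conf.get("spark.comet.schemaEvolution.enabled")) // "true"
```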
````diff
@@ -138,4 +140,11 @@ scala> spark.sql(s"SELECT * from t1").explain()
 == Physical Plan ==
 *(1) CometColumnarToRow
 +- CometBatchScan spark_catalog.default.t1[c0#26, c1#27] spark_catalog.default.t1 (branch=null) [filters=, groupedBy=] RuntimeFilters: []
-```
+```
+
+## Known issues
+- We temporarily disable Comet when there are delete files in an Iceberg scan; see the Iceberg [1.8.1 diff](../../../dev/diffs/iceberg/1.8.1.diff) and this [PR](https://github.com/apache/iceberg/pull/13793)
+- Iceberg scans with delete files lead to [runtime exceptions](https://github.com/apache/datafusion-comet/issues/2117) and [incorrect results](https://github.com/apache/datafusion-comet/issues/2118)
+- Enabling `CometShuffleManager` leads to [runtime exceptions](https://github.com/apache/datafusion-comet/issues/2086)
+- Spark Runtime Filtering isn't [working](https://github.com/apache/datafusion-comet/issues/2116)
+  - You can bypass the issue by either setting `spark.sql.adaptive.enabled=false` or `spark.comet.exec.broadcastExchange.enabled=false`
````
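The two bypass settings in the last bullet can also be toggled from a running spark-shell session. A hedged sketch (assumes a session with the Comet plugin loaded; pick one option, not both, and note that some confs may only take effect when set at launch via `--conf`):

```scala
// Option 1: disable Spark's adaptive query execution entirely.
spark.conf.set("spark.sql.adaptive.enabled", "false")

// Option 2: keep AQE, but stop Comet from replacing broadcast exchanges.
spark.conf.set("spark.comet.exec.broadcastExchange.enabled", "false")
```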
