Commit deaec6f

andygrove and claude authored
docs: document datetime rebasing and V2 API limitations for DataFusion-based scans (apache#3259)
Add two new limitations to the shared limitations section for native_datafusion and native_iceberg_compat scan implementations:

1. No support for datetime rebasing detection or the spark.comet.exceptionOnDatetimeRebase configuration. When reading Parquet files with dates/timestamps written before Spark 3.0 (hybrid Julian/Gregorian calendar), these implementations cannot detect legacy values and may produce incorrect results for dates before October 15, 1582.

2. No support for Spark's Datasource V2 API. When V2 is enabled, Comet falls back to native_comet.

Co-authored-by: Claude Opus 4.5 <[email protected]>
1 parent 1a000ab commit deaec6f

1 file changed: +9 -0 lines changed

docs/source/contributor-guide/parquet_scans.md

Lines changed: 9 additions & 0 deletions
@@ -50,6 +50,15 @@ The `native_datafusion` and `native_iceberg_compat` scans share the following li
 `spark.comet.scan.unsignedSmallIntSafetyCheck=false`. Note that `ByteType` columns are always safe because they can
 only come from signed `INT8`, where truncation preserves the signed value.
 - No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported.
+- No support for datetime rebasing detection or the `spark.comet.exceptionOnDatetimeRebase` configuration. When reading
+Parquet files containing dates or timestamps written before Spark 3.0 (which used a hybrid Julian/Gregorian calendar),
+the `native_comet` implementation can detect these legacy values and either throw an exception or read them without
+rebasing. The DataFusion-based implementations do not have this detection capability and will read all dates/timestamps
+as if they were written using the Proleptic Gregorian calendar. This may produce incorrect results for dates before
+October 15, 1582.
+- No support for Spark's Datasource V2 API. When `spark.sql.sources.useV1SourceList` does not include `parquet`,
+Spark uses the V2 API for Parquet scans. The DataFusion-based implementations only support the V1 API, so Comet
+will fall back to `native_comet` when V2 is enabled.
 
 The `native_datafusion` scan has some additional limitations:
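
To illustrate why the missing rebase detection matters, here is a minimal Python sketch of the calendar arithmetic only; it is not Comet or Spark code. It assumes a legacy writer stores a date as days since 1970-01-01 counted in the hybrid calendar (Julian before October 15, 1582, Gregorian from then on) and that a reader without rebasing interprets the same number in the Proleptic Gregorian calendar. The helper names (`hybrid_days_since_epoch`, `read_without_rebase`) are hypothetical and exist only for this example.

```python
from datetime import date, timedelta

# First day of the Gregorian calendar in the hybrid scheme.
GREGORIAN_CUTOVER = (1582, 10, 15)


def _jdn(year: int, month: int, day: int, julian: bool) -> int:
    """Julian Day Number of a calendar label (standard integer formula)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    base = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if julian:
        return base - 32083
    # The Gregorian calendar drops three leap days every 400 years.
    return base - y // 100 + y // 400 - 32045


EPOCH_JDN = _jdn(1970, 1, 1, julian=False)


def hybrid_days_since_epoch(year: int, month: int, day: int) -> int:
    """Days since 1970-01-01 as a hybrid-calendar writer would count them.

    Assumption for this sketch: labels before the cutover are Julian.
    Labels from 1582-10-05 to 1582-10-14 do not exist in the hybrid
    calendar and are not handled here.
    """
    julian = (year, month, day) < GREGORIAN_CUTOVER
    return _jdn(year, month, day, julian) - EPOCH_JDN


def read_without_rebase(days: int) -> date:
    """Interpret a day count in the Proleptic Gregorian calendar.

    Python's datetime.date is itself proleptic Gregorian.
    """
    return date(1970, 1, 1) + timedelta(days=days)


if __name__ == "__main__":
    for label in [(1000, 1, 1), (1582, 10, 4), (1582, 10, 15), (2020, 6, 1)]:
        stored = hybrid_days_since_epoch(*label)
        print(label, "->", read_without_rebase(stored).isoformat())
```

Under these assumptions, a value written as 1000-01-01 comes back as 1000-01-06 and 1582-10-04 comes back as 1582-10-14, while dates on or after October 15, 1582 round-trip unchanged. That is the kind of silent shift the limitation above describes.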