Open
Describe the bug
PR #2634 fixed some bugs in trunc/date_trunc, but I found another bug using the new tests added as part of that PR.
The tests pass when the Spark session's time zone is UTC, but fail for other time zones.
When reading from a DataFrame:
org.apache.comet.CometNativeException: Fail to process Arrow array with reason: Invalid argument error: RowConverter column schema mismatch, expected Timestamp(Microsecond, Some("America/Denver")) got Timestamp(Microsecond, Some("UTC")).
When reading from Parquet:
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(1) CometColumnarToRow
   +- CometSort [c0#8, date_trunc(quarter, c0)#95], [c0#8 ASC NULLS FIRST]
      +- AQEShuffleRead coalesced
         +- ShuffleQueryStage 0
            +- CometExchange rangepartitioning(c0#8 ASC NULLS FIRST, 10), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=1129]
               +- CometProject [c0#8, date_trunc(quarter, c0)#95], [c0#8, date_trunc(quarter, c0#8, Some(America/Denver)) AS date_trunc(quarter, c0)#95]
                  +- CometScan [native_iceberg_compat] parquet [c0#8] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-ec4ccf01-3f14-44b0-8c83-fa87cad8d6df], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c0:timestamp>
+- == Initial Plan ==
   CometSort [c0#8, date_trunc(quarter, c0)#95], [c0#8 ASC NULLS FIRST]
   +- CometExchange rangepartitioning(c0#8 ASC NULLS FIRST, 10), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=1018]
      +- CometProject [c0#8, date_trunc(quarter, c0)#95], [c0#8, date_trunc(quarter, c0#8, Some(America/Denver)) AS date_trunc(quarter, c0)#95]
         +- CometScan [native_iceberg_compat] parquet [c0#8] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-ec4ccf01-3f14-44b0-8c83-fa87cad8d6df], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c0:timestamp>
== Results ==
!== Correct Answer - 1000 == == Spark Answer - 1000 ==
struct<c0:timestamp,date_trunc(quarter, c0):timestamp> struct<c0:timestamp,date_trunc(quarter, c0):timestamp>
![3332-12-03 10:00:59.158,3332-09-30 23:00:00.0] [3332-12-03 10:00:59.158,3332-10-01 00:00:00.0]
![3332-12-03 10:04:41.722,3332-09-30 23:00:00.0] [3332-12-03 10:04:41.722,3332-10-01 00:00:00.0]
![3332-12-03 10:26:05.153,3332-09-30 23:00:00.0] [3332-12-03 10:26:05.153,3332-10-01 00:00:00.0]
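For context on why the session time zone matters for this expression, here is a minimal Python sketch of time-zone-aware quarter truncation. It is not Comet or Spark code and does not diagnose the bug: the helper name `trunc_quarter` and the 2024 sample instant are assumptions chosen for illustration, not values from the failing tests. The idea is that the instant is first viewed as wall-clock time in the session zone, truncated there, and then mapped back to an instant using the offset valid at the truncated date.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def trunc_quarter(instant_utc: datetime, tz_name: str) -> datetime:
    """Hypothetical helper: truncate an instant to the start of its quarter
    as seen in tz_name, and return the result as a UTC instant."""
    tz = ZoneInfo(tz_name)
    # View the instant as wall-clock time in the session time zone.
    local = instant_utc.astimezone(tz)
    # First month of the quarter: 1, 4, 7 or 10.
    first_month = 3 * ((local.month - 1) // 3) + 1
    # Midnight on the first day of the quarter, in local wall-clock time.
    # The zone supplies the offset valid on that date, which may differ
    # from the offset of the input because of daylight saving time.
    start_local = datetime(local.year, first_month, 1, tzinfo=tz)
    return start_local.astimezone(timezone.utc)


# Illustrative instant: 2024-12-03 10:00:59 in America/Denver (UTC-7 in December).
instant = datetime(2024, 12, 3, 17, 0, 59, tzinfo=timezone.utc)

denver = trunc_quarter(instant, "America/Denver")
utc = trunc_quarter(instant, "UTC")

print(denver.isoformat())  # 2024-10-01T06:00:00+00:00 (midnight in Denver, UTC-6 in October)
print(utc.isoformat())     # 2024-10-01T00:00:00+00:00
```

The two results are different instants, so a native implementation that truncates in UTC, or that reuses the input's offset for the truncated value, would presumably disagree with Spark whenever the session time zone is not UTC. The one-hour gap in the results above is the size of a daylight-saving shift, although this sketch does not establish that as the cause.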
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response