date_trunc incorrect results in non-UTC timezone #2649

@andygrove

Description
Describe the bug

PR #2634 fixed some bugs with trunc/date_trunc, but the new tests added as part of that PR revealed another bug.

The tests pass when the Spark session's time zone is UTC, but fail for other time zones.

When reading from DataFrame:

org.apache.comet.CometNativeException: Fail to process Arrow array with reason: Invalid argument error: RowConverter column schema mismatch, expected Timestamp(Microsecond, Some("America/Denver")) got Timestamp(Microsecond, Some("UTC")).

When reading from Parquet:


== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(1) CometColumnarToRow
   +- CometSort [c0#8, date_trunc(quarter, c0)#95], [c0#8 ASC NULLS FIRST]
      +- AQEShuffleRead coalesced
         +- ShuffleQueryStage 0
            +- CometExchange rangepartitioning(c0#8 ASC NULLS FIRST, 10), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=1129]
               +- CometProject [c0#8, date_trunc(quarter, c0)#95], [c0#8, date_trunc(quarter, c0#8, Some(America/Denver)) AS date_trunc(quarter, c0)#95]
                  +- CometScan [native_iceberg_compat] parquet [c0#8] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-ec4ccf01-3f14-44b0-8c83-fa87cad8d6df], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c0:timestamp>
+- == Initial Plan ==
   CometSort [c0#8, date_trunc(quarter, c0)#95], [c0#8 ASC NULLS FIRST]
   +- CometExchange rangepartitioning(c0#8 ASC NULLS FIRST, 10), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=1018]
      +- CometProject [c0#8, date_trunc(quarter, c0)#95], [c0#8, date_trunc(quarter, c0#8, Some(America/Denver)) AS date_trunc(quarter, c0)#95]
         +- CometScan [native_iceberg_compat] parquet [c0#8] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-ec4ccf01-3f14-44b0-8c83-fa87cad8d6df], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c0:timestamp>

== Results ==
!== Correct Answer - 1000 ==                              == Spark Answer - 1000 ==
 struct<c0:timestamp,date_trunc(quarter, c0):timestamp>   struct<c0:timestamp,date_trunc(quarter, c0):timestamp>
![3332-12-03 10:00:59.158,3332-09-30 23:00:00.0]          [3332-12-03 10:00:59.158,3332-10-01 00:00:00.0]
![3332-12-03 10:04:41.722,3332-09-30 23:00:00.0]          [3332-12-03 10:04:41.722,3332-10-01 00:00:00.0]
![3332-12-03 10:26:05.153,3332-09-30 23:00:00.0]          [3332-12-03 10:26:05.153,3332-10-01 00:00:00.0]
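The diff above shows the classic symptom of truncating in the wrong time zone: the expected answer lands exactly on a local quarter boundary (3332-10-01 00:00:00 in America/Denver), while the actual answer is offset from it, consistent with the truncation being applied to the UTC instant rather than the session-local wall-clock time. The following pure-Python sketch (not Comet or Spark code; `trunc_quarter` is a hypothetical helper written for illustration) demonstrates this general failure mode with a modern date:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

DENVER = ZoneInfo("America/Denver")


def trunc_quarter(dt: datetime) -> datetime:
    """Truncate a datetime to the start of its quarter, in its own time zone."""
    quarter_start_month = ((dt.month - 1) // 3) * 3 + 1
    return dt.replace(month=quarter_start_month, day=1,
                      hour=0, minute=0, second=0, microsecond=0)


ts = datetime(2024, 11, 3, 10, 0, 59, tzinfo=DENVER)

# Correct (Spark) semantics: truncate against the session time zone's wall clock.
local_trunc = trunc_quarter(ts)

# Buggy semantics: truncate the UTC instant, then display in the session zone.
utc_trunc = trunc_quarter(ts.astimezone(timezone.utc)).astimezone(DENVER)

print(local_trunc)  # quarter boundary in Denver local time
print(utc_trunc)    # shifted off the local boundary by the UTC offset
```

Here `local_trunc` is `2024-10-01 00:00:00-06:00` while `utc_trunc` is `2024-09-30 18:00:00-06:00`: the same kind of near-boundary drift seen in the results diff.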

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response
