"Cannot promote timestamp to timestamptz" error when loading Dremio created table #2663

@chrisqiqiu

Description

Apache Iceberg version

0.10.0 (latest release)

Please describe the bug 🐞

My Python version is 3.12. When I use pyiceberg to read a table created by Dremio with the code below, I get a "Cannot promote timestamp to timestamptz" error. If I create the table from the same dataset in Spark instead, it loads correctly.

namespace = "test"
table_name = "Files_Created_by_Dremio"
table = nessie_catalog.load_table(f"{namespace}.{table_name}")

print(table.name())
print(table.schema())
table.scan().to_pandas()

('test', 'Files_Created_by_Dremio')

table {
  1: ID: optional long
  2: Name: optional string
  3: FileImportID: optional int
  4: ReceivedDateTime: optional timestamptz
  5: IsMoved: optional boolean
  6: MoveAttempt: optional int
  7: IsProcessed: optional boolean
  8: ProcessingAttempt: optional int
  9: TotalDataRows: optional int
  10: TotalProcessedDataRows: optional int
  11: TotalExcludedDataRows: optional int
  12: TotalAggregatedRows: optional int
  13: IsArchived: optional boolean
  14: ArchivalAttempt: optional int
  15: IsMovedToTransaction: optional boolean
  16: MovedToTransactionAttempt: optional int
  17: IsEnriched: optional boolean
  18: IsEnrichmentAttempt: optional int
  19: IsAEnriched: optional boolean
  20: AEnrichmentAttempt: optional int
  21: IsBEnriched: optional boolean
  22: BEnrichmentAttempt: optional int
  23: IsCEnriched: optional boolean
  24: CEnrichmentAttempt: optional int
  25: HasPendingEnrichment: optional boolean
  26: HasPendingAEnrichment: optional boolean
  27: HasPendingBEnrichment: optional boolean
  28: HasPendingCEnrichment: optional boolean
  29: FileBusinessDate: optional timestamptz
  30: IsUploaded: optional boolean
  31: IsValid: optional boolean
  32: Status: optional string
  33: StatusMessage: optional string
  34: LastActionDateTime: optional timestamptz
  35: ActedBy: optional int
  36: CreatedOn: optional timestamptz
  37: IsAttempedAll: optional boolean
  38: IsAttempedAll2: optional boolean
  39: IsClosed: optional boolean
  40: SparkUniqueID: optional string
}


ResolveError Traceback (most recent call last)
Cell In[3], line 12
      8 print(table.location())
      9 print(table.schema())
---> 12 table.scan().to_pandas()

ResolveError: Cannot promote timestamp to timestamptz

   1465 def to_pandas(self, **kwargs: Any) -> pd.DataFrame:
   1466     """Read a Pandas DataFrame eagerly from this Iceberg table.
   1467
   1468     Returns:
   1469         pd.DataFrame: Materialized Pandas Dataframe from the Iceberg table
   1470     """
-> 1471     return self.to_arrow().to_pandas(**kwargs)

   1427 """Read an Arrow table eagerly from this DataScan.
   1428
   1429 All rows will be loaded into memory at once.
 (...) 1432     pa.Table: Materialized Arrow Table from the Iceberg table's DataScan
   1433 """
   1434 from pyiceberg.io.pyarrow import ArrowScan
   1436 return ArrowScan(
   1437     self.table_metadata, self.io, self.projection(), self.row_filter, self.case_sensitive, self.limit
-> 1438 ).to_table(self.plan_files())

The tables I created in Dremio and Spark both come from the same dataset.

In Dremio:

CREATE TABLE nessie.test.Files_Created_by_Dremio AS
select * from nessie.test.Files

In Spark:

CREATE TABLE nessie.test.Files_Created_by_Spark AS
select * from nessie.test.Files
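
A possible way to narrow this down, sketched under assumptions: the error mentions only timestamp and timestamptz, so the four timestamptz columns in the schema above are the likely trigger. The hypothetical helper `split_by_timestamptz` below separates those columns from the rest, relying only on `schema.fields`, `field.name`, and the type name as it is printed in the schema. The stand-in `demo_schema` is illustrative and is not the real table. Whether a scan restricted to the remaining columns actually succeeds is an assumption and has not been verified here.

```python
from types import SimpleNamespace


def split_by_timestamptz(schema):
    """Return (timestamptz_names, other_names) for the top-level fields.

    Only uses schema.fields, field.name and str(field.field_type), so it
    works with any object exposing those attributes.
    """
    tz_cols, other_cols = [], []
    for field in schema.fields:
        if str(field.field_type) == "timestamptz":
            tz_cols.append(field.name)
        else:
            other_cols.append(field.name)
    return tz_cols, other_cols


# Stand-in for a few fields of the schema printed above, so the helper can
# be tried without a catalog. With a real table, pass table.schema() instead.
demo_schema = SimpleNamespace(
    fields=[
        SimpleNamespace(name="ID", field_type="long"),
        SimpleNamespace(name="ReceivedDateTime", field_type="timestamptz"),
        SimpleNamespace(name="Status", field_type="string"),
        SimpleNamespace(name="CreatedOn", field_type="timestamptz"),
    ]
)

tz_cols, other_cols = split_by_timestamptz(demo_schema)
print(tz_cols)     # ['ReceivedDateTime', 'CreatedOn']
print(other_cols)  # ['ID', 'Status']

# Against the real table (assumption: leaving out the timestamptz columns
# avoids the failing promotion, which would confirm they are the cause):
# tz_cols, other_cols = split_by_timestamptz(table.schema())
# table.scan(selected_fields=tuple(other_cols)).to_pandas()
```

If the restricted scan loads while the full scan fails, that would point at the data files storing these columns as timestamp without a time zone while the table schema declares timestamptz.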

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
