Description
Apache Iceberg version
0.10.0 (latest release)
Please describe the bug 🐞
My Python version is 3.12. When I use PyIceberg to read a table created by Dremio with the code below, I get a "Cannot promote timestamp to timestamptz" error. However, if I create a table from the same dataset in Spark, that table loads correctly.
from pyiceberg.catalog import load_catalog

# Nessie catalog loaded from the PyIceberg config; connection details omitted here
nessie_catalog = load_catalog("nessie")

namespace = "test"
table_name = "Files_Created_by_Dremio"
table = nessie_catalog.load_table(f"{namespace}.{table_name}")
print(table.name())
print(table.schema())
table.scan().to_pandas()
Output:
('test', 'Files_Created_by_Dremio')
table {
1: ID: optional long
2: Name: optional string
3: FileImportID: optional int
4: ReceivedDateTime: optional timestamptz
5: IsMoved: optional boolean
6: MoveAttempt: optional int
7: IsProcessed: optional boolean
8: ProcessingAttempt: optional int
9: TotalDataRows: optional int
10: TotalProcessedDataRows: optional int
11: TotalExcludedDataRows: optional int
12: TotalAggregatedRows: optional int
13: IsArchived: optional boolean
14: ArchivalAttempt: optional int
15: IsMovedToTransaction: optional boolean
16: MovedToTransactionAttempt: optional int
17: IsEnriched: optional boolean
18: IsEnrichmentAttempt: optional int
19: IsAEnriched: optional boolean
20: AEnrichmentAttempt: optional int
21: IsBEnriched: optional boolean
22: BEnrichmentAttempt: optional int
23: IsCEnriched: optional boolean
24: CEnrichmentAttempt: optional int
25: HasPendingEnrichment: optional boolean
26: HasPendingAEnrichment: optional boolean
27: HasPendingBEnrichment: optional boolean
28: HasPendingCEnrichment: optional boolean
29: FileBusinessDate: optional timestamptz
30: IsUploaded: optional boolean
31: IsValid: optional boolean
32: Status: optional string
33: StatusMessage: optional string
34: LastActionDateTime: optional timestamptz
35: ActedBy: optional int
36: CreatedOn: optional timestamptz
37: IsAttempedAll: optional boolean
38: IsAttempedAll2: optional boolean
39: IsClosed: optional boolean
40: SparkUniqueID: optional string
}
ResolveError Traceback (most recent call last)
Cell In[3], line 12
      8 print(table.location())
      9 print(table.schema())
---> 12 table.scan().to_pandas()

   1465 def to_pandas(self, **kwargs: Any) -> pd.DataFrame:
   1466     """Read a Pandas DataFrame eagerly from this Iceberg table.
   1467
   1468     Returns:
   1469         pd.DataFrame: Materialized Pandas Dataframe from the Iceberg table
   1470     """
-> 1471     return self.to_arrow().to_pandas(**kwargs)

   1427     """Read an Arrow table eagerly from this DataScan.
   1428
   1429     All rows will be loaded into memory at once.
   (...)
   1432         pa.Table: Materialized Arrow Table from the Iceberg table's DataScan
   1433     """
   1434     from pyiceberg.io.pyarrow import ArrowScan
   1436     return ArrowScan(
   1437         self.table_metadata, self.io, self.projection(), self.row_filter, self.case_sensitive, self.limit
-> 1438     ).to_table(self.plan_files())

ResolveError: Cannot promote timestamp to timestamptz
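The error suggests the scan is trying to promote a plain timestamp read from the data files to the timestamptz declared in the table schema. Below is a minimal diagnostic sketch (not part of the report): the file path is hypothetical, and it uses the ReceivedDateTime column from the schema above to check whether the column in a Dremio-written Parquet file carries a timezone.

# Diagnostic sketch with an assumed data-file path; checks whether the Parquet
# column written by Dremio is a timestamp without a timezone, while the Iceberg
# schema declares timestamptz.
import pyarrow.parquet as pq

# Hypothetical path to one data file of the Dremio-created table
path = "warehouse/test/Files_Created_by_Dremio/data/part-00000.parquet"

schema = pq.read_schema(path)
field = schema.field("ReceivedDateTime")
print(field.type)     # e.g. timestamp[us] vs timestamp[us, tz=UTC]
print(field.type.tz)  # None if the file column carries no timezone

If the field prints as a timestamp with tz=None while the Iceberg schema says timestamptz, that would match the promotion the scan refuses to perform.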
The tables I created in Dremio and Spark are actually from the same dataset.

In Dremio:

CREATE TABLE nessie.test.Files_Created_by_Dremio AS
SELECT * FROM nessie.test.Files

In Spark:

CREATE TABLE nessie.test.Files_Created_by_Spark AS
SELECT * FROM nessie.test.Files
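For contrast, a sketch of the Spark-side check described above, reusing the nessie_catalog object from the reproduction code; per the report, the Spark-created copy of the same dataset reads without the promotion error.

# Loading the Spark-created copy of the same dataset (same catalog as above)
spark_table = nessie_catalog.load_table("test.Files_Created_by_Spark")
print(spark_table.schema())
df = spark_table.scan().to_pandas()  # succeeds, unlike the Dremio-created table
print(df.shape)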
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time