Description
Apache Iceberg version
0.9.1 (latest release)
Please describe the bug 🐞
After applying the fix from #1983 for decimal conversion, the error "conversion from NoneType to Decimal is not supported" is raised when a decimal column contains only null values. Here's a snippet of code to reproduce it:
from decimal import Decimal
import pyarrow as pa
from pyiceberg.io.pyarrow import pyarrow_to_schema
from pyiceberg.schema import Schema
from pyiceberg.types import DecimalType, NestedField
from pyiceberg.catalog import Catalog, load_catalog
from pyiceberg.table.name_mapping import MappedField, NameMapping
warehouse_path = '/tmp'
catalog = load_catalog(
    "default",
    type = "sql",
    uri = f"sqlite://///{warehouse_path}/test",
    warehouse = f'file://{warehouse_path}',
)
catalog.create_namespace_if_not_exists(
    'test',
    {'location': f'file://{warehouse_path}'}
)
decimal8 = pa.array([Decimal("123.45"), Decimal("678.91")], pa.decimal128(8, 2))
decimal16 = pa.array([Decimal("12345679.123456"), Decimal("67891234.678912")], pa.decimal128(16, 6))
decimal19 = pa.array([Decimal("1234567890123.123456"), Decimal("9876543210703.654321")], pa.decimal128(19, 6))
empty_decimal8 = pa.array([None, None], pa.decimal128(8,2))
empty_decimal16 = pa.array([None, None], pa.decimal128(16, 6))
empty_decimal19 = pa.array([None, None], pa.decimal128(19, 6))
table = pa.Table.from_pydict(
    {
        "decimal8": decimal8,
        "decimal16": decimal16,
        "decimal19": decimal19,
        "empty_decimal8": empty_decimal8,
        "empty_decimal16": empty_decimal16,
        "empty_decimal19": empty_decimal19,
    },
)
pa_schema = table.schema
name_mapping = NameMapping([
    MappedField(**{'field-id': i+1, 'names': [name]})
    for i, name
    in enumerate(pa_schema.names)
])
schema = pyarrow_to_schema(
    pa_schema,
    name_mapping
)
pyiceberg_table = catalog.create_table(
    'test.decimals',
    schema=table.schema,
)
pyiceberg_table.append(table)
My current fix to data_file_statistics_from_parquet_metadata is as follows, but I'm unsure whether it has unintended consequences:
if isinstance(stats_col.iceberg_type, DecimalType) and statistics.physical_type != "FIXED_LEN_BYTE_ARRAY":
    scale = stats_col.iceberg_type.scale
    if statistics.min_raw:
        col_aggs[field_id].update_min(unscaled_to_decimal(statistics.min_raw, scale))
    if statistics.max_raw:
        col_aggs[field_id].update_max(unscaled_to_decimal(statistics.max_raw, scale))
I could not get the nightly build to install, so I'm unsure whether this bug still exists there. I tested with 0.9.0 and did not run into this issue.
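One possible consequence of the truthiness checks in the fix above: if the raw statistic is an integer, a value of 0 is falsy, so a column whose minimum or maximum is exactly zero would also be skipped. A minimal sketch of an explicit None check follows. It assumes the raw statistic is either an int or None; `unscaled_to_decimal_sketch` and `decimal_bound` are hypothetical stand-ins written for illustration, not pyiceberg functions.

```python
from decimal import Decimal
from typing import Optional


def unscaled_to_decimal_sketch(unscaled: int, scale: int) -> Decimal:
    """Hypothetical stand-in: interpret an unscaled integer at the given scale."""
    # Build the Decimal from its tuple form so no context rounding is applied.
    sign, digits, _ = Decimal(unscaled).as_tuple()
    return Decimal((sign, digits, -scale))


def decimal_bound(raw: Optional[int], scale: int) -> Optional[Decimal]:
    """Decode a raw min/max statistic, or return None when it is absent."""
    # Compare against None explicitly so that a raw value of 0 is still decoded.
    if raw is None:
        return None
    return unscaled_to_decimal_sketch(raw, scale)


# An all-null column has no min/max (assumed here to surface as None).
missing = decimal_bound(None, 2)
# A regular value and a zero value are both decoded.
regular = decimal_bound(12345, 2)
zero = decimal_bound(0, 2)
```

With this shape, the caller would update the column aggregate only when the returned bound is not None, which covers the all-null case without dropping zero-valued bounds.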
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time