Commit c059615

apurva-metameta-codesync[bot] authored and committed
feat: [presto][iceberg] Add Variant type support for Iceberg V3 (prestodb#27374)
Summary:
Pull Request resolved: prestodb#27374

== RELEASE NOTES ==

General Changes
* Upgrade Apache Iceberg library from 1.10.0 to 1.10.1.

Hive Connector Changes
* Add Iceberg V3 deletion vector (DV) support using Puffin-encoded roaring bitmaps, including a DV reader, writer, page sink, and compaction procedure.
* Add Iceberg equality delete file reader with sequence number conflict resolution per the Iceberg V2+ spec: equality deletes skip when deleteFileSeqNum <= dataFileSeqNum; positional deletes and DVs skip when deleteFileSeqNum < dataFileSeqNum; sequence number 0 (V1 legacy) never skips.
* Wire dataSequenceNumber through the Presto protocol layer (Java → C++) to enable server-side sequence number conflict resolution for all delete file types.
* Add PUFFIN file format support for deletion vector discovery, enabling the coordinator to locate DV files during split creation.
* Add Iceberg V3 deletion vector write path with DV page sink and rewrite_delete_files compaction procedure for DV maintenance.
* Add nanosecond timestamp (TIMESTAMP_NANO) type support for Iceberg V3 tables.
* Add Variant type support for Iceberg V3, enabling semi-structured data columns in Iceberg tables.
* Eagerly collect delete files during split creation with improved logging for easier debugging of Iceberg delete file resolution.
* Improve IcebergSplitReader error handling and fix test file handle leaks.
* Add end-to-end integration tests for Iceberg V3 covering snapshot lifecycle (INSERT, DELETE with equality/positional/DV deletes, UPDATE, MERGE, time-travel) and all 99 TPC-DS queries.

Add support for Iceberg V3 VARIANT type across the Presto-Iceberg connector type conversion pipeline. The VARIANT type represents semi-structured data (JSON) and is mapped to Presto's unbounded VARCHAR type.

Changes:
- TypeConverter: Map VARIANT to VarcharType (unbounded) in toPrestoType(), and to ORC STRING type in toOrcType()
- IcebergUtil: Handle VARIANT partition values as string slices in domain creation
- PartitionData: Deserialize VARIANT partition values as text (same as STRING)
- PartitionTable: Convert VariantType partition values to string representation
- TestIcebergV3: Add comprehensive e2e tests for VARIANT type

The VARIANT type in Iceberg 1.10.0 implements Type directly (not PrimitiveType or NestedType), so ColumnIdentity handling works automatically via the existing !isNestedType() check.

Differential Revision: D96755027
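The sequence number rule stated in the release notes can be sketched as a small predicate. This is only an illustration under stated assumptions: the class, enum, and method names are hypothetical and do not come from the commit, and the sketch assumes that "sequence number 0 never skips" applies when either the delete file or the data file carries sequence number 0.

```java
// Hypothetical sketch of the skip rule from the release notes.
// None of these names are taken from the Presto or Iceberg code base.
final class DeleteSequenceRule
{
    enum DeleteKind { EQUALITY, POSITIONAL, DELETION_VECTOR }

    private DeleteSequenceRule() {}

    // Returns true when the delete file can be skipped for the given data file.
    static boolean canSkip(DeleteKind kind, long deleteFileSeqNum, long dataFileSeqNum)
    {
        // Assumption: a sequence number of 0 on either side marks V1 legacy
        // metadata, so the delete file is never skipped.
        if (deleteFileSeqNum == 0 || dataFileSeqNum == 0) {
            return false;
        }
        if (kind == DeleteKind.EQUALITY) {
            // Equality deletes skip when deleteFileSeqNum <= dataFileSeqNum.
            return deleteFileSeqNum <= dataFileSeqNum;
        }
        // Positional deletes and DVs skip when deleteFileSeqNum < dataFileSeqNum.
        return deleteFileSeqNum < dataFileSeqNum;
    }
}
```

The only difference between the two branches is the boundary case: when both files share a sequence number, an equality delete is skipped while a positional delete or DV is still applied.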
1 parent 02bfd5c commit c059615

10 files changed, +2565 -1 lines changed

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergConnector.java

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,7 @@
 import com.facebook.airlift.bootstrap.LifeCycleManager;
 import com.facebook.presto.hive.HiveTransactionHandle;
 import com.facebook.presto.iceberg.function.IcebergBucketFunction;
+import com.facebook.presto.iceberg.function.VariantFunctions;
 import com.facebook.presto.iceberg.function.changelog.ApplyChangelogFunction;
 import com.facebook.presto.iceberg.transaction.IcebergTransactionManager;
 import com.facebook.presto.iceberg.transaction.IcebergTransactionMetadata;
@@ -256,6 +257,7 @@ public Set<Class<?>> getSystemFunctions()
                 .add(ApplyChangelogFunction.class)
                 .add(IcebergBucketFunction.class)
                 .add(IcebergBucketFunction.Bucket.class)
+                .add(VariantFunctions.class)
                 .build();
     }
 

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java

Lines changed: 1 addition & 0 deletions
@@ -782,6 +782,7 @@ public static Domain createDomainFromIcebergPartitionValue(
             case TIMESTAMP_NANO:
                 return singleValue(prestoType, Math.floorDiv((Long) value, 1000L));
             case STRING:
+            case VARIANT:
                 return singleValue(prestoType, utf8Slice(value.toString()));
             case FLOAT:
                 return singleValue(prestoType, (long) floatToRawIntBits((Float) value));
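The TIMESTAMP_NANO context line in this hunk divides with Math.floorDiv rather than the / operator. A minimal sketch of why that matters, assuming the division by 1000 converts nanoseconds to microseconds (the commit does not state the units) and using hypothetical class and method names:

```java
// Hypothetical sketch: floor division versus truncating division
// when scaling a possibly negative timestamp value by 1000.
final class NanosToMicrosSketch
{
    private NanosToMicrosSketch() {}

    // Rounds toward negative infinity, like the hunk's Math.floorDiv call.
    static long nanosToMicros(long nanos)
    {
        return Math.floorDiv(nanos, 1000L);
    }

    // Rounds toward zero; shown only for comparison.
    static long truncatingNanosToMicros(long nanos)
    {
        return nanos / 1000L;
    }
}
```

For values before the epoch the two differ: a value of -1 floors to -1 but truncates to 0, which would move the timestamp across the epoch boundary.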

presto-iceberg/src/main/java/com/facebook/presto/iceberg/PartitionData.java

Lines changed: 1 addition & 0 deletions
@@ -176,6 +176,7 @@ public static Object getValue(JsonNode partitionValue, Type type)
                 }
                 return partitionValue.doubleValue();
             case STRING:
+            case VARIANT:
                 return partitionValue.asText();
             case FIXED:
             case BINARY:

presto-iceberg/src/main/java/com/facebook/presto/iceberg/PartitionTable.java

Lines changed: 1 addition & 1 deletion
@@ -287,7 +287,7 @@ private Object convert(Object value, Type type)
         if (value == null) {
             return null;
         }
-        if (type instanceof Types.StringType) {
+        if (type instanceof Types.StringType || type.isVariantType()) {
             return value.toString();
         }
         if (type instanceof Types.BinaryType) {

presto-iceberg/src/main/java/com/facebook/presto/iceberg/TypeConverter.java

Lines changed: 3 additions & 0 deletions
@@ -147,6 +147,8 @@ public static Type toPrestoType(org.apache.iceberg.types.Type type, TypeManager
                 return RowType.from(fields.stream()
                         .map(field -> new RowType.Field(Optional.of(field.name()), toPrestoType(field.type(), typeManager)))
                         .collect(toImmutableList()));
+            case VARIANT:
+                return VarcharType.createUnboundedVarcharType();
             default:
                 throw new UnsupportedOperationException(format("Cannot convert from Iceberg type '%s' (%s) to Presto type", type, type.typeId()));
         }
@@ -411,6 +413,7 @@ private static List<OrcType> toOrcType(int nextFieldTypeIndex, org.apache.iceber
             case TIMESTAMP_NANO:
                 return ImmutableList.of(new OrcType(OrcType.OrcTypeKind.TIMESTAMP, ImmutableList.of(), ImmutableList.of(), Optional.empty(), Optional.empty(), Optional.empty(), attributes));
             case STRING:
+            case VARIANT:
                 return ImmutableList.of(new OrcType(OrcType.OrcTypeKind.STRING, ImmutableList.of(), ImmutableList.of(), Optional.empty(), Optional.empty(), Optional.empty(), attributes));
             case UUID:
             case FIXED:
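The pattern used across these hunks, routing VARIANT through the same branch as STRING, can be sketched without the Presto or Iceberg classes. The enum, the method, and the returned type names below are hypothetical stand-ins for illustration, not the connector's real types:

```java
// Hypothetical sketch of the STRING/VARIANT fall-through mapping.
// IcebergTypeId and the returned names stand in for the real
// Iceberg Type.TypeID and Presto Type objects.
final class VariantMappingSketch
{
    enum IcebergTypeId { STRING, VARIANT, LONG }

    private VariantMappingSketch() {}

    static String toPrestoTypeName(IcebergTypeId id)
    {
        switch (id) {
            case STRING:
            case VARIANT:
                // VARIANT shares the STRING branch, so it surfaces as unbounded varchar.
                return "varchar";
            case LONG:
                return "bigint";
            default:
                throw new UnsupportedOperationException("Cannot convert Iceberg type " + id);
        }
    }
}
```

One consequence of this design, as the commit message describes it, is that VARIANT values are exposed to queries as text rather than as a distinct semi-structured type.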
