Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,7 @@
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.SortOrder;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableProperties;
import org.apache.iceberg.Transaction;
import org.apache.iceberg.UpdatePartitionSpec;
Expand Down Expand Up @@ -188,6 +189,7 @@
import static com.facebook.presto.iceberg.IcebergTableType.CHANGELOG;
import static com.facebook.presto.iceberg.IcebergTableType.DATA;
import static com.facebook.presto.iceberg.IcebergTableType.EQUALITY_DELETES;
import static com.facebook.presto.iceberg.IcebergUtil.MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS;
import static com.facebook.presto.iceberg.IcebergUtil.MIN_FORMAT_VERSION_FOR_DELETE;
import static com.facebook.presto.iceberg.IcebergUtil.getColumns;
import static com.facebook.presto.iceberg.IcebergUtil.getColumnsForWrite;
Expand Down Expand Up @@ -357,6 +359,48 @@ public Optional<IcebergProcedureContext> getProcedureContext()
return this.procedureContext;
}

protected static void validateTableForPresto(BaseTable table, Optional<Long> tableSnapshotId)
{
Snapshot snapshot;
try {
snapshot = tableSnapshotId
.map(table::snapshot)
.orElse(table.currentSnapshot());
}
catch (RuntimeException e) {
// If the snapshot cannot be retrieved (e.g. metadata is missing), we cannot validate the table.
Comment on lines +370 to +371
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Catching RuntimeException broadly here may hide unexpected failures during snapshot retrieval.

Swallowing all RuntimeExceptions here will also hide programming/configuration errors that should surface or at least be observable. Consider catching a narrower, Iceberg-specific exception for the expected failure mode, or at minimum log the exception before returning so operational debugging remains possible.

// Returning here allows operations that do not strictly require the snapshot (like DROP TABLE) to proceed.
return;
}

if (snapshot == null) {
// empty table, nothing to validate
return;
}

TableMetadata metadata = table.operations().current();
if (metadata.formatVersion() < 3) {
return;
}

Schema schema = metadata.schemasById().get(snapshot.schemaId());
if (schema == null) {
schema = metadata.schema();
}

// Reject schema default values (initial-default / write-default)
for (Types.NestedField field : schema.columns()) {
Comment on lines +391 to +392
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Column default validation only considers top-level columns and may miss nested fields

schema.columns() only iterates top-level fields, so defaults on nested struct fields (if Iceberg permits initialDefault / writeDefault there) would bypass this check. Either traverse the full schema recursively to validate nested Types.NestedFields, or explicitly confirm/document that defaults are only allowed on top-level fields.

if (field.initialDefault() != null || field.writeDefault() != null) {
throw new PrestoException(NOT_SUPPORTED, "Iceberg v3 column default values are not supported");
}
}

// Reject Iceberg table encryption
if (!metadata.encryptionKeys().isEmpty() || snapshot.keyId() != null || metadata.properties().containsKey("encryption.key-id")) {
throw new PrestoException(NOT_SUPPORTED, "Iceberg table encryption is not supported");
}
}

/**
* This class implements the default implementation for getTableLayoutForConstraint which will be used in the case of a Java Worker
*/
Expand Down Expand Up @@ -832,10 +876,17 @@ public ConnectorMergeTableHandle beginMerge(ConnectorSession session, ConnectorT
Table icebergTable = getIcebergTable(session, icebergTableHandle.getSchemaTableName());
int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();

if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE ||
!Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
.map(mode -> mode.equals(MERGE_ON_READ.modeName()))
.orElse(false)) {
if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
throw new PrestoException(ICEBERG_INVALID_FORMAT_VERSION,
Comment on lines +879 to +880
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: The error message for low format versions mentions update mode before it is validated, which can be misleading.

The exception text currently states both a format version and update mode requirement, but this branch only validates the format version; update mode could still be valid. Consider separating these into distinct errors (one for unsupported format version, one for invalid update mode) so users can see exactly which precondition failed.

Suggested implementation:

        Table icebergTable = getIcebergTable(session, icebergTableHandle.getSchemaTableName());
        int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();

        if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
            throw new PrestoException(
                    ICEBERG_INVALID_FORMAT_VERSION,
                    "Iceberg table updates require at least format version 2");
        }

        if (!Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
                .map(mode -> mode.equals(MERGE_ON_READ.modeName()))
                .orElse(false)) {
            throw new PrestoException(
                    ICEBERG_INVALID_METADATA,
                    format("Iceberg table updates require table property %s to be set to %s",
                            TableProperties.UPDATE_MODE,
                            MERGE_ON_READ.modeName()));
        }

    /**

These edits assume:

  1. ICEBERG_INVALID_METADATA is already defined in this class or imported; if not, you should use the existing error code previously used for invalid update mode (or introduce one consistent with the rest of the Iceberg module).
  2. Optional, TableProperties, and MERGE_ON_READ are already imported/available in this file. If any are missing, add the appropriate imports or static imports matching the existing style in the file.

"Iceberg table updates require at least format version 2 and update mode must be merge-on-read");
}
if (formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
throw new PrestoException(NOT_SUPPORTED,
format("Iceberg table updates for format version %s are not supported yet", formatVersion));
}
if (!Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
.map(mode -> mode.equals(MERGE_ON_READ.modeName()))
Comment on lines +879 to +888
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Error codes/messages for unsupported format versions are inconsistent between beginMerge and beginUpdate.

beginMerge uses ICEBERG_INVALID_FORMAT_VERSION for format versions < 2, while this path uses NOT_SUPPORTED, and the messages differ in whether they mention update mode. Please align the exception type and message with the other row-level operations (merge/update/delete) for consistent, predictable behavior and easier programmatic handling.

Suggested implementation:

        if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
            throw new PrestoException(
                    ICEBERG_INVALID_FORMAT_VERSION,
                    format(
                            "Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s",
                            MIN_FORMAT_VERSION_FOR_DELETE,
                            MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS,
                            formatVersion));
        }
        if (formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
            throw new PrestoException(
                    ICEBERG_INVALID_FORMAT_VERSION,
                    format(
                            "Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s",
                            MIN_FORMAT_VERSION_FOR_DELETE,
                            MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS,
                            formatVersion));
        }

To fully align messages and error codes across all row-level operations (merge/update/delete), you should:

  1. Locate the corresponding format-version checks in beginMerge, beginDelete, and any other row-level operation helpers in IcebergAbstractMetadata (or related classes).
  2. Update their PrestoException construction to use:
    • The same ICEBERG_INVALID_FORMAT_VERSION error code.
    • The same message template:
      "Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s".
    • The same arguments: MIN_FORMAT_VERSION_FOR_DELETE, MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS, formatVersion.
      This will ensure consistent, predictable behavior and make programmatic handling of unsupported format versions easier across all operations.

.orElse(false)) {
throw new PrestoException(ICEBERG_INVALID_FORMAT_VERSION,
"Iceberg table updates require at least format version 2 and update mode must be merge-on-read");
}
Expand Down Expand Up @@ -1174,6 +1225,8 @@ public ConnectorInsertTableHandle beginInsert(ConnectorSession session, Connecto
verify(table.getIcebergTableName().getTableType() == DATA, "only the data table can have data inserted");
Table icebergTable = getIcebergTable(session, table.getSchemaTableName());
validateTableMode(session, icebergTable);
BaseTable baseTable = (BaseTable) icebergTable;
validateTableForPresto(baseTable, Optional.ofNullable(baseTable.currentSnapshot()).map(Snapshot::snapshotId));

return beginIcebergTableInsert(session, table, icebergTable);
}
Expand Down Expand Up @@ -1261,6 +1314,12 @@ public IcebergTableHandle getTableHandle(ConnectorSession session, SchemaTableNa
})
.orElseGet(() -> resolveSnapshotIdByName(table, name));

// Validate unsupported v3 features (column defaults, encryption) before
// proceeding
if (table instanceof BaseTable) {
validateTableForPresto((BaseTable) table, tableSnapshotId);
}

// Get Iceberg tables schema, properties, and location with missing
// filesystem metadata will fail.
// See https://github.com/prestodb/presto/pull/21181
Expand Down Expand Up @@ -1364,6 +1423,10 @@ public ConnectorDeleteTableHandle beginDelete(ConnectorSession session, Connecto
if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
throw new PrestoException(NOT_SUPPORTED, format("This connector only supports delete where one or more partitions are deleted entirely for table versions older than %d", MIN_FORMAT_VERSION_FOR_DELETE));
}
if (formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
throw new PrestoException(NOT_SUPPORTED,
format("Iceberg table updates for format version %s are not supported yet", formatVersion));
}
if (getDeleteMode(icebergTable) == RowLevelOperationMode.COPY_ON_WRITE) {
throw new PrestoException(NOT_SUPPORTED, "This connector only supports delete where one or more partitions are deleted entirely. Configure write.delete.mode table property to allow row level deletions.");
}
Expand Down Expand Up @@ -1620,11 +1683,17 @@ public ConnectorTableHandle beginUpdate(ConnectorSession session, ConnectorTable
IcebergTableHandle handle = (IcebergTableHandle) tableHandle;
Table icebergTable = getIcebergTable(session, handle.getSchemaTableName());
int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();
if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE ||
!Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
.map(mode -> mode.equals(MERGE_ON_READ.modeName()))
.orElse(false)) {
throw new RuntimeException("Iceberg table updates require at least format version 2 and update mode must be merge-on-read");
if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
throw new PrestoException(NOT_SUPPORTED, "Iceberg table updates require at least format version 2");
}
if (formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
throw new PrestoException(NOT_SUPPORTED,
format("Iceberg table updates for format version %s are not supported yet", formatVersion));
}
if (!Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
.map(mode -> mode.equals(MERGE_ON_READ.modeName()))
.orElse(false)) {
throw new PrestoException(NOT_SUPPORTED, "Iceberg table updates require update mode to be merge-on-read");
}
validateTableMode(session, icebergTable);
transaction = icebergTable.newTransaction();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import com.facebook.presto.spi.ConnectorSession;
import com.facebook.presto.spi.ConnectorSplit;
import com.facebook.presto.spi.ConnectorSplitSource;
import com.facebook.presto.spi.PrestoException;
import com.facebook.presto.spi.SplitWeight;
import com.facebook.presto.spi.connector.ConnectorPartitionHandle;
import com.facebook.presto.spi.schedule.NodeSelectionStrategy;
Expand Down Expand Up @@ -46,6 +47,7 @@
import static com.facebook.presto.iceberg.IcebergUtil.getTargetSplitSize;
import static com.facebook.presto.iceberg.IcebergUtil.metadataColumnsMatchPredicates;
import static com.facebook.presto.iceberg.IcebergUtil.partitionDataFromStructLike;
import static com.facebook.presto.spi.StandardErrorCode.NOT_SUPPORTED;
import static com.google.common.collect.ImmutableList.toImmutableList;
import static com.google.common.collect.Iterators.limit;
import static java.util.Objects.requireNonNull;
Expand Down Expand Up @@ -124,6 +126,13 @@ private ConnectorSplit toIcebergSplit(FileScanTask task)
PartitionSpec spec = task.spec();
Optional<PartitionData> partitionData = partitionDataFromStructLike(spec, task.file().partition());

// Validate no PUFFIN deletion vectors (Iceberg v3 feature not yet supported)
for (org.apache.iceberg.DeleteFile deleteFile : task.deletes()) {
if (deleteFile.format() == org.apache.iceberg.FileFormat.PUFFIN) {
throw new PrestoException(NOT_SUPPORTED, "Iceberg deletion vectors (PUFFIN format) are not supported");
}
}

// TODO: We should leverage residual expression and convert that to TupleDomain.
// The predicate here is used by readers for predicate push down at reader level,
// so when we do not use residual expression, we are just wasting CPU cycles
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -215,6 +215,8 @@ public final class IcebergUtil
{
private static final Logger log = Logger.get(IcebergUtil.class);
public static final int MIN_FORMAT_VERSION_FOR_DELETE = 2;
public static final int MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS = 2;
public static final int MAX_SUPPORTED_FORMAT_VERSION = 3;

public static final long DOUBLE_POSITIVE_ZERO = 0x0000000000000000L;
public static final long DOUBLE_POSITIVE_INFINITE = 0x7ff0000000000000L;
Expand Down Expand Up @@ -1174,7 +1176,11 @@ public static Map<String, String> populateTableProperties(IcebergAbstractMetadat
public static int parseFormatVersion(String formatVersion)
{
try {
return parseInt(formatVersion);
int version = parseInt(formatVersion);
if (version > MAX_SUPPORTED_FORMAT_VERSION) {
throw new PrestoException(NOT_SUPPORTED, format("Iceberg table format version %d is not supported", version));
}
return version;
}
catch (NumberFormatException | IndexOutOfBoundsException e) {
throw new PrestoException(ICEBERG_INVALID_FORMAT_VERSION, "Unable to parse user provided format version");
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableSet;
import io.airlift.slice.Slice;
import org.apache.iceberg.BaseTable;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.DeleteFile;
Expand Down Expand Up @@ -69,6 +70,7 @@
import static com.facebook.presto.iceberg.ExpressionConverter.toIcebergExpression;
import static com.facebook.presto.iceberg.IcebergAbstractMetadata.getSupportedSortFields;
import static com.facebook.presto.iceberg.IcebergSessionProperties.getCompressionCodec;
import static com.facebook.presto.iceberg.IcebergUtil.MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS;
import static com.facebook.presto.iceberg.IcebergUtil.getColumns;
import static com.facebook.presto.iceberg.IcebergUtil.getFileFormat;
import static com.facebook.presto.iceberg.PartitionSpecConverter.toPrestoPartitionSpec;
Expand Down Expand Up @@ -124,6 +126,16 @@ private ConnectorDistributedProcedureHandle beginCallDistributedProcedure(Connec
Table icebergTable = procedureContext.getTable();
IcebergTableHandle tableHandle = layoutHandle.getTable();

// Validate format version for OPTIMIZE operation
int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();
if (formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
Comment on lines +129 to +131
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Reusing MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS for OPTIMIZE may conflate different capability boundaries

Since OPTIMIZE/rewrite-data-files is not actually a row-level operation, tying it to MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS couples two unrelated capability checks. If the restriction on OPTIMIZE is driven by other concerns (e.g., manifest/delete semantics), consider introducing a separate constant or helper so future changes to row-level support don’t inadvertently change OPTIMIZE behavior.

Suggested implementation:

            // Validate format version for OPTIMIZE operation
            int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();
            if (formatVersion > MAX_FORMAT_VERSION_FOR_OPTIMIZE) {
                throw new PrestoException(NOT_SUPPORTED,
                        format("OPTIMIZE is not supported for Iceberg table format version > %d. Table %s format version is %s.",
                                MAX_FORMAT_VERSION_FOR_OPTIMIZE,
                                icebergTable.name(),
                                formatVersion));
            }
  1. Inside the RewriteDataFilesProcedure class, add a dedicated constant for OPTIMIZE, for example near the top of the class:

    // Separate from MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS to allow independent evolution
    private static final int MAX_FORMAT_VERSION_FOR_OPTIMIZE = 2;

    Place this inside the class body (after the opening {), not at the top-level.

  2. If you want to avoid hardcoding 2, you can initialize it from IcebergUtil.MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS today:

    private static final int MAX_FORMAT_VERSION_FOR_OPTIMIZE = MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS;

    but in that case keep the MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS import and add a comment explaining that OPTIMIZE is conceptually separate and may diverge in the future.

throw new PrestoException(NOT_SUPPORTED,
format("OPTIMIZE is not supported for Iceberg table format version > %d. Table %s format version is %s.",
MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS,
icebergTable.name(),
formatVersion));
}

SortOrder sortOrder = icebergTable.sortOrder();
List<String> sortFieldStrings = ImmutableList.of();
if (sortOrderIndex.isPresent()) {
Expand Down
Loading
Loading