feat: Add initial support for Iceberg format version 3#27021

Open
Joe-Abraham wants to merge 1 commit into prestodb:master from Joe-Abraham:icebergv3

Conversation

@Joe-Abraham (Contributor) commented Jan 23, 2026

Description

Add initial support for Iceberg table format version 3 while constraining unsupported features and row-level operations to safe, explicitly validated paths.

New Features:

  1. Allow creating Iceberg tables with format version 3 and inserting/querying data from them, including partitioned tables.
  2. Support upgrading existing Iceberg format version 2 tables to format version 3.
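As a sketch, the v3 table lifecycle described above could look like the following. Catalog, schema, table, and column names are hypothetical; the quoted `"format-version"` table property follows the syntax used in the test suite:

```sql
-- Hypothetical example of creating, populating, and querying a
-- partitioned Iceberg v3 table (names are illustrative).
CREATE TABLE iceberg.tpch.orders_v3 (
    order_id BIGINT,
    status VARCHAR,
    order_date DATE
)
WITH ("format-version" = '3', partitioning = ARRAY['order_date']);

INSERT INTO iceberg.tpch.orders_v3
VALUES (1, 'OPEN', DATE '2026-01-23');

SELECT * FROM iceberg.tpch.orders_v3;
```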

Enhancements:

  1. Introduce version guardrails for Iceberg operations, including explicit maximum supported table format and maximum format version for row-level operations.
  2. Validate Iceberg v3 tables for unsupported features such as column default values and table encryption before executing writes or inserts.
  3. Add validation to reject use of PUFFIN-based deletion vectors that are not yet supported.
  4. Improve error handling for Iceberg update and delete operations by using specific PrestoException errors and clearer messages when format/version constraints are violated.
  5. Prevent OPTIMIZE (rewrite_data_files) from running on Iceberg tables with format versions above the supported threshold.

Test Plan

Add TestIcebergV3 integration test suite covering creation, upgrade, insert, query, and partitioning for v3 tables, as well as rejection of unsupported delete, update, merge, and OPTIMIZE operations on v3 tables.

Reference PR - trinodb/trino#27786
Co-authored-by: Dain Sundstrom dain@iq80.com

Release Notes

== RELEASE NOTES ==

Iceberg Connector Changes
* Add support for creating Iceberg tables with format-version = '3'.
* Add reading from Iceberg V3 tables, including partitioned tables.
* Add INSERT operations into Iceberg V3 tables.
* Add support for upgrading existing V2 tables to V3 using the Iceberg API.
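The upgrade path in the last bullet can be sketched with the Iceberg Java API roughly as follows. Catalog wiring and table identifiers are assumptions; Iceberg treats `format-version` as a reserved table property and rewrites the table metadata at the requested version when the update is committed:

```java
// Illustrative sketch only: upgrade an existing v2 table to v3 via the
// Iceberg API. Catalog setup and identifiers are hypothetical.
import org.apache.iceberg.Table;
import org.apache.iceberg.TableProperties;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

class UpgradeToV3Example
{
    static void upgrade(Catalog catalog)
    {
        Table table = catalog.loadTable(TableIdentifier.of("tpch", "orders_v2"));
        // "format-version" is a reserved property; committing it triggers
        // the metadata upgrade to the requested format version.
        table.updateProperties()
                .set(TableProperties.FORMAT_VERSION, "3")
                .commit();
    }
}
```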

Summary by Sourcery

Add guarded initial support for Iceberg table format version 3 while constraining unsupported features and row-level operations to fail fast with clear errors.

New Features:

  • Enable creating, reading from, and inserting into Iceberg tables with format version 3, including partitioned tables.

Enhancements:

  • Introduce format-version guardrails for Iceberg operations, including a maximum supported table format and a maximum format version for row-level operations.
  • Validate Iceberg v3 tables for unsupported features such as column default values, table encryption, and PUFFIN-based deletion vectors before executing reads or writes.
  • Tighten validation and error handling for Iceberg update, delete, merge, and OPTIMIZE (rewrite_data_files) operations on tables with unsupported format versions.

Tests:

  • Add TestIcebergV3 integration suite covering supported v3 operations and expected failures for unsupported delete, update, merge, OPTIMIZE, encryption, and deletion vector features.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jan 23, 2026
@Joe-Abraham Joe-Abraham changed the title from "feat(iceberg): Add Iceberg format version 3 table support" to "feat: Add Iceberg format version 3 table support" on Jan 23, 2026
@Joe-Abraham Joe-Abraham force-pushed the icebergv3 branch 2 times, most recently from 579fa72 to 0c2ed1b on January 30, 2026 09:30
@steveburnett (Contributor) commented:
* Add support for creating Iceberg format version 3 tables, upgrading v2 tables to v3, and inserting into v3 tables. 
* Add handling of unsupported v3 features by explicitly rejecting them.

Could we consider adding documentation for these?

The second point "Add handling of unsupported v3 features by explicitly rejecting them" would just need that statement.

The first - creating Iceberg format version 3 tables, upgrading v2 tables to v3, inserting into v3 tables - do any of these need examples added to the Iceberg doc? Doesn't have to be a big thing, something like "Insert into an Iceberg v3 table following this example:" and a code block.

@Joe-Abraham Joe-Abraham force-pushed the icebergv3 branch 2 times, most recently from 3b86f68 to 30d1c7a on February 4, 2026 05:37
@Joe-Abraham Joe-Abraham force-pushed the icebergv3 branch 3 times, most recently from 7b0c149 to 7dac029 on February 17, 2026 10:27
@prestodb prestodb deleted a comment from sourcery-ai bot Feb 18, 2026
@Joe-Abraham Joe-Abraham marked this pull request as ready for review February 18, 2026 09:05
@prestodb-ci prestodb-ci requested review from a team, agrawalreetika and bibith4 and removed request for a team February 18, 2026 09:05
sourcery-ai bot commented Feb 18, 2026

Reviewer's Guide

Adds guarded support for Iceberg table format version 3: create, read, insert, and partitioning are allowed, as are upgrades from v2, while strict format-version guardrails reject unsupported v3 features (encryption, column defaults, PUFFIN deletion vectors) and disallow v3 row-level operations and OPTIMIZE. The change is backed by a dedicated TestIcebergV3 suite.

Sequence diagram for Iceberg v3 table insert validation

sequenceDiagram
    actor User
    participant PrestoCoordinator
    participant IcebergAbstractMetadata
    participant IcebergCatalog
    participant BaseTable

    User->>PrestoCoordinator: SQL INSERT INTO iceberg_v3_table
    PrestoCoordinator->>IcebergAbstractMetadata: beginInsert(session, tableHandle)
    IcebergAbstractMetadata->>IcebergCatalog: getIcebergTable(session, schemaTableName)
    IcebergCatalog-->>IcebergAbstractMetadata: Table icebergTable
    IcebergAbstractMetadata->>IcebergAbstractMetadata: validateTableMode(session, icebergTable)
    IcebergAbstractMetadata->>BaseTable: cast icebergTable to BaseTable
    IcebergAbstractMetadata->>BaseTable: currentSnapshot()
    BaseTable-->>IcebergAbstractMetadata: Snapshot snapshot
    IcebergAbstractMetadata->>IcebergAbstractMetadata: validateTableForPresto(baseTable, Optional.of(snapshotId))

    alt metadata.formatVersion < 3 or empty table
        IcebergAbstractMetadata-->>IcebergAbstractMetadata: Return without extra checks
        IcebergAbstractMetadata-->>PrestoCoordinator: beginIcebergTableInsert(...)
        PrestoCoordinator-->>User: INSERT succeeds
    else metadata.formatVersion >= 3
        IcebergAbstractMetadata->>BaseTable: operations().current()
        IcebergAbstractMetadata->>BaseTable: schemasById().get(snapshot.schemaId()) or schema()
        IcebergAbstractMetadata->>IcebergAbstractMetadata: check columns for default values
        alt any column has initialDefault or writeDefault
            IcebergAbstractMetadata-->>PrestoCoordinator: PrestoException NOT_SUPPORTED (column defaults)
            PrestoCoordinator-->>User: INSERT fails (v3 column defaults not supported)
        else no column defaults
            IcebergAbstractMetadata->>IcebergAbstractMetadata: check encryptionKeys, snapshot.keyId, properties[encryption.key-id]
            alt any encryption present
                IcebergAbstractMetadata-->>PrestoCoordinator: PrestoException NOT_SUPPORTED (encryption)
                PrestoCoordinator-->>User: INSERT fails (encryption not supported)
            else no encryption
                IcebergAbstractMetadata-->>PrestoCoordinator: beginIcebergTableInsert(...)
                PrestoCoordinator-->>User: INSERT succeeds
            end
        end
    end

Class diagram for Iceberg v3 guardrails and validation

classDiagram
    class IcebergUtil {
        <<utility>>
        +int MIN_FORMAT_VERSION_FOR_DELETE
        +int MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS
        +int MAX_SUPPORTED_FORMAT_VERSION
        +int parseFormatVersion(String formatVersion)
    }

    class IcebergAbstractMetadata {
        +Optional getProcedureContext()
        +ConnectorInsertTableHandle beginInsert(ConnectorSession session, ConnectorTableHandle tableHandle)
        +IcebergTableHandle getTableHandle(ConnectorSession session, SchemaTableName tableName)
        +ConnectorMergeTableHandle beginMerge(ConnectorSession session, ConnectorTableHandle tableHandle)
        +ConnectorDeleteTableHandle beginDelete(ConnectorSession session, ConnectorTableHandle tableHandle)
        +ConnectorTableHandle beginUpdate(ConnectorSession session, ConnectorTableHandle tableHandle)
        +void validateTableMode(ConnectorSession session, Table icebergTable)
        +static void validateTableForPresto(BaseTable table, Optional tableSnapshotId)
    }

    class RewriteDataFilesProcedure {
        +ConnectorDistributedProcedureHandle beginCallDistributedProcedure(ConnectorSession session, ConnectorTransactionHandle transaction, ConnectorTableHandle layoutHandle)
    }

    class IcebergSplitSource {
        -ConnectorSplit toIcebergSplit(FileScanTask task)
    }

    class BaseTable {
        +TableOperations operations()
        +Snapshot currentSnapshot()
        +Snapshot snapshot(long snapshotId)
        +String name()
    }

    class TableMetadata {
        +int formatVersion()
        +Schema schema()
        +Map schemasById()
        +Map encryptionKeys()
        +Map properties()
    }

    class Snapshot {
        +long snapshotId()
        +int schemaId()
        +String keyId()
    }

    class FileScanTask {
        +PartitionSpec spec()
        +DataFile file()
        +List deletes()
    }

    class DeleteFile {
        +FileFormat format()
    }

    class PrestoException {
        +PrestoException(ErrorCode errorCode, String message)
    }

    class ErrorCodes {
        <<enumeration>>
        NOT_SUPPORTED
        ICEBERG_INVALID_FORMAT_VERSION
    }

    IcebergAbstractMetadata --> IcebergUtil : uses constants
    IcebergAbstractMetadata --> BaseTable : casts Table to
    IcebergAbstractMetadata --> TableMetadata : reads metadata
    IcebergAbstractMetadata --> Snapshot : validates snapshot
    RewriteDataFilesProcedure --> IcebergUtil : uses MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS
    RewriteDataFilesProcedure --> BaseTable : formatVersion from operations
    IcebergSplitSource --> FileScanTask : reads task
    IcebergSplitSource --> DeleteFile : iterates deletes
    DeleteFile --> PrestoException : throws when format is PUFFIN

    IcebergUtil --> PrestoException : throws when version unsupported
    IcebergAbstractMetadata --> PrestoException : throws for column defaults
    IcebergAbstractMetadata --> PrestoException : throws for encryption
    IcebergAbstractMetadata --> PrestoException : throws for invalid format version
    IcebergAbstractMetadata --> PrestoException : throws for unsupported row-level ops
    RewriteDataFilesProcedure --> PrestoException : throws when OPTIMIZE on unsupported version
    IcebergSplitSource --> PrestoException : throws when PUFFIN deletion vectors used
    PrestoException --> ErrorCodes : uses error codes

File-Level Changes

Change Details Files
Introduce centralized validation and guardrails for Iceberg format versions, v3 feature checks, and row-level operation constraints.
  • Add validateTableForPresto helper that inspects current snapshot/schema for v3 tables and rejects column default values and any table encryption configuration via NOT_SUPPORTED errors
  • Invoke validation helper when resolving table handles and starting INSERT operations to block unsupported v3 features on read/write paths
  • Define MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS and MAX_SUPPORTED_FORMAT_VERSION constants and use them to cap allowed table format versions globally and specifically for row-level operations
  • Tighten beginMerge, beginDelete, and beginUpdate to enforce minimum/maximum format versions and merge-on-read requirements using specific PrestoException types and clearer messages
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java
Prevent unsupported operations (row-level DML and OPTIMIZE) on Iceberg tables above the supported format version threshold, and reject PUFFIN deletion vectors during scan planning.
  • Guard OPTIMIZE (rewrite_data_files procedure) by checking the table format version and throwing NOT_SUPPORTED when it exceeds MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS
  • During split generation, scan delete files for PUFFIN format and fail with a specific NOT_SUPPORTED error when encountered
  • Align error messages for v3 (and above) row-level operations with the new guardrail constants to ensure deletes/updates/merges are rejected consistently
presto-iceberg/src/main/java/com/facebook/presto/iceberg/procedure/RewriteDataFilesProcedure.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSplitSource.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java
Add comprehensive integration tests for Iceberg v3 tables covering supported operations and rejection of unsupported features and operations.
  • Add TestIcebergV3 integration test class using the Hadoop Iceberg catalog to create, insert into, and query v3 tables (including partitioned tables)
  • Test creation failure for unsupported format version 4 via parseFormatVersion guardrail
  • Test upgrading a v2 table to v3 via Iceberg API and subsequent behavior
  • Assert that DELETE, UPDATE, MERGE, OPTIMIZE, and PUFFIN deletion vectors on v3 tables fail with the expected NOT_SUPPORTED messages
  • Verify that encryption-enabled v3 tables are rejected on SELECT and INSERT via validateTableForPresto, using direct Iceberg API to set encryption and to drop such tables
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java

@sourcery-ai bot left a comment:
Hey - I've found 4 issues and left some high-level feedback:

  • The new constant MAX_SUPPORTED_FORMAT_VERSION is introduced but not used anywhere in this change; either wire it into the relevant validation paths or remove it to avoid confusion.
  • The validation and error handling for format versions and update/delete modes (e.g., in beginMerge, beginDelete, beginUpdate, and OPTIMIZE) are very similar but use slightly different exception types/messages; consider centralizing this logic in a shared helper to keep behavior consistent and easier to maintain.
## Individual Comments

### Comment 1
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java:874-875` </location>
<code_context>
-                !Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
-                        .map(mode -> mode.equals(MERGE_ON_READ.modeName()))
-                        .orElse(false)) {
+        if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
+            throw new PrestoException(ICEBERG_INVALID_FORMAT_VERSION,
+                    "Iceberg table updates require at least format version 2 and update mode must be merge-on-read");
+        }
</code_context>

<issue_to_address>
**suggestion:** The error message for low format versions mentions update mode before it is validated, which can be misleading.

The exception text currently states both a format version and update mode requirement, but this branch only validates the format version; update mode could still be valid. Consider separating these into distinct errors (one for unsupported format version, one for invalid update mode) so users can see exactly which precondition failed.

Suggested implementation:

```java
        Table icebergTable = getIcebergTable(session, icebergTableHandle.getSchemaTableName());
        int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();

        if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
            throw new PrestoException(
                    ICEBERG_INVALID_FORMAT_VERSION,
                    "Iceberg table updates require at least format version 2");
        }

        if (!Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
                .map(mode -> mode.equals(MERGE_ON_READ.modeName()))
                .orElse(false)) {
            throw new PrestoException(
                    ICEBERG_INVALID_METADATA,
                    format("Iceberg table updates require table property %s to be set to %s",
                            TableProperties.UPDATE_MODE,
                            MERGE_ON_READ.modeName()));
        }

    /**

```

These edits assume:
1. `ICEBERG_INVALID_METADATA` is already defined in this class or imported; if not, you should use the existing error code previously used for invalid update mode (or introduce one consistent with the rest of the Iceberg module).
2. `Optional`, `TableProperties`, and `MERGE_ON_READ` are already imported/available in this file. If any are missing, add the appropriate imports or static imports matching the existing style in the file.
</issue_to_address>

### Comment 2
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java:43` </location>
<code_context>
+import static org.assertj.core.api.Assertions.assertThatThrownBy;
+import static org.testng.Assert.assertEquals;
+
+public class TestIcebergV3
+        extends AbstractTestQueryFramework
+{
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests for rejecting unsupported v3 schema default values and table encryption introduced in `validateTableForPresto`.

`validateTableForPresto` now rejects v3 column defaults (`initialDefault`/`writeDefault`) and table encryption, but the current tests don’t cover these failure paths. Please add tests that:

1. Use a v3 table with a column default and assert that INSERT/SELECT fail with the expected `NOT_SUPPORTED` error.
2. Use a v3 table with encryption enabled (e.g., via `TableMetadata.encryptionKeys`/`keyId`/related properties) and assert that operations fail with the expected error.

This will exercise the new validation logic end-to-end.

Suggested implementation:

```java
import java.util.OptionalInt;

import org.testng.annotations.Test;

```

```java
    private static final String TEST_SCHEMA = "tpch";
    private static final String V3_COLUMN_DEFAULT_TABLE = "v3_column_default";
    private static final String V3_ENCRYPTED_TABLE = "v3_encrypted";

```

```java
    @Test
    public void testV3ColumnDefaultsAreRejected()
    {
        assertQueryFails(
                format("SELECT * FROM %s.%s.%s", ICEBERG_CATALOG, TEST_SCHEMA, V3_COLUMN_DEFAULT_TABLE),
                ".*Iceberg v3 column default values are not supported.*");

        assertQueryFails(
                format("INSERT INTO %s.%s.%s VALUES (1)", ICEBERG_CATALOG, TEST_SCHEMA, V3_COLUMN_DEFAULT_TABLE),
                ".*Iceberg v3 column default values are not supported.*");
    }

    @Test
    public void testV3TableEncryptionIsRejected()
    {
        assertQueryFails(
                format("SELECT * FROM %s.%s.%s", ICEBERG_CATALOG, TEST_SCHEMA, V3_ENCRYPTED_TABLE),
                ".*Iceberg table encryption is not supported.*");

        assertQueryFails(
                format("INSERT INTO %s.%s.%s VALUES (1)", ICEBERG_CATALOG, TEST_SCHEMA, V3_ENCRYPTED_TABLE),
                ".*Iceberg table encryption is not supported.*");
    }
}

```

To fully exercise `validateTableForPresto` end-to-end, you’ll also need to:

1. **Create the v3 table with column defaults** in the same warehouse used by the Hadoop catalog:
   - Use the Iceberg Java APIs (e.g., `HadoopCatalog` or `TableOperations` directly) pointing at `getIcebergDataDirectoryPath(queryRunner, TEST_SCHEMA)` with `TableProperties.FORMAT_VERSION` set to `3`.
   - Define at least one column with `initialDefault`/`writeDefault` in the table’s metadata (either via the public API if available in your Iceberg version or by manipulating the metadata JSON to include the appropriate default value fields).
   - Ensure the table is created in the `TEST_SCHEMA` namespace with name `V3_COLUMN_DEFAULT_TABLE` so that Presto sees it as `iceberg.tpch.v3_column_default`.

2. **Create the v3 encrypted table** similarly:
   - Use a v3 Iceberg table and configure encryption via `TableMetadata.encryptionKeys` / key ID or the relevant table properties used in your Iceberg version so that the metadata contains an encryption spec.
   - Create it with table name `V3_ENCRYPTED_TABLE` in schema `TEST_SCHEMA` so it’s visible as `iceberg.tpch.v3_encrypted`.

3. If `assertQueryFails` has overloads that accept an error code, you may want to tighten the assertions by checking the `NOT_SUPPORTED` error code in addition to the message regex.

These setup steps will ensure that when the tests run the Presto connector loads the tables, triggers `validateTableForPresto`, and hits the new NOT_SUPPORTED paths for v3 column defaults and encryption.
</issue_to_address>

### Comment 3
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java:67` </location>
<code_context>
+    }
+
+    @Test
+    public void testCreateV3Table()
+    {
+        String tableName = "test_create_v3_table";
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a negative test for creating a table with an unsupported Iceberg format version (e.g., 4).

The new `MAX_SUPPORTED_FORMAT_VERSION = 3` constant means higher versions should be rejected, and `testCreateV3Table` only covers the success path. Please add a test that attempts to create a table with `"format-version" = '4'` (or >3) and asserts that it fails with the expected error message, so we lock in this behavior and ensure users see a clear error when using unsupported versions.

Suggested implementation:

```java
    @Test
    public void testCreateV3Table()
    {
        String tableName = "test_create_v3_table";
        try {
            assertUpdate("CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '3')");
            Table table = loadTable(tableName);
            assertEquals(((BaseTable) table).operations().current().formatVersion(), 3);
            assertQuery("SELECT * FROM " + tableName, "SELECT * WHERE false");
        }
        finally {
            dropTable(tableName);
        }
    }

    @Test
    public void testCreateUnsupportedFormatVersion()
    {
        String tableName = "test_create_v4_table";
        // Ensure clean state in case a previous run created the table
        dropTable(tableName);

        assertQueryFails(
                "CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '4')",
                ".*format version.*4.*not supported.*");
    }

```

The exact error message pattern in the `assertQueryFails` assertion may need to be adjusted to match the actual exception text thrown when `MAX_SUPPORTED_FORMAT_VERSION = 3` is exceeded. After running the test suite once, update the regex string to align precisely with the real error message if it differs.
</issue_to_address>

### Comment 4
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java:107` </location>
<code_context>
+    }
+
+    @Test
+    public void testInsertIntoV3Table()
+    {
+        String tableName = "test_insert_v3_table";
</code_context>

<issue_to_address>
**suggestion (testing):** Add a test to cover rejection of PUFFIN deletion vectors in scan planning.

The new logic throws a `PrestoException` when a `FileScanTask` has PUFFIN deletion vectors (`deleteFile.format() == PUFFIN`), but this scenario isn’t exercised by the current tests. If the Iceberg test stack allows it, please add a test that creates a v3 table with a PUFFIN delete file and runs a SELECT, asserting the expected `NOT_SUPPORTED` error from `IcebergSplitSource`.

Suggested implementation:

```java
        finally {
            dropTable(tableName);
        }
    }

    @Test
    public void testPuffinDeletionVectorsNotSupported()
    {
        String tableName = "test_puffin_deletion_vectors_not_supported";
        try {
            assertUpdate("CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '3')");
            assertUpdate("INSERT INTO " + tableName + " VALUES (1, 'one'), (2, 'two')", 2);

            Table table = loadTable(tableName);

            // Attach a PUFFIN delete vector to an existing data file in the v3 table
            try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
                FileScanTask task = tasks.iterator().next();

                DeleteFile puffinDeleteFile = FileMetadata.deleteFileBuilder(task.spec())
                        .ofPositionDeletes()
                        .withPath(task.file().path().toString() + ".puffin")
                        .withFileSizeInBytes(16)
                        .withFormat(FileFormat.PUFFIN)
                        .build();

                table.newRowDelta()
                        .addDeletes(puffinDeleteFile)
                        .commit();
            }

            assertQueryFails(
                    "SELECT * FROM " + tableName,
                    "PUFFIN deletion vectors.*not supported");
        }
        finally {
            dropTable(tableName);
        }

```

To make this compile and integrate cleanly you will likely need to:

1. Add the following imports at the top of `TestIcebergV3.java` if they are not already present:

```java
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.FileMetadata;
```

2. If `loadTable` is not already accessible in this class (it appears earlier in your snippet, but if it is a helper in a superclass or utility class, ensure it is imported or used correctly), adjust the call accordingly.

3. If your existing assertion helpers differ (e.g., `assertQueryFailsEventually` or a different regex for the NOT_SUPPORTED error), update the `assertQueryFails` call to match the local convention and actual error message thrown by `IcebergSplitSource`. The key behaviour to assert is that planning a SELECT fails when there is a PUFFIN deletion-vector `DeleteFile` in the planned `FileScanTask`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +874 to +875
if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
throw new PrestoException(ICEBERG_INVALID_FORMAT_VERSION,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: The error message for low format versions mentions update mode before it is validated, which can be misleading.

The exception text currently states both a format version and update mode requirement, but this branch only validates the format version; update mode could still be valid. Consider separating these into distinct errors (one for unsupported format version, one for invalid update mode) so users can see exactly which precondition failed.

Suggested implementation:

        Table icebergTable = getIcebergTable(session, icebergTableHandle.getSchemaTableName());
        int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();

        if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
            throw new PrestoException(
                    ICEBERG_INVALID_FORMAT_VERSION,
                    "Iceberg table updates require at least format version 2");
        }

        if (!Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
                .map(mode -> mode.equals(MERGE_ON_READ.modeName()))
                .orElse(false)) {
            throw new PrestoException(
                    ICEBERG_INVALID_METADATA,
                    format("Iceberg table updates require table property %s to be set to %s",
                            TableProperties.UPDATE_MODE,
                            MERGE_ON_READ.modeName()));
        }

    /**

These edits assume:

  1. ICEBERG_INVALID_METADATA is already defined in this class or imported; if not, you should use the existing error code previously used for invalid update mode (or introduce one consistent with the rest of the Iceberg module).
  2. Optional, TableProperties, and MERGE_ON_READ are already imported/available in this file. If any are missing, add the appropriate imports or static imports matching the existing style in the file.

import static org.assertj.core.api.Assertions.assertThatThrownBy;
import static org.testng.Assert.assertEquals;

public class TestIcebergV3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add tests for rejecting unsupported v3 schema default values and table encryption introduced in validateTableForPresto.

validateTableForPresto now rejects v3 column defaults (initialDefault/writeDefault) and table encryption, but the current tests don’t cover these failure paths. Please add tests that:

  1. Use a v3 table with a column default and assert that INSERT/SELECT fail with the expected NOT_SUPPORTED error.
  2. Use a v3 table with encryption enabled (e.g., via TableMetadata.encryptionKeys/keyId/related properties) and assert that operations fail with the expected error.

This will exercise the new validation logic end-to-end.

Suggested implementation:

import org.testng.annotations.Test;
    private static final String TEST_SCHEMA = "tpch";
    private static final String V3_COLUMN_DEFAULT_TABLE = "v3_column_default";
    private static final String V3_ENCRYPTED_TABLE = "v3_encrypted";
    @Test
    public void testV3ColumnDefaultsAreRejected()
    {
        assertQueryFails(
                format("SELECT * FROM %s.%s.%s", ICEBERG_CATALOG, TEST_SCHEMA, V3_COLUMN_DEFAULT_TABLE),
                ".*Iceberg v3 column default values are not supported.*");

        assertQueryFails(
                format("INSERT INTO %s.%s.%s VALUES (1)", ICEBERG_CATALOG, TEST_SCHEMA, V3_COLUMN_DEFAULT_TABLE),
                ".*Iceberg v3 column default values are not supported.*");
    }

    @Test
    public void testV3TableEncryptionIsRejected()
    {
        assertQueryFails(
                format("SELECT * FROM %s.%s.%s", ICEBERG_CATALOG, TEST_SCHEMA, V3_ENCRYPTED_TABLE),
                ".*Iceberg table encryption is not supported.*");

        assertQueryFails(
                format("INSERT INTO %s.%s.%s VALUES (1)", ICEBERG_CATALOG, TEST_SCHEMA, V3_ENCRYPTED_TABLE),
                ".*Iceberg table encryption is not supported.*");
    }
}

To fully exercise validateTableForPresto end-to-end, you’ll also need to:

  1. Create the v3 table with column defaults in the same warehouse used by the Hadoop catalog:

    • Use the Iceberg Java APIs (e.g., HadoopCatalog or TableOperations directly) pointing at getIcebergDataDirectoryPath(queryRunner, TEST_SCHEMA) with TableProperties.FORMAT_VERSION set to 3.
    • Define at least one column with initialDefault/writeDefault in the table’s metadata (either via the public API if available in your Iceberg version or by manipulating the metadata JSON to include the appropriate default value fields).
    • Ensure the table is created in the TEST_SCHEMA namespace with name V3_COLUMN_DEFAULT_TABLE so that Presto sees it as iceberg.tpch.v3_column_default.
  2. Create the v3 encrypted table similarly:

    • Use a v3 Iceberg table and configure encryption via TableMetadata.encryptionKeys / key ID or the relevant table properties used in your Iceberg version so that the metadata contains an encryption spec.
    • Create it with table name V3_ENCRYPTED_TABLE in schema TEST_SCHEMA so it’s visible as iceberg.tpch.v3_encrypted.
  3. If assertQueryFails has overloads that accept an error code, you may want to tighten the assertions by checking the NOT_SUPPORTED error code in addition to the message regex.

These setup steps will ensure that when the tests run the Presto connector loads the tables, triggers validateTableForPresto, and hits the new NOT_SUPPORTED paths for v3 column defaults and encryption.

}

@Test
public void testCreateV3Table()

suggestion (testing): Consider adding a negative test for creating a table with an unsupported Iceberg format version (e.g., 4).

The new MAX_SUPPORTED_FORMAT_VERSION = 3 constant means higher versions should be rejected, and testCreateV3Table only covers the success path. Please add a test that attempts to create a table with "format-version" = '4' (or >3) and asserts that it fails with the expected error message, so we lock in this behavior and ensure users see a clear error when using unsupported versions.

Suggested implementation:

    @Test
    public void testCreateV3Table()
    {
        String tableName = "test_create_v3_table";
        try {
            assertUpdate("CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '3')");
            Table table = loadTable(tableName);
            assertEquals(((BaseTable) table).operations().current().formatVersion(), 3);
            assertQuery("SELECT * FROM " + tableName, "SELECT * WHERE false");
        }
        finally {
            dropTable(tableName);
        }
    }

    @Test
    public void testCreateUnsupportedFormatVersion()
    {
        String tableName = "test_create_v4_table";
        // Ensure clean state in case a previous run created the table
        dropTable(tableName);

        assertQueryFails(
                "CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '4')",
                ".*format version.*4.*not supported.*");
    }

The exact error message pattern in the assertQueryFails assertion may need to be adjusted to match the actual exception text thrown when MAX_SUPPORTED_FORMAT_VERSION = 3 is exceeded. After running the test suite once, update the regex string to align precisely with the real error message if it differs.

}

@Test
public void testInsertIntoV3Table()

suggestion (testing): Add a test to cover rejection of PUFFIN deletion vectors in scan planning.

The new logic throws a PrestoException when a FileScanTask has PUFFIN deletion vectors (deleteFile.format() == PUFFIN), but this scenario isn’t exercised by the current tests. If the Iceberg test stack allows it, please add a test that creates a v3 table with a PUFFIN delete file and runs a SELECT, asserting the expected NOT_SUPPORTED error from IcebergSplitSource.

Suggested implementation:

        finally {
            dropTable(tableName);
        }
    }

    @Test
    public void testPuffinDeletionVectorsNotSupported()
    {
        String tableName = "test_puffin_deletion_vectors_not_supported";
        try {
            assertUpdate("CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '3')");
            assertUpdate("INSERT INTO " + tableName + " VALUES (1, 'one'), (2, 'two')", 2);

            Table table = loadTable(tableName);

            // Attach a PUFFIN delete vector to an existing data file in the v3 table
            try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
                FileScanTask task = tasks.iterator().next();

                DeleteFile puffinDeleteFile = FileMetadata.deleteFileBuilder(task.spec())
                        .ofPositionDeletes()
                        .withPath(task.file().path().toString() + ".puffin")
                        .withFileSizeInBytes(16)
                        .withFormat(FileFormat.PUFFIN)
                        .build();

                table.newRowDelta()
                        .addDeletes(puffinDeleteFile)
                        .commit();
            }

            assertQueryFails(
                    "SELECT * FROM " + tableName,
                    "PUFFIN deletion vectors.*not supported");
        }
        finally {
            dropTable(tableName);
        }
    }

To make this compile and integrate cleanly you will likely need to:

  1. Add the following imports at the top of TestIcebergV3.java if they are not already present:
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.FileMetadata;
  2. If loadTable is not already accessible in this class (it appears earlier in your snippet, but if it is a helper in a superclass or utility class, ensure it is imported or used correctly), adjust the call accordingly.

  3. If your existing assertion helpers differ (e.g., assertQueryFailsEventually or a different regex for the NOT_SUPPORTED error), update the assertQueryFails call to match the local convention and actual error message thrown by IcebergSplitSource. The key behaviour to assert is that planning a SELECT fails when there is a PUFFIN deletion-vector DeleteFile in the planned FileScanTask.

@Joe-Abraham Joe-Abraham marked this pull request as draft February 18, 2026 09:17
@Joe-Abraham Joe-Abraham changed the title feat: Add Iceberg format version 3 table support feat: Add initial support for Iceberg format version 3 Feb 18, 2026
@Joe-Abraham Joe-Abraham marked this pull request as ready for review February 18, 2026 12:31
@prestodb-ci prestodb-ci requested a review from a team February 18, 2026 12:31

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 3 issues, and left some high level feedback:

  • The newly introduced MAX_SUPPORTED_FORMAT_VERSION constant is not used anywhere in the changes; either enforce it (e.g., in table creation / loading paths) or remove it to avoid dead configuration that can drift from actual behavior.
  • The format-version guardrails for row-level operations use different exception types and messages between beginMerge, beginUpdate, beginDelete, and OPTIMIZE (e.g., ICEBERG_INVALID_FORMAT_VERSION vs NOT_SUPPORTED and slightly different texts); consider standardizing the error codes and wording so users see consistent behavior across all operations.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The newly introduced MAX_SUPPORTED_FORMAT_VERSION constant is not used anywhere in the changes; either enforce it (e.g., in table creation / loading paths) or remove it to avoid dead configuration that can drift from actual behavior.
- The format-version guardrails for row-level operations use different exception types and messages between beginMerge, beginUpdate, beginDelete, and OPTIMIZE (e.g., ICEBERG_INVALID_FORMAT_VERSION vs NOT_SUPPORTED and slightly different texts); consider standardizing the error codes and wording so users see consistent behavior across all operations.

## Individual Comments

### Comment 1
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java:382-383` </location>
<code_context>
+            schema = metadata.schema();
+        }
+
+        // Reject schema default values (initial-default / write-default)
+        for (Types.NestedField field : schema.columns()) {
+            if (field.initialDefault() != null || field.writeDefault() != null) {
+                throw new PrestoException(NOT_SUPPORTED, "Iceberg v3 column default values are not supported");
</code_context>

<issue_to_address>
**suggestion:** Column default validation only considers top-level columns and may miss nested fields

`schema.columns()` only iterates top-level fields, so defaults on nested struct fields (if Iceberg permits `initialDefault` / `writeDefault` there) would bypass this check. Either traverse the full schema recursively to validate nested `Types.NestedField`s, or explicitly confirm/document that defaults are only allowed on top-level fields.
</issue_to_address>

### Comment 2
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/procedure/RewriteDataFilesProcedure.java:129-131` </location>
<code_context>
             Table icebergTable = procedureContext.getTable();
             IcebergTableHandle tableHandle = layoutHandle.getTable();

+            // Validate format version for OPTIMIZE operation
+            int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();
+            if (formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
+                throw new PrestoException(NOT_SUPPORTED,
+                        format("OPTIMIZE is not supported for Iceberg table format version > %d. Table %s format version is %s.",
</code_context>

<issue_to_address>
**suggestion:** Reusing MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS for OPTIMIZE may conflate different capability boundaries

Since OPTIMIZE/rewrite-data-files is not actually a row-level operation, tying it to `MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS` couples two unrelated capability checks. If the restriction on OPTIMIZE is driven by other concerns (e.g., manifest/delete semantics), consider introducing a separate constant or helper so future changes to row-level support don’t inadvertently change OPTIMIZE behavior.

Suggested implementation:

```java
            // Validate format version for OPTIMIZE operation
            int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();
            if (formatVersion > MAX_FORMAT_VERSION_FOR_OPTIMIZE) {
                throw new PrestoException(NOT_SUPPORTED,
                        format("OPTIMIZE is not supported for Iceberg table format version > %d. Table %s format version is %s.",
                                MAX_FORMAT_VERSION_FOR_OPTIMIZE,
                                icebergTable.name(),
                                formatVersion));
            }

```

1. Inside the `RewriteDataFilesProcedure` class, add a dedicated constant for OPTIMIZE, for example near the top of the class:
   ```java
   // Separate from MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS to allow independent evolution
   private static final int MAX_FORMAT_VERSION_FOR_OPTIMIZE = 2;
   ```
   Place this inside the class body (after the opening `{`), not at the top-level.

2. If you want to avoid hardcoding `2`, you can initialize it from `IcebergUtil.MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS` today:
   ```java
   private static final int MAX_FORMAT_VERSION_FOR_OPTIMIZE = MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS;
   ```
   but in that case keep the `MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS` import and add a comment explaining that OPTIMIZE is conceptually separate and may diverge in the future.
</issue_to_address>

### Comment 3
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java:71-80` </location>
<code_context>
+        }
+    }
+
+    @Test
+    public void testV3TableEncryptionNotSupported()
+    {
+        String tableName = "test_v3_encrypted";
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests that validate rejection of v3 column default values (initialDefault/writeDefault) on both read and write paths

`validateTableForPresto` now rejects `initialDefault` and `writeDefault` for v3 schemas, but this behavior isn’t covered by tests. Please add a test similar to `testV3TableEncryptionNotSupported` that:

1. Creates a v3 table and alters a column to set `initialDefault` and/or `writeDefault` via the Iceberg API.
2. Asserts that `SELECT` fails with the expected `NOT_SUPPORTED` message.
3. Asserts that `INSERT` also fails with the same message.

This will confirm both `getTableHandle` (read) and `beginInsert` (write) correctly enforce the restriction.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +382 to +383
// Reject schema default values (initial-default / write-default)
for (Types.NestedField field : schema.columns()) {

suggestion: Column default validation only considers top-level columns and may miss nested fields

schema.columns() only iterates top-level fields, so defaults on nested struct fields (if Iceberg permits initialDefault / writeDefault there) would bypass this check. Either traverse the full schema recursively to validate nested Types.NestedFields, or explicitly confirm/document that defaults are only allowed on top-level fields.
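A recursive variant of that check can be sketched with stand-in types; `Field` below is a hypothetical simplification of Iceberg's `Types.NestedField` (the real code would recurse through `field.type().asStructType().fields()` when `field.type().isStructType()`):

```java
import java.util.List;

// Stand-in for Iceberg's Types.NestedField, reduced to the parts the check needs.
public class DefaultValueCheck
{
    record Field(String name, Object initialDefault, Object writeDefault, List<Field> children)
    {
    }

    // Returns true if any field, at any nesting depth, declares a default value.
    static boolean hasDefault(List<Field> fields)
    {
        for (Field field : fields) {
            if (field.initialDefault() != null || field.writeDefault() != null) {
                return true;
            }
            if (field.children() != null && hasDefault(field.children())) {
                return true;
            }
        }
        return false;
    }
}
```

A validateTableForPresto built on this shape would reject a default buried inside a struct column, not just one on a top-level field.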

Comment on lines +129 to +131
// Validate format version for OPTIMIZE operation
int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();
if (formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {

suggestion: Reusing MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS for OPTIMIZE may conflate different capability boundaries

Since OPTIMIZE/rewrite-data-files is not actually a row-level operation, tying it to MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS couples two unrelated capability checks. If the restriction on OPTIMIZE is driven by other concerns (e.g., manifest/delete semantics), consider introducing a separate constant or helper so future changes to row-level support don’t inadvertently change OPTIMIZE behavior.

Suggested implementation:

            // Validate format version for OPTIMIZE operation
            int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();
            if (formatVersion > MAX_FORMAT_VERSION_FOR_OPTIMIZE) {
                throw new PrestoException(NOT_SUPPORTED,
                        format("OPTIMIZE is not supported for Iceberg table format version > %d. Table %s format version is %s.",
                                MAX_FORMAT_VERSION_FOR_OPTIMIZE,
                                icebergTable.name(),
                                formatVersion));
            }
  1. Inside the RewriteDataFilesProcedure class, add a dedicated constant for OPTIMIZE, for example near the top of the class:

    // Separate from MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS to allow independent evolution
    private static final int MAX_FORMAT_VERSION_FOR_OPTIMIZE = 2;

    Place this inside the class body (after the opening {), not at the top-level.

  2. If you want to avoid hardcoding 2, you can initialize it from IcebergUtil.MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS today:

    private static final int MAX_FORMAT_VERSION_FOR_OPTIMIZE = MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS;

    but in that case keep the MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS import and add a comment explaining that OPTIMIZE is conceptually separate and may diverge in the future.

Comment on lines +71 to +80
@Test
public void testCreateV3Table()
{
String tableName = "test_create_v3_table";
try {
assertUpdate("CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '3')");
Table table = loadTable(tableName);
assertEquals(((BaseTable) table).operations().current().formatVersion(), 3);
assertQuery("SELECT * FROM " + tableName, "SELECT * WHERE false");
}

suggestion (testing): Add tests that validate rejection of v3 column default values (initialDefault/writeDefault) on both read and write paths

validateTableForPresto now rejects initialDefault and writeDefault for v3 schemas, but this behavior isn’t covered by tests. Please add a test similar to testV3TableEncryptionNotSupported that:

  1. Creates a v3 table and alters a column to set initialDefault and/or writeDefault via the Iceberg API.
  2. Asserts that SELECT fails with the expected NOT_SUPPORTED message.
  3. Asserts that INSERT also fails with the same message.

This will confirm both getTableHandle (read) and beginInsert (write) correctly enforce the restriction.

@steveburnett

Revise the release note entry to follow the Release Notes Guidelines, so that the failing (but not required for merge) release notes check passes.

For example, the Order of Changes requires entries to start with a listed keyword.

Maybe something like this:

== RELEASE NOTES ==

Iceberg Connector Changes
* Add support for creating Iceberg tables with format-version = '3'.
* Add reading from Iceberg V3 tables, including partitioned tables.
* Add INSERT operations into Iceberg V3 tables.
* Add support for upgrading existing V2 tables to V3 using the Iceberg API.

Most of the rest of the release note entry I've removed from this suggested draft should be added to the Iceberg documentation, along with doc for these items remaining here. The release note entry can link to the documentation. Examples of linking are in Formatting in the Release Notes Guidelines.

@Joe-Abraham

@steveburnett, I have updated release notes. Thanks a lot for your feedback.

@steveburnett

@steveburnett, I have updated release notes. Thanks a lot for your feedback.

Happy to help!

tl;dr: Part of why I spend time on this topic is that in every Presto release we have a PR that automatically collects the release note entries from all merged PRs in the release. The release note doc PR is a release blocker because it must be finished and merged into the docs for the Presto release to happen, so the better the release note entries are when the PR is created, the less work needs to be done on them and the Presto release process goes faster.

@Joe-Abraham Joe-Abraham marked this pull request as draft February 19, 2026 09:35
@Joe-Abraham Joe-Abraham marked this pull request as ready for review February 20, 2026 12:02

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 3 issues, and left some high level feedback:

  • The format-version guardrails (min/max checks and update-mode validation) are now duplicated across beginMerge, beginUpdate, beginDelete, and the OPTIMIZE procedure; consider extracting a shared helper (e.g., validateRowLevelOperationSupported(Table) / validateFormatVersionForRowLevelOperations(int)) to centralize this logic and keep the behavior/messages consistent as support evolves.
  • In validateTableForPresto, the column default check only inspects top-level schema.columns(); if Iceberg can express defaults on nested fields, you may want to recurse into nested types so that unsupported defaults are rejected consistently.
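
The shared helper suggested above can be sketched as follows; the constant values and the IllegalArgumentException are placeholders for the connector's actual constants and its PrestoException(ICEBERG_INVALID_FORMAT_VERSION, ...):

```java
// Sketch of a centralized format-version guardrail shared by merge, update, and
// delete, so the error message stays identical across all row-level operations.
// IllegalArgumentException stands in for the connector's PrestoException.
public class RowLevelGuardrail
{
    static final int MIN_FORMAT_VERSION_FOR_DELETE = 2;
    static final int MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS = 2;

    static void validateFormatVersionForRowLevelOperations(int formatVersion)
    {
        if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE
                || formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
            throw new IllegalArgumentException(String.format(
                    "Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s",
                    MIN_FORMAT_VERSION_FOR_DELETE,
                    MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS,
                    formatVersion));
        }
    }
}
```

Each of beginMerge, beginUpdate, and beginDelete could then call this single method instead of repeating the checks inline.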
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The format-version guardrails (min/max checks and update-mode validation) are now duplicated across beginMerge, beginUpdate, beginDelete, and the OPTIMIZE procedure; consider extracting a shared helper (e.g., `validateRowLevelOperationSupported(Table)` / `validateFormatVersionForRowLevelOperations(int)`) to centralize this logic and keep the behavior/messages consistent as support evolves.
- In `validateTableForPresto`, the column default check only inspects top-level `schema.columns()`; if Iceberg can express defaults on nested fields, you may want to recurse into nested types so that unsupported defaults are rejected consistently.

## Individual Comments

### Comment 1
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java:370-371` </location>
<code_context>
+                    .map(table::snapshot)
+                    .orElse(table.currentSnapshot());
+        }
+        catch (RuntimeException e) {
+            // If the snapshot cannot be retrieved (e.g. metadata is missing), we cannot validate the table.
+            // Returning here allows operations that do not strictly require the snapshot (like DROP TABLE) to proceed.
+            return;
</code_context>

<issue_to_address>
**issue (bug_risk):** Catching RuntimeException broadly here may hide unexpected failures during snapshot retrieval.

Swallowing all RuntimeExceptions here will also hide programming/configuration errors that should surface or at least be observable. Consider catching a narrower, Iceberg-specific exception for the expected failure mode, or at minimum log the exception before returning so operational debugging remains possible.
</issue_to_address>

### Comment 2
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java:879-888` </location>
<code_context>
-                !Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
-                        .map(mode -> mode.equals(MERGE_ON_READ.modeName()))
-                        .orElse(false)) {
+        if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
+            throw new PrestoException(ICEBERG_INVALID_FORMAT_VERSION,
+                    "Iceberg table updates require at least format version 2 and update mode must be merge-on-read");
</code_context>

<issue_to_address>
**suggestion:** Error codes/messages for unsupported format versions are inconsistent between beginMerge and beginUpdate.

`beginMerge` uses `ICEBERG_INVALID_FORMAT_VERSION` for format versions < 2, while this path uses `NOT_SUPPORTED`, and the messages differ in whether they mention update mode. Please align the exception type and message with the other row-level operations (merge/update/delete) for consistent, predictable behavior and easier programmatic handling.

Suggested implementation:

```java
        if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
            throw new PrestoException(
                    ICEBERG_INVALID_FORMAT_VERSION,
                    format(
                            "Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s",
                            MIN_FORMAT_VERSION_FOR_DELETE,
                            MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS,
                            formatVersion));
        }
        if (formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
            throw new PrestoException(
                    ICEBERG_INVALID_FORMAT_VERSION,
                    format(
                            "Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s",
                            MIN_FORMAT_VERSION_FOR_DELETE,
                            MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS,
                            formatVersion));
        }

```

To fully align messages and error codes across all row-level operations (merge/update/delete), you should:
1. Locate the corresponding format-version checks in `beginMerge`, `beginDelete`, and any other row-level operation helpers in `IcebergAbstractMetadata` (or related classes).
2. Update their `PrestoException` construction to use:
   - The same `ICEBERG_INVALID_FORMAT_VERSION` error code.
   - The same message template:
     `"Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s"`.
   - The same arguments: `MIN_FORMAT_VERSION_FOR_DELETE`, `MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS`, `formatVersion`.
This will ensure consistent, predictable behavior and make programmatic handling of unsupported format versions easier across all operations.
</issue_to_address>

### Comment 3
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java:71-72` </location>
<code_context>
+        assertQuerySucceeds("DROP TABLE IF EXISTS " + tableName);
+    }
+
+    @Test
+    public void testCreateV3Table()
+    {
+        String tableName = "test_create_v3_table";
</code_context>

<issue_to_address>
**suggestion (testing):** Add a test that verifies v3 column default values are rejected

The new `validateTableForPresto` logic that rejects v3 fields with `initialDefault` or `writeDefault` isn’t covered by tests. Please add a test that creates a v3 table, evolves the schema via `Table.updateSchema()` to add an `initialDefault`/`writeDefault`, commits it, and then verifies that both `SELECT` and `INSERT` in Presto fail with the expected NOT_SUPPORTED error, confirming the guardrail works end-to-end.

Suggested implementation:

```java
    @Test
    public void testCreateV3Table()
    {
        String tableName = "test_create_v3_table";
        try {
            assertUpdate("CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '3')");
            Table table = loadTable(tableName);
            assertEquals(((BaseTable) table).operations().current().formatVersion(), 3);
            assertQuery("SELECT * FROM " + tableName, "SELECT * WHERE false");
        }
        finally {
            dropTable(tableName);
        }
    }

    @Test
    public void testV3ColumnDefaultValuesRejected()
    {
        String tableName = "test_v3_column_default_values_rejected";
        try {
            // Create a v3 table
            assertUpdate("CREATE TABLE " + tableName + " (id integer) WITH (\"format-version\" = '3')");

            Table table = loadTable(tableName);
            assertEquals(((BaseTable) table).operations().current().formatVersion(), 3);

            // Evolve the schema to add a column with a v3 default value (initialDefault/writeDefault)
            // so that validateTableForPresto should reject it.
            table.updateSchema()
                    .addColumn("v", Types.IntegerType.get(), "value with default", 42)
                    .commit();

            // Both SELECT and INSERT should fail with the NOT_SUPPORTED error due to v3 defaults.
            assertQueryFails("SELECT * FROM " + tableName, ".*NOT_SUPPORTED.*");
            assertQueryFails("INSERT INTO " + tableName + " (id) VALUES (1)", ".*NOT_SUPPORTED.*");
        }
        finally {
            dropTable(tableName);
        }
    }

```

Depending on the existing imports and Iceberg version in this module, you may need to:

1. Ensure `org.apache.iceberg.types.Types` is imported at the top of `TestIcebergV3.java`:
   - Add `import org.apache.iceberg.types.Types;` if it is not already present.
2. Adjust the `updateSchema()` call if your Iceberg version exposes a different API for setting v3 default values:
   - For example, if the signature is `addColumn(String name, Type type)` plus a fluent method like `.withInitialDefault(...)` or `.withDefault(...)`, replace the `addColumn("v", Types.IntegerType.get(), "value with default", 42)` chain with the appropriate sequence that results in a column having `initialDefault` and/or `writeDefault` set in the schema.
3. If your test base class uses a different helper to assert error codes (e.g., `assertQueryFailsWithErrorCodeMatching`), you can tighten the assertions to check specifically for the `NOT_SUPPORTED` error code instead of matching `"NOT_SUPPORTED"` in the failure message.
</issue_to_address>


Comment on lines +370 to +371
catch (RuntimeException e) {
// If the snapshot cannot be retrieved (e.g. metadata is missing), we cannot validate the table.

issue (bug_risk): Catching RuntimeException broadly here may hide unexpected failures during snapshot retrieval.

Swallowing all RuntimeExceptions here will also hide programming/configuration errors that should surface or at least be observable. Consider catching a narrower, Iceberg-specific exception for the expected failure mode, or at minimum log the exception before returning so operational debugging remains possible.
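
One shape for that narrower handling is sketched below; SnapshotNotFoundException is a hypothetical stand-in for whichever Iceberg exception actually signals missing metadata (e.g. org.apache.iceberg.exceptions.NotFoundException), and the java.util.logging call only approximates the connector's logging style:

```java
import java.util.logging.Logger;

public class SnapshotValidation
{
    private static final Logger LOG = Logger.getLogger(SnapshotValidation.class.getName());

    // Hypothetical stand-in for the Iceberg-specific "metadata is missing" failure.
    static class SnapshotNotFoundException extends RuntimeException
    {
        SnapshotNotFoundException(String message)
        {
            super(message);
        }
    }

    // Returns true when the snapshot could be loaded. Only the expected failure
    // mode is swallowed (and logged); any other RuntimeException still surfaces.
    static boolean tryLoadSnapshot(Runnable loadSnapshot)
    {
        try {
            loadSnapshot.run();
            return true;
        }
        catch (SnapshotNotFoundException e) {
            LOG.warning("Skipping table validation, snapshot could not be retrieved: " + e.getMessage());
            return false;
        }
    }
}
```

With this shape, a DROP TABLE on a table with broken metadata still proceeds, while a genuine bug elsewhere is not silently swallowed.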

Comment on lines +879 to +888
```java
if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
    throw new PrestoException(ICEBERG_INVALID_FORMAT_VERSION,
            "Iceberg table updates require at least format version 2 and update mode must be merge-on-read");
}
if (formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
    throw new PrestoException(NOT_SUPPORTED,
            format("Iceberg table updates for format version %s are not supported yet", formatVersion));
}
if (!Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
        .map(mode -> mode.equals(MERGE_ON_READ.modeName()))
```

suggestion: Error codes/messages for unsupported format versions are inconsistent between beginMerge and beginUpdate.

beginMerge uses ICEBERG_INVALID_FORMAT_VERSION for format versions < 2, while this path uses NOT_SUPPORTED, and the messages differ in whether they mention update mode. Please align the exception type and message with the other row-level operations (merge/update/delete) for consistent, predictable behavior and easier programmatic handling.

Suggested implementation:

```java
        if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
            throw new PrestoException(
                    ICEBERG_INVALID_FORMAT_VERSION,
                    format(
                            "Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s",
                            MIN_FORMAT_VERSION_FOR_DELETE,
                            MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS,
                            formatVersion));
        }
        if (formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
            throw new PrestoException(
                    ICEBERG_INVALID_FORMAT_VERSION,
                    format(
                            "Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s",
                            MIN_FORMAT_VERSION_FOR_DELETE,
                            MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS,
                            formatVersion));
        }
```

To fully align messages and error codes across all row-level operations (merge/update/delete), you should:

  1. Locate the corresponding format-version checks in beginMerge, beginDelete, and any other row-level operation helpers in IcebergAbstractMetadata (or related classes).
  2. Update their PrestoException construction to use:
    • The same ICEBERG_INVALID_FORMAT_VERSION error code.
    • The same message template:
      "Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s".
    • The same arguments: MIN_FORMAT_VERSION_FOR_DELETE, MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS, formatVersion.
      This will ensure consistent, predictable behavior and make programmatic handling of unsupported format versions easier across all operations.
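
One way to make that alignment mechanical is a single guard shared by every row-level entry point. A minimal, self-contained sketch: `InvalidFormatVersionException` stands in for the real `PrestoException` with `ICEBERG_INVALID_FORMAT_VERSION`, and the constant values are illustrative:

```java
public class FormatVersionCheck
{
    // Simplified stand-ins for the connector's real constants
    static final int MIN_FORMAT_VERSION_FOR_DELETE = 2;
    static final int MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS = 2;

    // Stand-in for PrestoException with ICEBERG_INVALID_FORMAT_VERSION
    static class InvalidFormatVersionException
            extends RuntimeException
    {
        InvalidFormatVersionException(String message)
        {
            super(message);
        }
    }

    // Shared guard for beginMerge, beginUpdate, and beginDelete, so the
    // error code and message template cannot drift apart between call sites
    static void checkRowLevelFormatVersion(int formatVersion)
    {
        if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE || formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
            throw new InvalidFormatVersionException(String.format(
                    "Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s",
                    MIN_FORMAT_VERSION_FOR_DELETE,
                    MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS,
                    formatVersion));
        }
    }

    public static void main(String[] args)
    {
        checkRowLevelFormatVersion(2); // in range: no exception
        try {
            checkRowLevelFormatVersion(3);
        }
        catch (InvalidFormatVersionException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Each of `beginMerge`, `beginUpdate`, and `beginDelete` would then call the one guard instead of repeating the two `if` branches.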

Comment on lines +71 to +72
@Test
public void testCreateV3Table()

suggestion (testing): Add a test that verifies v3 column default values are rejected

The new validateTableForPresto logic that rejects v3 fields with initialDefault or writeDefault isn’t covered by tests. Please add a test that creates a v3 table, evolves the schema via Table.updateSchema() to add an initialDefault/writeDefault, commits it, and then verifies that both SELECT and INSERT in Presto fail with the expected NOT_SUPPORTED error, confirming the guardrail works end-to-end.

Suggested implementation:

```java
    @Test
    public void testCreateV3Table()
    {
        String tableName = "test_create_v3_table";
        try {
            assertUpdate("CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '3')");
            Table table = loadTable(tableName);
            assertEquals(((BaseTable) table).operations().current().formatVersion(), 3);
            assertQuery("SELECT * FROM " + tableName, "SELECT * WHERE false");
        }
        finally {
            dropTable(tableName);
        }
    }

    @Test
    public void testV3ColumnDefaultValuesRejected()
    {
        String tableName = "test_v3_column_default_values_rejected";
        try {
            // Create a v3 table
            assertUpdate("CREATE TABLE " + tableName + " (id integer) WITH (\"format-version\" = '3')");

            Table table = loadTable(tableName);
            assertEquals(((BaseTable) table).operations().current().formatVersion(), 3);

            // Evolve the schema to add a column with a v3 default value (initialDefault/writeDefault)
            // so that validateTableForPresto should reject it.
            table.updateSchema()
                    .addColumn("v", Types.IntegerType.get(), "value with default", 42)
                    .commit();

            // Both SELECT and INSERT should fail with the NOT_SUPPORTED error due to v3 defaults.
            assertQueryFails("SELECT * FROM " + tableName, ".*NOT_SUPPORTED.*");
            assertQueryFails("INSERT INTO " + tableName + " (id) VALUES (1)", ".*NOT_SUPPORTED.*");
        }
        finally {
            dropTable(tableName);
        }
    }
```

Depending on the existing imports and Iceberg version in this module, you may need to:

  1. Ensure `org.apache.iceberg.types.Types` is imported at the top of `TestIcebergV3.java`:
    - Add `import org.apache.iceberg.types.Types;` if it is not already present.
  2. Adjust the `updateSchema()` call if your Iceberg version exposes a different API for setting v3 default values:
    - For example, if the signature is `addColumn(String name, Type type)` plus a fluent method like `.withInitialDefault(...)` or `.withDefault(...)`, replace the `addColumn("v", Types.IntegerType.get(), "value with default", 42)` chain with the appropriate sequence that results in a column having `initialDefault` and/or `writeDefault` set in the schema.
  3. If your test base class uses a different helper to assert error codes (e.g., `assertQueryFailsWithErrorCodeMatching`), you can tighten the assertions to check specifically for the `NOT_SUPPORTED` error code instead of matching `"NOT_SUPPORTED"` in the failure message.

Labels

from:IBM PR from IBM
