feat: Add initial support for Iceberg format version 3 #27021
Joe-Abraham wants to merge 1 commit into prestodb:master from
Conversation
Force-pushed from 579fa72 to 0c2ed1b
Could we consider adding documentation for these? The second point "Add handling of unsupported v3 features by explicitly rejecting them" would just need that statement. The first - creating Iceberg format version 3 tables, upgrading v2 tables to v3, inserting into v3 tables - do any of these need examples added to the Iceberg doc? Doesn't have to be a big thing, something like "Insert into an Iceberg v3 table following this example:" and a code block.
Force-pushed from 3b86f68 to 30d1c7a
Force-pushed from 7b0c149 to 7dac029
Reviewer's Guide

Adds guarded support for Iceberg table format version 3: allows create/read/insert/partitioning and upgrades from v2 while enforcing strict format/version guardrails, rejecting unsupported v3 features (encryption, column defaults, PUFFIN deletion vectors) and disallowing v3 row-level operations and OPTIMIZE, backed by a dedicated TestIcebergV3 suite.

Sequence diagram for Iceberg v3 table insert validation

sequenceDiagram
actor User
participant PrestoCoordinator
participant IcebergAbstractMetadata
participant IcebergCatalog
participant BaseTable
User->>PrestoCoordinator: SQL INSERT INTO iceberg_v3_table
PrestoCoordinator->>IcebergAbstractMetadata: beginInsert(session, tableHandle)
IcebergAbstractMetadata->>IcebergCatalog: getIcebergTable(session, schemaTableName)
IcebergCatalog-->>IcebergAbstractMetadata: Table icebergTable
IcebergAbstractMetadata->>IcebergAbstractMetadata: validateTableMode(session, icebergTable)
IcebergAbstractMetadata->>BaseTable: cast icebergTable to BaseTable
IcebergAbstractMetadata->>BaseTable: currentSnapshot()
BaseTable-->>IcebergAbstractMetadata: Snapshot snapshot
IcebergAbstractMetadata->>IcebergAbstractMetadata: validateTableForPresto(baseTable, Optional.of(snapshotId))
alt metadata.formatVersion < 3 or empty table
IcebergAbstractMetadata-->>IcebergAbstractMetadata: Return without extra checks
IcebergAbstractMetadata-->>PrestoCoordinator: beginIcebergTableInsert(...)
PrestoCoordinator-->>User: INSERT succeeds
else metadata.formatVersion >= 3
IcebergAbstractMetadata->>BaseTable: operations().current()
IcebergAbstractMetadata->>BaseTable: schemasById().get(snapshot.schemaId()) or schema()
IcebergAbstractMetadata->>IcebergAbstractMetadata: check columns for default values
alt any column has initialDefault or writeDefault
IcebergAbstractMetadata-->>PrestoCoordinator: PrestoException NOT_SUPPORTED (column defaults)
PrestoCoordinator-->>User: INSERT fails (v3 column defaults not supported)
else no column defaults
IcebergAbstractMetadata->>IcebergAbstractMetadata: check encryptionKeys, snapshot.keyId, properties[encryption.key-id]
alt any encryption present
IcebergAbstractMetadata-->>PrestoCoordinator: PrestoException NOT_SUPPORTED (encryption)
PrestoCoordinator-->>User: INSERT fails (encryption not supported)
else no encryption
IcebergAbstractMetadata-->>PrestoCoordinator: beginIcebergTableInsert(...)
PrestoCoordinator-->>User: INSERT succeeds
end
end
end
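For readers who prefer code to diagrams, here is a condensed Java sketch of the insert-time validation the sequence above describes. Method and field names are taken from the diagram, so treat this as an illustrative approximation of the PR's validateTableForPresto logic rather than the exact implementation.

```java
// Illustrative sketch only; mirrors the sequence diagram, not necessarily the merged code.
// Assumes the usual Iceberg imports (BaseTable, TableMetadata, Snapshot, Schema, Types)
// and Presto's PrestoException with a static import of NOT_SUPPORTED.
private static void validateTableForPresto(BaseTable table, Optional<Long> tableSnapshotId)
{
    TableMetadata metadata = table.operations().current();
    if (metadata.formatVersion() < 3) {
        return; // v1/v2 tables need no extra v3 checks
    }
    Snapshot snapshot = tableSnapshotId.map(table::snapshot).orElse(table.currentSnapshot());
    if (snapshot == null) {
        return; // empty table: nothing to validate yet
    }
    Schema schema = metadata.schemasById().getOrDefault(snapshot.schemaId(), metadata.schema());

    // Reject v3 column defaults (initial-default / write-default)
    for (Types.NestedField field : schema.columns()) {
        if (field.initialDefault() != null || field.writeDefault() != null) {
            throw new PrestoException(NOT_SUPPORTED, "Iceberg v3 column default values are not supported");
        }
    }

    // Reject encryption metadata (table keys, snapshot key id, or the encryption.key-id property)
    if (!metadata.encryptionKeys().isEmpty()
            || snapshot.keyId() != null
            || metadata.properties().containsKey("encryption.key-id")) {
        throw new PrestoException(NOT_SUPPORTED, "Iceberg table encryption is not supported");
    }
}
```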
Class diagram for Iceberg v3 guardrails and validation

classDiagram
class IcebergUtil {
<<utility>>
+int MIN_FORMAT_VERSION_FOR_DELETE
+int MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS
+int MAX_SUPPORTED_FORMAT_VERSION
+int parseFormatVersion(String formatVersion)
}
class IcebergAbstractMetadata {
+Optional getProcedureContext()
+ConnectorInsertTableHandle beginInsert(ConnectorSession session, ConnectorTableHandle tableHandle)
+IcebergTableHandle getTableHandle(ConnectorSession session, SchemaTableName tableName)
+ConnectorMergeTableHandle beginMerge(ConnectorSession session, ConnectorTableHandle tableHandle)
+ConnectorDeleteTableHandle beginDelete(ConnectorSession session, ConnectorTableHandle tableHandle)
+ConnectorTableHandle beginUpdate(ConnectorSession session, ConnectorTableHandle tableHandle)
+void validateTableMode(ConnectorSession session, Table icebergTable)
+static void validateTableForPresto(BaseTable table, Optional tableSnapshotId)
}
class RewriteDataFilesProcedure {
+ConnectorDistributedProcedureHandle beginCallDistributedProcedure(ConnectorSession session, ConnectorTransactionHandle transaction, ConnectorTableHandle layoutHandle)
}
class IcebergSplitSource {
-ConnectorSplit toIcebergSplit(FileScanTask task)
}
class BaseTable {
+TableOperations operations()
+Snapshot currentSnapshot()
+Snapshot snapshot(long snapshotId)
+String name()
}
class TableMetadata {
+int formatVersion()
+Schema schema()
+Map schemasById()
+Map encryptionKeys()
+Map properties()
}
class Snapshot {
+long snapshotId()
+int schemaId()
+String keyId()
}
class FileScanTask {
+PartitionSpec spec()
+DataFile file()
+List deletes()
}
class DeleteFile {
+FileFormat format()
}
class PrestoException {
+PrestoException(ErrorCode errorCode, String message)
}
class ErrorCodes {
<<enumeration>>
NOT_SUPPORTED
ICEBERG_INVALID_FORMAT_VERSION
}
IcebergAbstractMetadata --> IcebergUtil : uses constants
IcebergAbstractMetadata --> BaseTable : casts Table to
IcebergAbstractMetadata --> TableMetadata : reads metadata
IcebergAbstractMetadata --> Snapshot : validates snapshot
RewriteDataFilesProcedure --> IcebergUtil : uses MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS
RewriteDataFilesProcedure --> BaseTable : formatVersion from operations
IcebergSplitSource --> FileScanTask : reads task
IcebergSplitSource --> DeleteFile : iterates deletes
DeleteFile --> PrestoException : throws when format is PUFFIN
IcebergUtil --> PrestoException : throws when version unsupported
IcebergAbstractMetadata --> PrestoException : throws for column defaults
IcebergAbstractMetadata --> PrestoException : throws for encryption
IcebergAbstractMetadata --> PrestoException : throws for invalid format version
IcebergAbstractMetadata --> PrestoException : throws for unsupported row-level ops
RewriteDataFilesProcedure --> PrestoException : throws when OPTIMIZE on unsupported version
IcebergSplitSource --> PrestoException : throws when PUFFIN deletion vectors used
PrestoException --> ErrorCodes : uses error codes
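The class diagram also shows IcebergSplitSource rejecting PUFFIN deletion vectors while turning each FileScanTask into a split. A minimal sketch of that guard, with names taken from the diagram (the real method also builds the split, which is elided here behind a hypothetical helper), might look like:

```java
// Sketch of the PUFFIN deletion-vector guard during split generation (illustrative only).
private ConnectorSplit toIcebergSplit(FileScanTask task)
{
    for (DeleteFile delete : task.deletes()) {
        if (delete.format() == FileFormat.PUFFIN) {
            throw new PrestoException(NOT_SUPPORTED,
                    "Reading Iceberg tables with PUFFIN deletion vectors is not supported");
        }
    }
    return buildSplit(task); // hypothetical helper standing in for the existing split-construction code
}
```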
File-Level Changes
Hey - I've found 4 issues, and left some high level feedback:
- The new constant MAX_SUPPORTED_FORMAT_VERSION is introduced but not used anywhere in this change; either wire it into the relevant validation paths or remove it to avoid confusion.
- The validation and error handling for format versions and update/delete modes (e.g., in beginMerge, beginDelete, beginUpdate, and OPTIMIZE) are very similar but use slightly different exception types/messages; consider centralizing this logic in a shared helper to keep behavior consistent and easier to maintain.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new constant MAX_SUPPORTED_FORMAT_VERSION is introduced but not used anywhere in this change; either wire it into the relevant validation paths or remove it to avoid confusion.
- The validation and error handling for format versions and update/delete modes (e.g., in beginMerge, beginDelete, beginUpdate, and OPTIMIZE) are very similar but use slightly different exception types/messages; consider centralizing this logic in a shared helper to keep behavior consistent and easier to maintain.
## Individual Comments
### Comment 1
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java:874-875` </location>
<code_context>
- !Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
- .map(mode -> mode.equals(MERGE_ON_READ.modeName()))
- .orElse(false)) {
+ if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
+ throw new PrestoException(ICEBERG_INVALID_FORMAT_VERSION,
+ "Iceberg table updates require at least format version 2 and update mode must be merge-on-read");
+ }
</code_context>
<issue_to_address>
**suggestion:** The error message for low format versions mentions update mode before it is validated, which can be misleading.
The exception text currently states both a format version and update mode requirement, but this branch only validates the format version; update mode could still be valid. Consider separating these into distinct errors (one for unsupported format version, one for invalid update mode) so users can see exactly which precondition failed.
Suggested implementation:
```java
Table icebergTable = getIcebergTable(session, icebergTableHandle.getSchemaTableName());
int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();
if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
throw new PrestoException(
ICEBERG_INVALID_FORMAT_VERSION,
"Iceberg table updates require at least format version 2");
}
if (!Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
.map(mode -> mode.equals(MERGE_ON_READ.modeName()))
.orElse(false)) {
throw new PrestoException(
ICEBERG_INVALID_METADATA,
format("Iceberg table updates require table property %s to be set to %s",
TableProperties.UPDATE_MODE,
MERGE_ON_READ.modeName()));
}
/**
```
These edits assume:
1. `ICEBERG_INVALID_METADATA` is already defined in this class or imported; if not, you should use the existing error code previously used for invalid update mode (or introduce one consistent with the rest of the Iceberg module).
2. `Optional`, `TableProperties`, and `MERGE_ON_READ` are already imported/available in this file. If any are missing, add the appropriate imports or static imports matching the existing style in the file.
</issue_to_address>
### Comment 2
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java:43` </location>
<code_context>
+import static org.assertj.core.api.Assertions.assertThatThrownBy;
+import static org.testng.Assert.assertEquals;
+
+public class TestIcebergV3
+ extends AbstractTestQueryFramework
+{
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for rejecting unsupported v3 schema default values and table encryption introduced in `validateTableForPresto`.
`validateTableForPresto` now rejects v3 column defaults (`initialDefault`/`writeDefault`) and table encryption, but the current tests don’t cover these failure paths. Please add tests that:
1. Use a v3 table with a column default and assert that INSERT/SELECT fail with the expected `NOT_SUPPORTED` error.
2. Use a v3 table with encryption enabled (e.g., via `TableMetadata.encryptionKeys`/`keyId`/related properties) and assert that operations fail with the expected error.
This will exercise the new validation logic end-to-end.
Suggested implementation:
```java
import java.util.OptionalInt;
import org.testng.annotations.Test;
```
```java
private static final String TEST_SCHEMA = "tpch";
private static final String V3_COLUMN_DEFAULT_TABLE = "v3_column_default";
private static final String V3_ENCRYPTED_TABLE = "v3_encrypted";
```
```java
@Test
public void testV3ColumnDefaultsAreRejected()
{
assertQueryFails(
format("SELECT * FROM %s.%s.%s", ICEBERG_CATALOG, TEST_SCHEMA, V3_COLUMN_DEFAULT_TABLE),
".*Iceberg v3 column default values are not supported.*");
assertQueryFails(
format("INSERT INTO %s.%s.%s VALUES (1)", ICEBERG_CATALOG, TEST_SCHEMA, V3_COLUMN_DEFAULT_TABLE),
".*Iceberg v3 column default values are not supported.*");
}
@Test
public void testV3TableEncryptionIsRejected()
{
assertQueryFails(
format("SELECT * FROM %s.%s.%s", ICEBERG_CATALOG, TEST_SCHEMA, V3_ENCRYPTED_TABLE),
".*Iceberg table encryption is not supported.*");
assertQueryFails(
format("INSERT INTO %s.%s.%s VALUES (1)", ICEBERG_CATALOG, TEST_SCHEMA, V3_ENCRYPTED_TABLE),
".*Iceberg table encryption is not supported.*");
}
}
```
To fully exercise `validateTableForPresto` end-to-end, you’ll also need to:
1. **Create the v3 table with column defaults** in the same warehouse used by the Hadoop catalog:
- Use the Iceberg Java APIs (e.g., `HadoopCatalog` or `TableOperations` directly) pointing at `getIcebergDataDirectoryPath(queryRunner, TEST_SCHEMA)` with `TableProperties.FORMAT_VERSION` set to `3`.
- Define at least one column with `initialDefault`/`writeDefault` in the table’s metadata (either via the public API if available in your Iceberg version or by manipulating the metadata JSON to include the appropriate default value fields).
- Ensure the table is created in the `TEST_SCHEMA` namespace with name `V3_COLUMN_DEFAULT_TABLE` so that Presto sees it as `iceberg.tpch.v3_column_default`.
2. **Create the v3 encrypted table** similarly:
- Use a v3 Iceberg table and configure encryption via `TableMetadata.encryptionKeys` / key ID or the relevant table properties used in your Iceberg version so that the metadata contains an encryption spec.
- Create it with table name `V3_ENCRYPTED_TABLE` in schema `TEST_SCHEMA` so it’s visible as `iceberg.tpch.v3_encrypted`.
3. If `assertQueryFails` has overloads that accept an error code, you may want to tighten the assertions by checking the `NOT_SUPPORTED` error code in addition to the message regex.
These setup steps will ensure that when the tests run the Presto connector loads the tables, triggers `validateTableForPresto`, and hits the new NOT_SUPPORTED paths for v3 column defaults and encryption.
</issue_to_address>
### Comment 3
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java:67` </location>
<code_context>
+ }
+
+ @Test
+ public void testCreateV3Table()
+ {
+ String tableName = "test_create_v3_table";
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a negative test for creating a table with an unsupported Iceberg format version (e.g., 4).
The new `MAX_SUPPORTED_FORMAT_VERSION = 3` constant means higher versions should be rejected, and `testCreateV3Table` only covers the success path. Please add a test that attempts to create a table with `"format-version" = '4'` (or >3) and asserts that it fails with the expected error message, so we lock in this behavior and ensure users see a clear error when using unsupported versions.
Suggested implementation:
```java
@Test
public void testCreateV3Table()
{
String tableName = "test_create_v3_table";
try {
assertUpdate("CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '3')");
Table table = loadTable(tableName);
assertEquals(((BaseTable) table).operations().current().formatVersion(), 3);
assertQuery("SELECT * FROM " + tableName, "SELECT * WHERE false");
}
finally {
dropTable(tableName);
}
}
@Test
public void testCreateUnsupportedFormatVersion()
{
String tableName = "test_create_v4_table";
// Ensure clean state in case a previous run created the table
dropTable(tableName);
assertQueryFails(
"CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '4')",
".*format version.*4.*not supported.*");
}
```
The exact error message pattern in the `assertQueryFails` assertion may need to be adjusted to match the actual exception text thrown when `MAX_SUPPORTED_FORMAT_VERSION = 3` is exceeded. After running the test suite once, update the regex string to align precisely with the real error message if it differs.
</issue_to_address>
### Comment 4
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java:107` </location>
<code_context>
+ }
+
+ @Test
+ public void testInsertIntoV3Table()
+ {
+ String tableName = "test_insert_v3_table";
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test to cover rejection of PUFFIN deletion vectors in scan planning.
The new logic throws a `PrestoException` when a `FileScanTask` has PUFFIN deletion vectors (`deleteFile.format() == PUFFIN`), but this scenario isn’t exercised by the current tests. If the Iceberg test stack allows it, please add a test that creates a v3 table with a PUFFIN delete file and runs a SELECT, asserting the expected `NOT_SUPPORTED` error from `IcebergSplitSource`.
Suggested implementation:
```java
finally {
dropTable(tableName);
}
}
@Test
public void testPuffinDeletionVectorsNotSupported()
{
String tableName = "test_puffin_deletion_vectors_not_supported";
try {
assertUpdate("CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '3')");
assertUpdate("INSERT INTO " + tableName + " VALUES (1, 'one'), (2, 'two')", 2);
Table table = loadTable(tableName);
// Attach a PUFFIN delete vector to an existing data file in the v3 table
try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
FileScanTask task = tasks.iterator().next();
DeleteFile puffinDeleteFile = FileMetadata.deleteFileBuilder(task.spec())
.ofPositionDeletes()
.withPath(task.file().path().toString() + ".puffin")
.withFileSizeInBytes(16)
.withFormat(FileFormat.PUFFIN)
.build();
table.newRowDelta()
.addDeletes(puffinDeleteFile)
.commit();
}
assertQueryFails(
"SELECT * FROM " + tableName,
"PUFFIN deletion vectors.*not supported");
}
finally {
dropTable(tableName);
}
```
To make this compile and integrate cleanly you will likely need to:
1. Add the following imports at the top of `TestIcebergV3.java` if they are not already present:
```java
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.FileMetadata;
```
2. If `loadTable` is not already accessible in this class (it appears earlier in your snippet, but if it is a helper in a superclass or utility class, ensure it is imported or used correctly), adjust the call accordingly.
3. If your existing assertion helpers differ (e.g., `assertQueryFailsEventually` or a different regex for the NOT_SUPPORTED error), update the `assertQueryFails` call to match the local convention and actual error message thrown by `IcebergSplitSource`. The key behaviour to assert is that planning a SELECT fails when there is a PUFFIN deletion-vector `DeleteFile` in the planned `FileScanTask`.
</issue_to_address>
Force-pushed from 7dac029 to 223382b
Hey - I've found 3 issues, and left some high level feedback:
- The newly introduced MAX_SUPPORTED_FORMAT_VERSION constant is not used anywhere in the changes; either enforce it (e.g., in table creation / loading paths) or remove it to avoid dead configuration that can drift from actual behavior.
- The format-version guardrails for row-level operations use different exception types and messages between beginMerge, beginUpdate, beginDelete, and OPTIMIZE (e.g., ICEBERG_INVALID_FORMAT_VERSION vs NOT_SUPPORTED and slightly different texts); consider standardizing the error codes and wording so users see consistent behavior across all operations.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The newly introduced MAX_SUPPORTED_FORMAT_VERSION constant is not used anywhere in the changes; either enforce it (e.g., in table creation / loading paths) or remove it to avoid dead configuration that can drift from actual behavior.
- The format-version guardrails for row-level operations use different exception types and messages between beginMerge, beginUpdate, beginDelete, and OPTIMIZE (e.g., ICEBERG_INVALID_FORMAT_VERSION vs NOT_SUPPORTED and slightly different texts); consider standardizing the error codes and wording so users see consistent behavior across all operations.
## Individual Comments
### Comment 1
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java:382-383` </location>
<code_context>
+ schema = metadata.schema();
+ }
+
+ // Reject schema default values (initial-default / write-default)
+ for (Types.NestedField field : schema.columns()) {
+ if (field.initialDefault() != null || field.writeDefault() != null) {
+ throw new PrestoException(NOT_SUPPORTED, "Iceberg v3 column default values are not supported");
</code_context>
<issue_to_address>
**suggestion:** Column default validation only considers top-level columns and may miss nested fields
`schema.columns()` only iterates top-level fields, so defaults on nested struct fields (if Iceberg permits `initialDefault` / `writeDefault` there) would bypass this check. Either traverse the full schema recursively to validate nested `Types.NestedField`s, or explicitly confirm/document that defaults are only allowed on top-level fields.
</issue_to_address>
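If the recursive route is taken, a sketch along these lines could work. It walks struct, list, and map types via Iceberg's NestedType accessors; the method name is a placeholder, and the accessors should be verified against the Iceberg version in use:

```java
// Hypothetical recursive check: rejects initial/write defaults anywhere in the schema, not just top level.
private static void checkNoColumnDefaults(List<Types.NestedField> fields)
{
    for (Types.NestedField field : fields) {
        if (field.initialDefault() != null || field.writeDefault() != null) {
            throw new PrestoException(NOT_SUPPORTED, "Iceberg v3 column default values are not supported");
        }
        Type type = field.type();
        if (type.isStructType()) {
            checkNoColumnDefaults(type.asStructType().fields());
        }
        else if (type.isListType()) {
            checkNoColumnDefaults(type.asListType().fields());
        }
        else if (type.isMapType()) {
            checkNoColumnDefaults(type.asMapType().fields());
        }
    }
}
// Call site: checkNoColumnDefaults(schema.columns());
```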
### Comment 2
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/procedure/RewriteDataFilesProcedure.java:129-131` </location>
<code_context>
Table icebergTable = procedureContext.getTable();
IcebergTableHandle tableHandle = layoutHandle.getTable();
+ // Validate format version for OPTIMIZE operation
+ int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();
+ if (formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
+ throw new PrestoException(NOT_SUPPORTED,
+ format("OPTIMIZE is not supported for Iceberg table format version > %d. Table %s format version is %s.",
</code_context>
<issue_to_address>
**suggestion:** Reusing MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS for OPTIMIZE may conflate different capability boundaries
Since OPTIMIZE/rewrite-data-files is not actually a row-level operation, tying it to `MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS` couples two unrelated capability checks. If the restriction on OPTIMIZE is driven by other concerns (e.g., manifest/delete semantics), consider introducing a separate constant or helper so future changes to row-level support don’t inadvertently change OPTIMIZE behavior.
Suggested implementation:
```java
```
```java
// Validate format version for OPTIMIZE operation
int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();
if (formatVersion > MAX_FORMAT_VERSION_FOR_OPTIMIZE) {
throw new PrestoException(NOT_SUPPORTED,
format("OPTIMIZE is not supported for Iceberg table format version > %d. Table %s format version is %s.",
MAX_FORMAT_VERSION_FOR_OPTIMIZE,
icebergTable.name(),
formatVersion));
}
```
1. Inside the `RewriteDataFilesProcedure` class, add a dedicated constant for OPTIMIZE, for example near the top of the class:
```java
// Separate from MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS to allow independent evolution
private static final int MAX_FORMAT_VERSION_FOR_OPTIMIZE = 2;
```
Place this inside the class body (after the opening `{`), not at the top-level.
2. If you want to avoid hardcoding `2`, you can initialize it from `IcebergUtil.MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS` today:
```java
private static final int MAX_FORMAT_VERSION_FOR_OPTIMIZE = MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS;
```
but in that case keep the `MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS` import and add a comment explaining that OPTIMIZE is conceptually separate and may diverge in the future.
</issue_to_address>
### Comment 3
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java:71-80` </location>
<code_context>
+ }
+ }
+
+ @Test
+ public void testV3TableEncryptionNotSupported()
+ {
+ String tableName = "test_v3_encrypted";
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests that validate rejection of v3 column default values (initialDefault/writeDefault) on both read and write paths
`validateTableForPresto` now rejects `initialDefault` and `writeDefault` for v3 schemas, but this behavior isn’t covered by tests. Please add a test similar to `testV3TableEncryptionNotSupported` that:
1. Creates a v3 table and alters a column to set `initialDefault` and/or `writeDefault` via the Iceberg API.
2. Asserts that `SELECT` fails with the expected `NOT_SUPPORTED` message.
3. Asserts that `INSERT` also fails with the same message.
This will confirm both `getTableHandle` (read) and `beginInsert` (write) correctly enforce the restriction.
</issue_to_address>
Revise the release note entry to follow the Release Notes Guidelines so that it passes the release-note check (not required for merge, but currently failing). For example, the Order of Changes requires entries to start with a listed keyword. Maybe something like this: Most of the rest of the release note entry I've removed from this suggested draft should be added to the Iceberg documentation, along with doc for the items remaining here. The release note entry can link to the documentation. Examples of linking are in Formatting in the Release Notes Guidelines.
@steveburnett, I have updated the release notes. Thanks a lot for your feedback.
Happy to help! tl;dr: Part of why I spend time on this topic is that in every Presto release we have a PR that automatically collects the release note entries from all merged PRs in the release. The release note doc PR is a release blocker because it must be finished and merged into the docs for the Presto release to happen, so the better the release note entries are when the PR is created, the less work needs to be done on them and the Presto release process goes faster.
Force-pushed from d6d9e9e to 4567a3d
Hey - I've found 3 issues, and left some high level feedback:
- The format-version guardrails (min/max checks and update-mode validation) are now duplicated across beginMerge, beginUpdate, beginDelete, and the OPTIMIZE procedure; consider extracting a shared helper (e.g.,
validateRowLevelOperationSupported(Table)/validateFormatVersionForRowLevelOperations(int)) to centralize this logic and keep the behavior/messages consistent as support evolves. - In
validateTableForPresto, the column default check only inspects top-levelschema.columns(); if Iceberg can express defaults on nested fields, you may want to recurse into nested types so that unsupported defaults are rejected consistently.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The format-version guardrails (min/max checks and update-mode validation) are now duplicated across beginMerge, beginUpdate, beginDelete, and the OPTIMIZE procedure; consider extracting a shared helper (e.g., `validateRowLevelOperationSupported(Table)` / `validateFormatVersionForRowLevelOperations(int)`) to centralize this logic and keep the behavior/messages consistent as support evolves.
- In `validateTableForPresto`, the column default check only inspects top-level `schema.columns()`; if Iceberg can express defaults on nested fields, you may want to recurse into nested types so that unsupported defaults are rejected consistently.
## Individual Comments
### Comment 1
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java:370-371` </location>
<code_context>
+ .map(table::snapshot)
+ .orElse(table.currentSnapshot());
+ }
+ catch (RuntimeException e) {
+ // If the snapshot cannot be retrieved (e.g. metadata is missing), we cannot validate the table.
+ // Returning here allows operations that do not strictly require the snapshot (like DROP TABLE) to proceed.
+ return;
</code_context>
<issue_to_address>
**issue (bug_risk):** Catching RuntimeException broadly here may hide unexpected failures during snapshot retrieval.
Swallowing all RuntimeExceptions here will also hide programming/configuration errors that should surface or at least be observable. Consider catching a narrower, Iceberg-specific exception for the expected failure mode, or at minimum log the exception before returning so operational debugging remains possible.
</issue_to_address>
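One possible shape for that narrower handling, assuming Airlift's Logger (used elsewhere in Presto) and Iceberg's NotFoundException as the expected failure; the exception type actually thrown here should be confirmed before adopting this:

```java
// Assumes imports of com.facebook.airlift.log.Logger and org.apache.iceberg.exceptions.NotFoundException.
private static final Logger log = Logger.get(IcebergAbstractMetadata.class);

// Inside validateTableForPresto:
Snapshot snapshot;
try {
    snapshot = tableSnapshotId.map(table::snapshot).orElse(table.currentSnapshot());
}
catch (NotFoundException e) {
    // Metadata or snapshot files are missing; skip validation so operations such as DROP TABLE can proceed.
    log.warn(e, "Could not load snapshot for table %s; skipping format v3 validation", table.name());
    return;
}
```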
### Comment 2
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java:879-888` </location>
<code_context>
- !Optional.ofNullable(icebergTable.properties().get(TableProperties.UPDATE_MODE))
- .map(mode -> mode.equals(MERGE_ON_READ.modeName()))
- .orElse(false)) {
+ if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
+ throw new PrestoException(ICEBERG_INVALID_FORMAT_VERSION,
+ "Iceberg table updates require at least format version 2 and update mode must be merge-on-read");
</code_context>
<issue_to_address>
**suggestion:** Error codes/messages for unsupported format versions are inconsistent between beginMerge and beginUpdate.
`beginMerge` uses `ICEBERG_INVALID_FORMAT_VERSION` for format versions < 2, while this path uses `NOT_SUPPORTED`, and the messages differ in whether they mention update mode. Please align the exception type and message with the other row-level operations (merge/update/delete) for consistent, predictable behavior and easier programmatic handling.
Suggested implementation:
```java
if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE) {
throw new PrestoException(
ICEBERG_INVALID_FORMAT_VERSION,
format(
"Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s",
MIN_FORMAT_VERSION_FOR_DELETE,
MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS,
formatVersion));
}
if (formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
throw new PrestoException(
ICEBERG_INVALID_FORMAT_VERSION,
format(
"Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s",
MIN_FORMAT_VERSION_FOR_DELETE,
MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS,
formatVersion));
}
```
To fully align messages and error codes across all row-level operations (merge/update/delete), you should:
1. Locate the corresponding format-version checks in `beginMerge`, `beginDelete`, and any other row-level operation helpers in `IcebergAbstractMetadata` (or related classes).
2. Update their `PrestoException` construction to use:
- The same `ICEBERG_INVALID_FORMAT_VERSION` error code.
- The same message template:
`"Iceberg row-level operations are only supported for table format versions in range [%s, %s]. Found: %s"`.
- The same arguments: `MIN_FORMAT_VERSION_FOR_DELETE`, `MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS`, `formatVersion`.
This will ensure consistent, predictable behavior and make programmatic handling of unsupported format versions easier across all operations.
</issue_to_address>
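Combined with the earlier overall comment about centralizing these guardrails, the checks could live in one shared helper. The sketch below is hypothetical (the method name and message wording are placeholders), built only from the constants and error codes already referenced in this review:

```java
// Hypothetical shared guardrail for MERGE, UPDATE, and DELETE; names and wording are placeholders.
private static void validateFormatVersionForRowLevelOperation(Table icebergTable, String operation)
{
    int formatVersion = ((BaseTable) icebergTable).operations().current().formatVersion();
    if (formatVersion < MIN_FORMAT_VERSION_FOR_DELETE || formatVersion > MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS) {
        throw new PrestoException(ICEBERG_INVALID_FORMAT_VERSION,
                format("%s is only supported for Iceberg format versions %d to %d; table %s has format version %d",
                        operation,
                        MIN_FORMAT_VERSION_FOR_DELETE,
                        MAX_FORMAT_VERSION_FOR_ROW_LEVEL_OPERATIONS,
                        icebergTable.name(),
                        formatVersion));
    }
}
```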
### Comment 3
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java:71-72` </location>
<code_context>
+ assertQuerySucceeds("DROP TABLE IF EXISTS " + tableName);
+ }
+
+ @Test
+ public void testCreateV3Table()
+ {
+ String tableName = "test_create_v3_table";
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test that verifies v3 column default values are rejected
The new `validateTableForPresto` logic that rejects v3 fields with `initialDefault` or `writeDefault` isn’t covered by tests. Please add a test that creates a v3 table, evolves the schema via `Table.updateSchema()` to add an `initialDefault`/`writeDefault`, commits it, and then verifies that both `SELECT` and `INSERT` in Presto fail with the expected NOT_SUPPORTED error, confirming the guardrail works end-to-end.
Suggested implementation:
```java
@Test
public void testCreateV3Table()
{
String tableName = "test_create_v3_table";
try {
assertUpdate("CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '3')");
Table table = loadTable(tableName);
assertEquals(((BaseTable) table).operations().current().formatVersion(), 3);
assertQuery("SELECT * FROM " + tableName, "SELECT * WHERE false");
}
finally {
dropTable(tableName);
}
}
@Test
public void testV3ColumnDefaultValuesRejected()
{
String tableName = "test_v3_column_default_values_rejected";
try {
// Create a v3 table
assertUpdate("CREATE TABLE " + tableName + " (id integer) WITH (\"format-version\" = '3')");
Table table = loadTable(tableName);
assertEquals(((BaseTable) table).operations().current().formatVersion(), 3);
// Evolve the schema to add a column with a v3 default value (initialDefault/writeDefault)
// so that validateTableForPresto should reject it.
table.updateSchema()
.addColumn("v", Types.IntegerType.get(), "value with default", 42)
.commit();
// Both SELECT and INSERT should fail with the NOT_SUPPORTED error due to v3 defaults.
assertQueryFails("SELECT * FROM " + tableName, ".*NOT_SUPPORTED.*");
assertQueryFails("INSERT INTO " + tableName + " (id) VALUES (1)", ".*NOT_SUPPORTED.*");
}
finally {
dropTable(tableName);
}
}
```
Depending on the existing imports and Iceberg version in this module, you may need to:
1. Ensure `org.apache.iceberg.types.Types` is imported at the top of `TestIcebergV3.java`:
- Add `import org.apache.iceberg.types.Types;` if it is not already present.
2. Adjust the `updateSchema()` call if your Iceberg version exposes a different API for setting v3 default values:
- For example, if the signature is `addColumn(String name, Type type)` plus a fluent method like `.withInitialDefault(...)` or `.withDefault(...)`, replace the `addColumn("v", Types.IntegerType.get(), "value with default", 42)` chain with the appropriate sequence that results in a column having `initialDefault` and/or `writeDefault` set in the schema.
3. If your test base class uses a different helper to assert error codes (e.g., `assertQueryFailsWithErrorCodeMatching`), you can tighten the assertions to check specifically for the `NOT_SUPPORTED` error code instead of matching `"NOT_SUPPORTED"` in the failure message.
</issue_to_address>
Description
Add initial support for Iceberg table format version 3 while constraining unsupported features and row-level operations to safe, explicitly validated paths.
New Features:
Enhancements:
Test Plan
Add TestIcebergV3 integration test suite covering creation, upgrade, insert, query, and partitioning for v3 tables, as well as rejection of unsupported delete, update, merge, and OPTIMIZE operations on v3 tables.
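As a rough illustration of the shape these tests take (following the assertUpdate/assertQueryFails pattern quoted in the review comments above; the failure-message regex is a guess and would need to match the connector's actual error text):

```java
@Test
public void testInsertAndReadV3Table()
{
    String tableName = "test_v3_roundtrip";
    try {
        assertUpdate("CREATE TABLE " + tableName + " (id integer, value varchar) WITH (\"format-version\" = '3')");
        assertUpdate("INSERT INTO " + tableName + " VALUES (1, 'one'), (2, 'two')", 2);
        assertQuery("SELECT count(*) FROM " + tableName, "VALUES 2");
        // Row-level operations are expected to be rejected on v3 tables in this initial support
        assertQueryFails("DELETE FROM " + tableName + " WHERE id = 1", ".*not supported.*");
    }
    finally {
        assertQuerySucceeds("DROP TABLE IF EXISTS " + tableName);
    }
}
```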
Reference PR - trinodb/trino#27786
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Release Notes
Summary by Sourcery
Add guarded initial support for Iceberg table format version 3 while constraining unsupported features and row-level operations to fail fast with clear errors.
New Features:
Enhancements:
Tests: