feat: [presto][iceberg] Add Iceberg V3 deletion vector write path with DV page sink and compaction procedure (#27395)
Conversation
Reviewer's Guide

Implements Iceberg V3 deletion vector (DV) write and compaction support across Presto Java and native (Velox) stacks: adds a Puffin-based DV page sink and DV-aware metadata/commit path, wires DV and dataSequenceNumber through split/protocol handling, enables V3 row-level operations and default values, introduces a DV compaction procedure, and expands V3 integration tests around DVs, schema evolution, and partition transforms.

Sequence diagram for Iceberg V3 deletion vector write path:

sequenceDiagram
actor User
participant Coordinator as PrestoCoordinator
participant PageSourceProvider as IcebergPageSourceProvider
participant UpdateableSource as IcebergUpdateablePageSource
participant DVPageSink as IcebergDeletionVectorPageSink
participant Metadata as IcebergAbstractMetadata
participant Iceberg as IcebergTable
User->>Coordinator: Execute DELETE on Iceberg v3 table
Coordinator->>PageSourceProvider: createPageSource(tableFormatVersion=3)
PageSourceProvider->>PageSourceProvider: detect format-version >= 3
PageSourceProvider->>PageSourceProvider: create deleteSinkSupplier() -> IcebergDeletionVectorPageSink
PageSourceProvider-->>Coordinator: IcebergUpdateablePageSource with deleteSinkSupplier
loop during delete execution
Coordinator->>UpdateableSource: updateRows/deleteRows
UpdateableSource->>DVPageSink: appendPage(Page rowIds)
DVPageSink->>DVPageSink: collect row positions
end
Coordinator->>UpdateableSource: finish()
UpdateableSource->>DVPageSink: finish()
DVPageSink->>DVPageSink: sort positions and serialize Roaring bitmap
DVPageSink->>DVPageSink: write Puffin DV file via PuffinWriter
DVPageSink-->>UpdateableSource: CommitTaskData JSON (FileFormat=PUFFIN, content=POSITION_DELETES, contentOffset, contentSizeInBytes, recordCount, referencedDataFile)
UpdateableSource-->>Coordinator: fragments containing CommitTaskData
Coordinator->>Metadata: finishDeleteWithOutput(fragments)
Metadata->>Metadata: decode CommitTaskData
Metadata->>Metadata: if FileFormat=PUFFIN
Metadata->>Metadata: set recordCount, contentOffset, contentSizeInBytes
Metadata->>Iceberg: FileMetadata.deleteFileBuilder()
Iceberg-->>Metadata: DeleteFile (format=PUFFIN, content=POSITION_DELETES with DV metadata)
Metadata->>Iceberg: commit delete file to table snapshot
Iceberg-->>Coordinator: commit success
Coordinator-->>User: DELETE completed
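The finish() path in the diagram above (collect row positions during deletes, then sort, deduplicate, and emit commit metadata) can be sketched independently of the Presto SPI. Everything here is illustrative: the real sink implements ConnectorPageSink, serializes a roaring bitmap into a Puffin blob, and emits CommitTaskData JSON; this sketch only models the data flow with stdlib types.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

// Minimal stand-in for the DV page sink lifecycle shown in the sequence diagram.
public class DvSinkSketch {
    // TreeSet keeps positions sorted and deduplicated, as finish() requires.
    private final TreeSet<Long> positions = new TreeSet<>();

    // Models appendPage(): each incoming page contributes deleted row positions.
    public void appendPositions(long... rowIds) {
        for (long p : rowIds) {
            positions.add(p);
        }
    }

    // Models finish(): emit the fields the commit path needs (a stand-in for
    // CommitTaskData; the real class carries more, e.g. contentOffset/size).
    public Map<String, Object> finish(String referencedDataFile) {
        Map<String, Object> task = new LinkedHashMap<>();
        task.put("fileFormat", "PUFFIN");
        task.put("content", "POSITION_DELETES");
        task.put("recordCount", (long) positions.size());
        task.put("referencedDataFile", referencedDataFile);
        return task;
    }

    public static void main(String[] args) {
        DvSinkSketch sink = new DvSinkSketch();
        sink.appendPositions(7, 3, 7); // duplicates collapse
        Map<String, Object> task = sink.finish("s3://warehouse/data.parquet"); // placeholder path
        if (!task.get("recordCount").equals(2L)) {
            throw new AssertionError("expected 2 distinct positions, got " + task.get("recordCount"));
        }
    }
}
```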
Class diagram for Iceberg V3 deletion vector page sink and commit metadata:

classDiagram
class CommitTaskData {
+String path
+long fileSizeInBytes
+MetricsWrapper metrics
+int partitionSpecId
+Optional~String~ partitionDataJson
+FileFormat fileFormat
+Optional~String~ referencedDataFile
+FileContent content
+OptionalLong contentOffset
+OptionalLong contentSizeInBytes
+OptionalLong recordCount
+CommitTaskData(path, fileSizeInBytes, metrics, partitionSpecId, partitionDataJson, fileFormat, referencedDataFile, content, contentOffset, contentSizeInBytes, recordCount)
+CommitTaskData(path, fileSizeInBytes, metrics, partitionSpecId, partitionDataJson, fileFormat, referencedDataFile, content)
+String getPath()
+long getFileSizeInBytes()
+MetricsWrapper getMetrics()
+int getPartitionSpecId()
+Optional~String~ getPartitionDataJson()
+FileFormat getFileFormat()
+Optional~String~ getReferencedDataFile()
+FileContent getContent()
+OptionalLong getContentOffset()
+OptionalLong getContentSizeInBytes()
+OptionalLong getRecordCount()
}
class IcebergDeletionVectorPageSink {
-PartitionSpec partitionSpec
-Optional~PartitionData~ partitionData
-LocationProvider locationProvider
-HdfsEnvironment hdfsEnvironment
-HdfsContext hdfsContext
-JsonCodec~CommitTaskData~ jsonCodec
-ConnectorSession session
-String dataFile
-List~Integer~ collectedPositions
+IcebergDeletionVectorPageSink(partitionSpec, partitionDataAsJson, locationProvider, hdfsEnvironment, hdfsContext, jsonCodec, session, dataFile)
+long getCompletedBytes()
+long getSystemMemoryUsage()
+long getValidationCpuNanos()
+CompletableFuture appendPage(Page page)
+CompletableFuture~Collection~ finish()
+void abort()
-static byte[] serializeRoaringBitmap(List~Integer~ sortedPositions)
}
class RewriteDeleteFilesProcedure {
-IcebergMetadataFactory metadataFactory
+RewriteDeleteFilesProcedure(IcebergMetadataFactory metadataFactory)
+Procedure get()
+void rewriteDeleteFiles(ConnectorSession clientSession, String schemaName, String tableName)
-void readDeletionVectorPositions(Table table, DeleteFile dv, Set~Integer~ positions)
-DeleteFile writeMergedDeletionVector(Table table, DeleteFile templateDv, String dataFilePath, Set~Integer~ mergedPositions)
-static void deserializeRoaringBitmap(ByteBuffer buffer, Set~Integer~ positions)
-static byte[] serializeRoaringBitmap(List~Integer~ sortedPositions)
}
class DeleteFile {
+FileContent content
+String path
+FileFormat format
+long recordCount
+long fileSizeInBytes
+List~Integer~ equalityFieldIds
+Map~Integer, byte[]~ lowerBounds
+Map~Integer, byte[]~ upperBounds
+long dataSequenceNumber
+static DeleteFile fromIceberg(org.apache.iceberg.DeleteFile deleteFile)
+DeleteFile(content, path, format, recordCount, fileSizeInBytes, equalityFieldIds, lowerBounds, upperBounds, dataSequenceNumber)
+FileContent getContent()
+String getPath()
+FileFormat getFormat()
+long getRecordCount()
+long getFileSizeInBytes()
+List~Integer~ getEqualityFieldIds()
+Map~Integer, byte[]~ getLowerBounds()
+Map~Integer, byte[]~ getUpperBounds()
+long getDataSequenceNumber()
+String toString()
}
class IcebergAbstractMetadata {
+Optional~ConnectorOutputMetadata~ finishDeleteWithOutput(ConnectorSession session, ConnectorTableHandle tableHandle, List~ColumnHandle~ columns, List~Slice~ fragments)
}
class IcebergPageSourceProvider {
+ConnectorPageSource createPageSource(ConnectorTransactionHandle transaction, ConnectorSession session, ConnectorSplit split, ConnectorTableHandle table, List~ColumnHandle~ columns, SplitContext splitContext)
}
class IcebergUpdateablePageSource {
-Supplier~ConnectorPageSink~ deleteSinkSupplier
-ConnectorPageSink positionDeleteSink
+IcebergUpdateablePageSource(ConnectorPageSource delegate, List~IcebergColumnHandle~ delegateColumns, Supplier~ConnectorPageSink~ deleteSinkSupplier, Supplier~Optional~RowPredicate~~ deletePredicate, Supplier~List~DeleteFilter~~ deleteFilters, Supplier~IcebergPageSink~ updatedRowPageSinkSupplier, ConnectorSession session)
+CompletableFuture~Collection~ finish()
}
IcebergDeletionVectorPageSink ..> CommitTaskData : creates
IcebergDeletionVectorPageSink ..> PartitionSpec
IcebergDeletionVectorPageSink ..> PartitionData
IcebergDeletionVectorPageSink ..|> ConnectorPageSink
IcebergUpdateablePageSource ..> ConnectorPageSink : deleteSinkSupplier
IcebergUpdateablePageSource o--> IcebergDeletionVectorPageSink : for formatVersion>=3
IcebergPageSourceProvider ..> IcebergDeletionVectorPageSink : constructs for v3 tables
IcebergPageSourceProvider ..> IcebergUpdateablePageSource
IcebergAbstractMetadata ..> CommitTaskData : consumes
IcebergAbstractMetadata ..> DeleteFile : builds Iceberg delete files
RewriteDeleteFilesProcedure ..|> Provider~Procedure~
RewriteDeleteFilesProcedure ..> IcebergMetadataFactory
RewriteDeleteFilesProcedure ..> IcebergAbstractMetadata
RewriteDeleteFilesProcedure ..> DeleteFile
DeleteFile ..> FileContent
DeleteFile ..> FileFormat
CommitTaskData ..> FileFormat
CommitTaskData ..> FileContent
Hey - I've found 3 issues, and left some high level feedback:
- The deletion vector roaring bitmap encoders/decoders (IcebergDeletionVectorPageSink.serializeRoaringBitmap, RewriteDeleteFilesProcedure.serializeRoaringBitmap/deserializeRoaringBitmap, and the test helper) are hand-rolled and in some cases assume a single container (positions < 2^16) and cast long to int without bounds checks; consider either leveraging an existing roaring implementation (e.g. Iceberg/roaringbitmap utilities) or at least validating position ranges and supporting multi-container bitmaps to avoid subtle corruption for larger row positions.
- Both the DV page sink and the rewrite_delete_files compaction procedure write Puffin blobs with hardcoded snapshotId/sequenceNumber values of 0; if downstream readers or metadata tooling rely on these fields for DV discovery or conflict resolution, it would be safer to plumb through the actual snapshot/sequence information (or explicitly document and enforce that they are unused).
- In RewriteDeleteFilesProcedure.rewriteDeleteFiles, you build filesToRemove/filesToAdd by iterating planFiles() and then call icebergTable.newRewrite().rewriteFiles(...).commit(); you never remove any data files here, only delete files; consider using the more specific rewriteFiles(rewrittenDeleteFiles, newDeleteFiles) overload (or adding a comment) so it's clear that this operation is only compacting delete files and not touching data files.
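The decode-and-merge step the bitmap comments above refer to (read each DV's positions into one set, then reserialize) can be sketched with stdlib types. This is illustrative only: class and method names are invented, and the parsing assumes the single-array-container "no-run" layout (cookie 12346) used elsewhere in this PR, which is exactly the single-container assumption the review flags.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.TreeSet;

public class DvMergeSketch {
    // Parse one single-container no-run roaring bitmap and add its positions
    // to the merged set. Little-endian throughout, per the portable format.
    static void readPositions(byte[] dv, TreeSet<Integer> merged) {
        ByteBuffer buf = ByteBuffer.wrap(dv).order(ByteOrder.LITTLE_ENDIAN);
        int cookie = buf.getInt();
        if (cookie != 12346) {
            throw new IllegalArgumentException("unexpected cookie: " + cookie);
        }
        int containers = buf.getInt();
        if (containers != 1) {
            throw new IllegalArgumentException("sketch supports exactly one container");
        }
        int key = Short.toUnsignedInt(buf.getShort());             // high 16 bits of each position
        int cardinality = Short.toUnsignedInt(buf.getShort()) + 1; // stored as cardinality - 1
        buf.getInt();                                              // offset header (single container: skip)
        for (int i = 0; i < cardinality; i++) {
            merged.add((key << 16) | Short.toUnsignedInt(buf.getShort()));
        }
    }

    // Build a single-container no-run bitmap for sorted positions < 65536.
    static byte[] bitmapOf(int... sorted) {
        ByteBuffer buf = ByteBuffer.allocate(16 + 2 * sorted.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(12346).putInt(1).putShort((short) 0).putShort((short) (sorted.length - 1)).putInt(16);
        for (int p : sorted) {
            buf.putShort((short) p);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        TreeSet<Integer> merged = new TreeSet<>();
        readPositions(bitmapOf(1, 5, 100), merged); // DV from one delete
        readPositions(bitmapOf(5, 7), merged);      // DV from a later delete; 5 overlaps
        if (!merged.toString().equals("[1, 5, 7, 100]")) {
            throw new AssertionError(merged.toString());
        }
    }
}
```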
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The deletion vector roaring bitmap encoders/decoders (`IcebergDeletionVectorPageSink.serializeRoaringBitmap`, `RewriteDeleteFilesProcedure.serializeRoaringBitmap`/`deserializeRoaringBitmap`, and the test helper) are hand-rolled and in some cases assume a single container (positions < 2^16) and cast `long` to `int` without bounds checks; consider either leveraging an existing roaring implementation (e.g. Iceberg/roaringbitmap utilities) or at least validating position ranges and supporting multi-container bitmaps to avoid subtle corruption for larger row positions.
- Both the DV page sink and the rewrite_delete_files compaction procedure write Puffin blobs with hardcoded snapshotId/sequenceNumber values of 0; if downstream readers or metadata tooling rely on these fields for DV discovery or conflict resolution, it would be safer to plumb through the actual snapshot/sequence information (or explicitly document and enforce that they are unused).
- In `RewriteDeleteFilesProcedure.rewriteDeleteFiles`, you build `filesToRemove`/`filesToAdd` by iterating `planFiles()` and then call `icebergTable.newRewrite().rewriteFiles(...).commit()`; you never remove any data files here, only delete files—consider using the more specific `rewriteFiles(rewrittenDeleteFiles, newDeleteFiles)` overload (or adding a comment) so it’s clear that this operation is only compacting delete files and not touching data files.
## Individual Comments
### Comment 1
<location path="presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java" line_range="150-154" />
<code_context>
}
@Test
- public void testDeleteOnV3TableNotSupported()
+ public void testDeleteOnV3Table()
</code_context>
<issue_to_address>
**suggestion (testing):** Add coverage for rewrite_delete_files on a V3 table that does not require compaction (0 or 1 DV per data file).
This currently covers the compaction case (multiple DVs per data file). Please also add a test where a V3 table has no DVs or exactly one DV per data file so that `CALL system.rewrite_delete_files` is a no-op, to confirm we never drop the only DV on a file and that the metadata commit path is safe when there’s nothing to rewrite.
Suggested implementation:
```java
}
@Test
public void testRewriteDeleteFilesNoOpOnV3Table()
{
String tableName = "test_v3_rewrite_delete_noop";
try {
assertUpdate("CREATE TABLE " + tableName + " (id BIGINT, name VARCHAR, salary DOUBLE) " +
"WITH (format = 'PARQUET', format_version = 3)", 0);
assertUpdate("INSERT INTO " + tableName + " VALUES " +
"(1, 'Alice', 100.0), (2, 'Bob', 200.0), (3, 'Charlie', 300.0)", 3);
assertQuery("SELECT * FROM " + tableName + " ORDER BY id",
"VALUES (1, 'Alice', 100.0), (2, 'Bob', 200.0), (3, 'Charlie', 300.0)");
// Issue deletes expected to create at most one delete vector per data file
// and ensure the remaining visible row is correct.
assertUpdate("DELETE FROM " + tableName + " WHERE id IN (1, 3)", 2);
assertQuery("SELECT * FROM " + tableName,
"VALUES (2, 'Bob', 200.0)");
// rewrite_delete_files should be a no-op: data must remain unchanged
// and deleted rows must stay deleted.
assertUpdate("CALL system.rewrite_delete_files(table => '" + tableName + "')");
assertQuery("SELECT * FROM " + tableName,
"VALUES (2, 'Bob', 200.0)");
}
finally {
assertUpdate("DROP TABLE IF EXISTS " + tableName);
}
}
@Test
public void testDeleteOnV3Table()
```
1. Align the `CALL system.rewrite_delete_files` invocation with the existing tests in this file:
- If other tests pass catalog and schema separately (e.g., `CALL system.rewrite_delete_files('iceberg', 'tpch', '<table>')` or similar), update the call in `testRewriteDeleteFilesNoOpOnV3Table` accordingly.
- If tests use fully-qualified table names including schema (e.g., `"CALL system.rewrite_delete_files(table => 'tpch." + tableName + "')"`), mirror that pattern instead of the bare `tableName` used above.
2. If the V3 table creation in other tests uses a connector-specific `WITH` clause (e.g., `format_version = 3, partitioning`, or different format), adjust the `CREATE TABLE` statement to match those conventions so that the DV layout matches production expectations.
3. If there is a helper to generate unique table names or to qualify them with a schema/catalog, use it here instead of the raw `tableName` string.
</issue_to_address>
### Comment 2
<location path="presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java" line_range="629-638" />
<code_context>
+import static com.facebook.presto.iceberg.IcebergUtil.getIcebergTable;
+import static java.util.Objects.requireNonNull;
+
+/**
+ * Procedure to compact deletion vectors (DVs) on V3 Iceberg tables.
+ *
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a small self-checking test around serializeRoaringBitmapNoRun to guard against encoding regressions.
This helper encodes a specific RoaringBitmap "no-run" portable format expected by Velox and is currently only exercised indirectly. Please add a focused unit test that builds a small bitmap, calls this helper, and validates the cookie/structure (or round-trips through an Iceberg/Velox reader) so encoding changes are caught early and the constraints (e.g., positions < 65536) are documented in executable form.
</issue_to_address>
### Comment 3
<location path="presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergV3.java" line_range="357-362" />
<code_context>
+ try {
+ computeActual("SELECT * FROM " + tableName);
+ }
+ catch (RuntimeException e) {
+ // Verify the error is NOT the old "PUFFIN not supported" rejection.
+ // Other failures (e.g., fake .puffin file not on disk) are acceptable.
+ assertFalse(
+ e.getMessage().contains("Iceberg deletion vectors") && e.getMessage().contains("not supported"),
+ "PUFFIN deletion vectors should be accepted, not rejected: " + e.getMessage());
+ }
}
</code_context>
<issue_to_address>
**nitpick:** Defensively handle a null exception message in PUFFIN DV acceptance tests.
Here and in similar catch blocks, `e.getMessage()` is dereferenced directly in `contains` checks and the assertion message. If the message is null, the test will throw a `NullPointerException` instead of reporting the original failure. Using `String.valueOf(e.getMessage())` (for both the checks and the message) or adding an explicit null guard would make these tests more robust.
</issue_to_address>
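The null guard suggested in Comment 3 can be sketched stdlib-only; a plain check stands in for assertFalse so the snippet is self-contained, and the exception here is just a stand-in for whatever the query run throws.

```java
public class NullSafeMessage {
    // Null-safe extraction of an exception message, as the review suggests:
    // String.valueOf(Object) turns a null message into the string "null"
    // instead of letting a later .contains(...) throw NullPointerException.
    static String messageOf(Throwable t) {
        return String.valueOf(t.getMessage());
    }

    public static void main(String[] args) {
        RuntimeException noMessage = new RuntimeException(); // getMessage() == null
        String message = messageOf(noMessage);
        // Direct dereference (noMessage.getMessage().contains(...)) would NPE here.
        boolean oldRejection = message.contains("Iceberg deletion vectors")
                && message.contains("not supported");
        if (oldRejection) {
            throw new AssertionError("PUFFIN deletion vectors should be accepted, not rejected: " + message);
        }
    }
}
```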
Diff context for Comment 1 (TestIcebergV3.java):

```diff
     @Test
-    public void testDeleteOnV3TableNotSupported()
+    public void testDeleteOnV3Table()
     {
         String tableName = "test_v3_delete";
         try {
```
Code context for Comment 2 (TestIcebergV3.java):

```java
/**
 * Serializes a roaring bitmap in the portable "no-run" format (cookie = 12346).
 * This produces the exact binary format expected by Velox's DeletionVectorReader.
 * Only supports positions within a single container (all < 65536).
 */
private static byte[] serializeRoaringBitmapNoRun(int[] positions)
{
    // Cookie (12346) + numContainers (1)
    // + 1 container key-cardinality pair (4 bytes)
    // + sorted uint16 values (2 bytes each)
```
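For reference, a stdlib-only sketch of this single-container portable layout per the RoaringFormatSpec. Note one assumption worth flagging: per the spec, the no-run cookie (12346) is followed by a 32-bit offset header before the container data, so the sketch writes one; the production helper's exact byte layout may differ, and class/method names here are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NoRunRoaringSketch {
    static final int SERIAL_COOKIE_NO_RUNCONTAINER = 12346;

    // Serialize sorted, distinct positions (all in [0, 65536)) as one
    // array-container portable roaring bitmap, little-endian throughout:
    // cookie, container count, key/(cardinality-1) pair, offset header, values.
    static byte[] serialize(int[] sortedPositions) {
        if (sortedPositions.length == 0 || sortedPositions.length > 65536) {
            throw new IllegalArgumentException("need 1..65536 positions");
        }
        for (int p : sortedPositions) {
            if (p < 0 || p >= 65536) {
                throw new IllegalArgumentException("position outside single container: " + p);
            }
        }
        int headerBytes = 4 + 4 + 4 + 4; // cookie + count + key/card pair + offset header
        ByteBuffer buf = ByteBuffer.allocate(headerBytes + 2 * sortedPositions.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(SERIAL_COOKIE_NO_RUNCONTAINER);
        buf.putInt(1);                                          // one container
        buf.putShort((short) 0);                                // container key (high 16 bits)
        buf.putShort((short) (sortedPositions.length - 1));     // cardinality - 1
        buf.putInt(headerBytes);                                // offset of container data
        for (int p : sortedPositions) {
            buf.putShort((short) p);                            // uint16 value
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] b = serialize(new int[] {1, 5, 100});
        if (b.length != 22) throw new AssertionError("length " + b.length);
        // 12346 == 0x303A, little-endian: 0x3A 0x30 0x00 0x00
        if ((b[0] & 0xFF) != 0x3A || (b[1] & 0xFF) != 0x30) throw new AssertionError("bad cookie");
    }
}
```

A focused self-checking test of exactly this shape (cookie bytes, length, container count) is what Comment 2 above asks for.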
Code context for Comment 3 (TestIcebergV3.java):

```java
catch (RuntimeException e) {
    // Verify the error is NOT the old "PUFFIN not supported" rejection.
    // Other failures (e.g., fake .puffin file not on disk) are acceptable.
    assertFalse(
            e.getMessage().contains("Iceberg deletion vectors") && e.getMessage().contains("not supported"),
            "PUFFIN deletion vectors should be accepted, not rejected: " + e.getMessage());
```
…h DV page sink and compaction procedure (prestodb#27395)

Summary:
- Add IcebergDeletionVectorPageSink for writing DV files during table maintenance
- Add RewriteDeleteFilesProcedure for DV compaction
- Wire DV page sink through IcebergCommonModule, IcebergAbstractMetadata, IcebergPageSourceProvider
- Add IcebergUpdateablePageSource for DV-aware page source
- Update CommitTaskData, IcebergUtil for DV support
- Add test coverage in TestIcebergV3

== RELEASE NOTES ==

General Changes
* Upgrade Apache Iceberg library from 1.10.0 to 1.10.1.

Hive Connector Changes
* Add Iceberg V3 deletion vector (DV) support using Puffin-encoded roaring bitmaps, including a DV reader, writer, page sink, and compaction procedure.
* Add Iceberg equality delete file reader with sequence number conflict resolution per the Iceberg V2+ spec: equality deletes skip when deleteFileSeqNum <= dataFileSeqNum; positional deletes and DVs skip when deleteFileSeqNum < dataFileSeqNum; sequence number 0 (V1 legacy) never skips.
* Wire dataSequenceNumber through the Presto protocol layer (Java → C++) to enable server-side sequence number conflict resolution for all delete file types.
* Add PUFFIN file format support for deletion vector discovery, enabling the coordinator to locate DV files during split creation.
* Add Iceberg V3 deletion vector write path with DV page sink and rewrite_delete_files compaction procedure for DV maintenance.
* Add nanosecond timestamp (TIMESTAMP_NANO) type support for Iceberg V3 tables.
* Add Variant type support for Iceberg V3, enabling semi-structured data columns in Iceberg tables.
* Eagerly collect delete files during split creation with improved logging for easier debugging of Iceberg delete file resolution.
* Improve IcebergSplitReader error handling and fix test file handle leaks.
* Add end-to-end integration tests for Iceberg V3 covering snapshot lifecycle (INSERT, DELETE with equality/positional/DV deletes, UPDATE, MERGE, time-travel) and all 99 TPC-DS queries.
Differential Revision: D97531549
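The sequence-number skip rules quoted in the release notes above can be captured in a small self-checking sketch. Names are illustrative, not the actual Presto/Velox API; the logic is exactly the stated rule: a delete file applies to a data file unless its sequence number says to skip.

```java
public class DeleteConflictRule {
    enum DeleteKind { EQUALITY, POSITIONAL, DELETION_VECTOR }

    // true when the delete file should be applied to the data file.
    static boolean deleteApplies(DeleteKind kind, long deleteFileSeqNum, long dataFileSeqNum) {
        if (deleteFileSeqNum == 0) {
            return true; // sequence number 0 (V1 legacy) never skips
        }
        if (kind == DeleteKind.EQUALITY) {
            // equality deletes skip when deleteFileSeqNum <= dataFileSeqNum
            return deleteFileSeqNum > dataFileSeqNum;
        }
        // positional deletes and DVs skip when deleteFileSeqNum < dataFileSeqNum
        return deleteFileSeqNum >= dataFileSeqNum;
    }

    public static void main(String[] args) {
        if (deleteApplies(DeleteKind.EQUALITY, 5, 5)) {
            throw new AssertionError("equality delete must skip at equal sequence numbers");
        }
        if (!deleteApplies(DeleteKind.POSITIONAL, 5, 5)) {
            throw new AssertionError("positional delete applies at equal sequence numbers");
        }
        if (!deleteApplies(DeleteKind.DELETION_VECTOR, 0, 9)) {
            throw new AssertionError("sequence number 0 never skips");
        }
        if (deleteApplies(DeleteKind.DELETION_VECTOR, 4, 5)) {
            throw new AssertionError("older DV must skip newer data file");
        }
    }
}
```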
…or extensibility (prestodb#27391)

Summary:
- Reformat FileContent enum in presto_protocol_iceberg.h from single-line to multi-line for better readability and future extension.
- Add blank line for visual separation before infoColumns initialization.

Protocol files are auto-generated from Java sources via chevron. The manual edits here mirror what the generator would produce once the Java changes are landed and the protocol is regenerated.

Differential Revision: D97531548
… for equality delete conflict resolution (prestodb#27392)

Summary: Wire the dataSequenceNumber field from the Java Presto protocol to the C++ Velox connector layer, enabling server-side sequence number conflict resolution for equality delete files.

Changes:
- Add dataSequenceNumber field to IcebergSplit protocol (Java + C++)
- Parse dataSequenceNumber in IcebergPrestoToVeloxConnector and pass it through HiveIcebergSplit to IcebergSplitReader
- Add const qualifiers to local variables for code clarity

Differential Revision: D97531547
…ector discovery (prestodb#27393)

Summary: Iceberg V3 introduces deletion vectors stored as blobs inside Puffin files. Previously, the coordinator's IcebergSplitSource rejected PUFFIN-format delete files with a NOT_SUPPORTED error, preventing V3 deletion vectors from being discovered and sent to workers.

This diff:
1. Adds PUFFIN to the FileFormat enum (both presto-trunk and presto-facebook-trunk) so fromIcebergFileFormat() can convert Iceberg's PUFFIN format to Presto's FileFormat.PUFFIN.
2. Removes the PUFFIN rejection check in presto-trunk's IcebergSplitSource.toIcebergSplit(), allowing deletion vector files to flow through to workers.
3. Updates TestIcebergV3 to verify PUFFIN files are accepted rather than rejected at split enumeration time.

The C++ worker-side changes (protocol enum + connector conversion) will follow in a separate diff.

Differential Revision: D97531557
…ocol and connector layer (prestodb#27394)

Summary: This is the C++ counterpart to the Java PUFFIN support diff. It wires the PUFFIN file format through the Prestissimo protocol and connector conversion layer so that Iceberg V3 deletion vector files can be deserialized and handled by native workers.

Changes:
1. Adds PUFFIN to the C++ protocol FileFormat enum and its JSON serialization table in presto_protocol_iceberg.{h,cpp}.
2. Handles PUFFIN in toVeloxFileFormat() in IcebergPrestoToVeloxConnector.cpp, mapping it to DWRF as a placeholder since DeletionVectorReader reads raw binary and does not use the DWRF/Parquet reader infrastructure.

Differential Revision: D97531555