feat: Support Iceberg's single-table multi-statement writes transaction#25003
feat: Support Iceberg's single-table multi-statement writes transaction#25003hantangwangd wants to merge 9 commits intoprestodb:masterfrom
Conversation
|
I am a bit confused about the intent of this PR. Could you help to provide some examples on a scenario in Iceberg where it is helpful to add this flag? I note that the I also don't see where the |
|
@tdcmeehan, thanks for the review and comment. I think the core purpose of this PR is to support connectors that allow multi-write transaction for a subset of SQL statements. Currently, for many connectors (like Iceberg), multi-write statement transaction support is typically limited to some certain DML operations, with little to no support for DDL statements. It's not an all-or-nothing proposition, so a connector-level Take Iceberg for example, with this
This PR do not really open the multi-write statement transaction for Iceberg. After this is merged, I plan to support the actual multi-statements write transaction for Iceberg in one or several follow-up PRs. What's your opinion? |
|
@hantangwangd can you include the Iceberg changes into this PR as a separate commit? |
Sure, I'll add the change later. My thought is, in the beginning, we can allow DQL and some certain DML to run together within a transaction with |
10386b3 to
c440544
Compare
55028af to
a4fc379
Compare
|
Hi @tdcmeehan, I've added the code changes to support multi-statement writes transactions for Iceberg, and added the relevant design choices to the PR description. Please take a look when convenient. Any feedback would be greatly appreciated! |
61383e0 to
574b3ac
Compare
f79248e to
d7e430c
Compare
|
Please resolve the file conflicts to move this PR closer to merging. |
4fea82c to
b889d72
Compare
|
@steveburnett thank you for the reminder. Done! Please take a look when you get a chance. |
tdcmeehan
left a comment
There was a problem hiding this comment.
Looks really great. I only have some nits.
...iceberg/src/main/java/com/facebook/presto/iceberg/transaction/IcebergTransactionContext.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
| private final boolean autoCommitContext; | ||
| private final Map<SchemaTableName, Transaction> txByTable; | ||
| private final Map<SchemaTableName, Table> initiallyReadTables; | ||
| private Optional<Runnable> callbacksOnCommit = Optional.empty(); |
There was a problem hiding this comment.
| private Optional<Runnable> callbacksOnCommit = Optional.empty(); | |
| private final AtomicReference<Runnable> callbacksOnCommit = Optional.empty(); |
There was a problem hiding this comment.
callbacksOnCommit isn't a final field–it need to be set in the below registerCallback method.
There was a problem hiding this comment.
@tdcmeehan sorry, just noticed this now — I initially misread your suggestion as final Optional<Runnable>. As you recommended, using final AtomicReference<Runnable> is undoubtedly a better choice, and I've already fixed it. Please take a look when you have a moment. Thanks a lot!
...iceberg/src/main/java/com/facebook/presto/iceberg/transaction/IcebergTransactionContext.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/transaction/IcebergTransactionContext.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/transaction/IcebergTransactionContext.java
Outdated
Show resolved
Hide resolved
b889d72 to
6a4eb25
Compare
|
@tdcmeehan thanks a lot for your review. I've addressed the comments, please take another look when you get a chance. |
Just getting back to this, apologies for the delay. Almost all the tests have passed, barring one that was canceled. Please resolve the file conflicts that have developed over the time, and then we can find how the tests are doing now! |
|
Just a ping that this PR has file conflicts needing resolution, but otherwise seems close to mergeability. |
92a9cfb to
51ab9ff
Compare
|
@steveburnett thanks for your reminder. I've rebased and resolved the conflicts, please take a look when you get a chance. |
And there's another file conflict now, in IcebergAbstractMetadata.java. When you have time, please resolve this new conflict to move the PR towards mergeability. |
85a70e8 to
30137d9
Compare
|
Thanks @steveburnett. I've rebased and resolved the conflicts. Please take a look when you get a chance. |
Looks good! I see no file conflict, and all checks have passed. |
Set Iceberg connector's isolation level to REPEATABLE_READ(SNAPSHOT).
30137d9 to
e7631a9
Compare
Description
Iceberg currently supports transactions at the table level via the single-table transaction API, which allows one or more updates to a single table in an atomic manner. For example:
Meanwhile, the Iceberg community is also working on supporting multi-table multi-write transactions, as referenced in this document and PR. According to the discussion in the document, the support of multi-statement read-write transactions should be a catalog-level implementation, and it also covers the choice of isolation levels and the definition of related behaviors in Iceberg. Currently, the implementation appears not so mature to support multi-table operations particularly at the SERIALIZABLE isolation level. However, its behavioral definition and implementation details for the SNAPSHOT isolation level can be fully applied to a catalog-based single-table transaction involving multiple write statements.
Based on this, this PR implements single-table multi-statement writes transactions in Presto Iceberg Connector, with the following design choices:
Supports multi-statement writes on a single table, while can read for multiple tables in the same transaction. All operations are guaranteed isolation-level visibility and atomic commit/rollback.
Currently, only REPEATABLE_READ(SNAPSHOT) isolation level is supported, as this is the most natural/easiest isolation level to implement at this stage. The SERIALIZABLE isolation level seems to have some remaining concerns -- we'd need to ensure it properly handles concurrent cases. Also, it would likely require a lot of code changes. Once the Iceberg community's implementation of multi-table transactions with SERIALIZABLE isolation matures, we can follow up with support for this isolation level as well.
Following the discussion in the Iceberg community's design documentation, we implement the following behavior to guarantee SNAPSHOT isolation level in multi-statement read-write transactions:
Note that under SNAPSHOT isolation a write skew anomaly is acceptable and permitted.
Supports
read your own writesvisibility within transactions, meaning subsequent operations can see the results of previous writes within the same transaction.Within multi-statement transactions, only operations manageable through Iceberg's single-table transaction API are supported. Certain operations (e.g. CREATE TABLE/DROP TABLE/RENAME TABLE etc.) cannot be managed through Iceberg's single-table transaction API. Therefore, currently they are unsuitable to be included in multi-statement transactions and should instead be executed in autocommit mode (autocommit=true). We referenced the test cases from Iceberg community's implementation here to determine which statements should be supported in a multi-statement read-write transaction.
Besides, due to limitations in the current testing framework, our test cases do not currently include validation of transaction isolation under concurrent execution. Once we adopt the enhancement for testing framework from PR #25053, we will be able to write the following test cases to test concurrent transactions:
Motivation and Context
Support single-table multi-statement writes transaction in Iceberg Connector
Impact
Test Plan
Contributor checklist