Skip to content

Add rollback action for CTAS destination table#27702

Merged
ebyhr merged 1 commit intotrinodb:masterfrom
findinpath:findinpath/retry-on-drop-table
Dec 24, 2025
Merged

Add rollback action for CTAS destination table#27702
ebyhr merged 1 commit intotrinodb:masterfrom
findinpath:findinpath/retry-on-drop-table

Conversation

@findinpath
Copy link
Contributor

@findinpath findinpath commented Dec 19, 2025

Description

While CREATE TABLE is an atomic operation, CREATE TABLE AS SELECT statement happens in two phases
for JDBC connectors from Trino:

  • creating the table
  • inserting content into the table

If the insertion of the data fails, the CTAS may leave residual empty tables behind.

The JDBC connectors, in dealing with CREATE TABLE AS SELECT scenarios,
remove the destination table in case of dealing with exceptions
while inserting the content.

NOTE that introducing the rollbackConsumer in the beginCreateTable method may come with potential changes in the functionality of the connectors that extend from trino-base-jdbc module.

Additional context and related issues

Here is a relevant stacktrace gathered from CI that showcases the issue of leaving behind empty tables after a CTAS

Error:  ....plugin.redshift.TestRedshiftConnectorSmokeTest.testRowLevelDelete -- Time elapsed: 13.15 s <<< ERROR!
io.trino.testing.QueryFailedException: line 1:1: Destination table 'redshift.test_schema.test_row_level_deletepbx2vn9g2n' already exists
	at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:138)
	at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:645)
	at io.trino.testing.DistributedQueryRunner.execute(DistributedQueryRunner.java:621)
	at io.trino.plugin.redshift.TrinoSqlExecutorWithRetries.lambda$execute$1(TrinoSqlExecutorWithRetries.java:52)
	at dev.failsafe.Functions.lambda$toCtxSupplier$9(Functions.java:228)
	at dev.failsafe.Functions.lambda$get$0(Functions.java:46)
	at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74)
	at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187)
	at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
	at dev.failsafe.FailsafeExecutor.run(FailsafeExecutor.java:220)
	at io.trino.plugin.redshift.TrinoSqlExecutorWithRetries.execute(TrinoSqlExecutorWithRetries.java:52)
	at io.trino.testing.sql.TestTable.createAndInsert(TestTable.java:52)
	at io.trino.testing.sql.TestTable.<init>(TestTable.java:47)
	at io.trino.plugin.redshift.TestRedshiftConnectorSmokeTest.newTrinoTable(TestRedshiftConnectorSmokeTest.java:47)
	at io.trino.testing.AbstractTestQueryFramework.newTrinoTable(AbstractTestQueryFramework.java:700)
	at io.trino.testing.BaseConnectorSmokeTest.testRowLevelDelete(BaseConnectorSmokeTest.java:291)
	at java.base/java.lang.reflect.Method.invoke(Method.java:565)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:511)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.tryRemoveAndExec(ForkJoinPool.java:1490)
	at java.base/java.util.concurrent.ForkJoinPool.helpJoin(ForkJoinPool.java:2248)
	at java.base/java.util.concurrent.ForkJoinTask.awaitDone(ForkJoinTask.java:499)
	at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:666)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:511)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1450)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:2019)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:187)
	Suppressed: java.lang.Exception: SQL: CREATE TABLE test_row_level_deletepbx2vn9g2n AS SELECT * FROM region
		at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:652)
		... 24 more
Caused by: io.trino.spi.TrinoException: line 1:1: Destination table 'redshift.test_schema.test_row_level_deletepbx2vn9g2n' already exists
	at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:58)
	at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:52)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitCreateTableAsSelect(StatementAnalyzer.java:960)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitCreateTableAsSelect(StatementAnalyzer.java:533)
	at io.trino.sql.tree.CreateTableAsSelect.accept(CreateTableAsSelect.java:86)
	at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:552)
	at io.trino.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:512)
	at io.trino.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:501)
	at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:98)
	at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:87)
	at io.trino.execution.SqlQueryExecution.analyze(SqlQueryExecution.java:332)
	at io.trino.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:241)
	at io.trino.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:984)
	at io.trino.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:160)
	at io.trino.$gen.Trino_testversion____20251208_175533_1.call(Unknown Source)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:128)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:80)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
	at java.base/java.lang.Thread.run(Thread.java:1474)

Relevant code from the method testRowLevelDelete() in BaseConnectorSmokeTest.java:

try (TestTable table = newTrinoTable("test_row_level_delete", "AS SELECT * FROM region")) {

Relevant issues and PRs

Inspired from the discussions done during #27677

Release notes

() This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## JDBC
* Drop the table in CTAS scenario in case of an exception during data insertion phase. ({issue}`issuenumber`)

@findinpath findinpath force-pushed the findinpath/retry-on-drop-table branch 2 times, most recently from 022746f to d099899 Compare December 19, 2025 15:57
@findinpath findinpath changed the title Retry dropping a table Add rollback action for CTAS destination table Dec 19, 2025
@findinpath findinpath force-pushed the findinpath/retry-on-drop-table branch from d099899 to fb25fcf Compare December 23, 2025 07:24
@github-actions github-actions bot added druid Druid connector exasol Exasol connector ignite Ignite connector sqlserver SQLServer connector labels Dec 23, 2025
@findinpath findinpath force-pushed the findinpath/retry-on-drop-table branch from fb25fcf to b3142de Compare December 23, 2025 08:33
@findinpath findinpath force-pushed the findinpath/retry-on-drop-table branch from b3142de to 540828e Compare December 23, 2025 08:46
@github-actions github-actions bot added mysql MySQL connector postgresql PostgreSQL connector labels Dec 23, 2025
@findinpath findinpath force-pushed the findinpath/retry-on-drop-table branch from 540828e to a1e5a0c Compare December 23, 2025 11:06
Copy link
Contributor

@chenjian2664 chenjian2664 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comments

@findinpath findinpath force-pushed the findinpath/retry-on-drop-table branch from a1e5a0c to 88664e3 Compare December 23, 2025 15:13
@findinpath findinpath force-pushed the findinpath/retry-on-drop-table branch from 88664e3 to 5ae171b Compare December 24, 2025 07:07
@findinpath findinpath force-pushed the findinpath/retry-on-drop-table branch from 5ae171b to 007ac15 Compare December 24, 2025 07:33
While CREATE TABLE is an atomic operation,
CREATE TABLE AS SELECT statement happens in two phases
for JDBC connectors from Trino:

- creating the table
- inserting content into the table

If the insertion of the data fails, the CTAS may leave
residual empty tables behind.

The JDBC connectors, in dealing
with CREATE TABLE AS SELECT scenarios,
remove the destination table in case of dealing with exceptions
while inserting the content.
@ebyhr ebyhr force-pushed the findinpath/retry-on-drop-table branch from 007ac15 to 112deed Compare December 24, 2025 07:58
@ebyhr ebyhr merged commit ac78011 into trinodb:master Dec 24, 2025
66 of 67 checks passed
@github-actions github-actions bot added this to the 480 milestone Dec 24, 2025
@chenjian2664
Copy link
Contributor

chenjian2664 commented Dec 27, 2025

Adding the release note into ClickHouse, DuckDB, Ignite, MariaDB, MySQL, Oracle, PostgreSQL, Redshift, SingleStore, Snowflake, SQL Server, Vertica. let me know if you think there are connectors I should add the RN as well

}
}

protected void rollbackCreateDestinationTable(ConnectorSession session, RemoteTableName remoteTableName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rollbackTableCreation / rollbackCreateTable

}
JdbcOutputTableHandle handle = jdbcClient.beginCreateTable(session, tableMetadata);
JdbcOutputTableHandle handle = jdbcClient.beginCreateTable(session, tableMetadata, rollbackActions::add);
rollbackActions.add(() -> jdbcClient.rollbackCreateTable(session, handle));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had rollback here. do we rollback twice now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rollbackCreateTable is mainly for the clean ups of the temporary tables

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the responsibilities are now blurred, aren't they?
maybe we can clarify them with better naming or something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the responsibilities are now blurred, aren't they?

now, yes. But I think the name is ok since we are inside the "createTable" method, that's is fair to have a rollback method like "rollbackCreateTable", it just our implementation only do cleanups of temporary table. Ideally if we could do the cleanup of target table inside this method as well

@findinpath #27848 the method is going to split such cleanups into two methods(we already do), but finally, I think that's more elegant that we should only has one rollback methods for "createTable", caller don't care or needs to know the cleanups table is "temporary" or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While rolling back the creation of the destination table is characteristic only for CTAS statements, rolling back the creation of a temporary table can occur on any DML statement.
I see a good rationale on having two separate methods for doing rollback for table creation.

throw new TrinoException(NOT_SUPPORTED, "This connector does not support replacing tables");
}
igniteClient.beginCreateTable(session, tableMetadata);
igniteClient.beginCreateTable(session, tableMetadata, _ -> {});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document why it's ok to ignore a rollback action
(it should be actually safer not to ignore -- this is CREATE TABLE flow. if nothing follows CREATE TABLE, nothing fails later and no rollback will take place)

void createTable(ConnectorSession session, ConnectorTableMetadata tableMetadata);

JdbcOutputTableHandle beginCreateTable(ConnectorSession session, ConnectorTableMetadata tableMetadata);
JdbcOutputTableHandle beginCreateTable(ConnectorSession session, ConnectorTableMetadata tableMetadata, Consumer<Runnable> rollbackActionConsumer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rollbackActionCollector would be a better name

more importantly, this is a rather odd abstraction and doesn't seem necessary.
so far it was JdbcMetadata who was respomnsible for instantiating and remembering rollback actions. let's not change that, if we don't have to

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rollbackActionCollector would be a better name

+1

so far it was JdbcMetadata who was respomnsible for instantiating and remembering rollback actions

We are entering a situation where it is unclear when it is safe to roll back the target table during CTAS. For example, in BaseJdbcClient#beginCreateTable, when FTE is enabled, both the target table and a temporary table are created in the method, during rollback in DefaultJdbcMetadata, we cannot reliably determine which tables should be cleaned up

let's not change that, if we don't have to

For now, this works as a solution, but I believe we could evolve the API to shift the responsibility back to DefaultJdbcMetadata. For example, we could change the method's return value so that DefaultJdbcMetadata knows exactly which tables need to be cleaned up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rollbackActionCollector would be a better name

+1

Included in

however, the name may still be imperfect -- #27702 (comment)

in BaseJdbcClient#beginCreateTable, when FTE is enabled, both the target table and a temporary table are created in the method

I made the same mistake. It's not FTE-related despite the code making it look so.

during rollback in DefaultJdbcMetadata, we cannot reliably determine which tables should be cleaned up

It's unclear why destination table is created there. I think it might be possible to defer that statement until later.

let's not change that, if we don't have to

For now, this works as a solution, but I believe we could evolve the API to shift the responsibility back to DefaultJdbcMetadata. For example, we could change the method's return value so that DefaultJdbcMetadata knows exactly which tables need to be cleaned up

That was also my intention.
I stopped short from execution on it in

when i saw the beginMerge flow. We create high number of tables there, and someone needs to clean them up if we fail mid-way.

@findinpath
Copy link
Contributor Author

Following up the open discussions in #27848

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed druid Druid connector exasol Exasol connector ignite Ignite connector mysql MySQL connector postgresql PostgreSQL connector sqlserver SQLServer connector

Development

Successfully merging this pull request may close these issues.

4 participants