Commit b788847

[#28889] xClusterDDLRepl: Fix IncrementalSafeTimeBumpWithDdlQueueStepdowns in ASAN

Summary:
After introducing advisory locks to ddl_queue_handler, only one handler can be active at a time. This test creates a handler, pauses (which deletes the handler), then resumes (which recreates the handler). When the handler is deleted, the pg session takes a while to shut down, so the advisory lock is held for an extended period. During this period the next handler cannot grab the advisory lock, so it cannot make progress. In more timing-sensitive build types (e.g. ASAN), this causes checks to run before the new handler is ready.

The fix is to try to release all advisory locks when shutting down the ddl queue handler.

Jira: DB-18611

Test Plan:
```
ybd asan --cxx-test integration-tests_xcluster_ddl_replication-test \
  --gtest_filter XClusterDDLReplicationTest.IncrementalSafeTimeBumpWithDdlQueueStepdowns -n 40
```

Reviewers: hsunder, xCluster

Reviewed By: hsunder

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D47362
1 parent 7e6a0df commit b788847
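The race described in the summary can be sketched abstractly. This is a toy Python model, not YugabyteDB code: `Handler`, `start`, `shutdown`, and `eager_unlock` are hypothetical names, with a `threading.Lock` standing in for the Postgres advisory lock and a delayed release standing in for the slow pg session teardown. It shows why eagerly releasing the lock on shutdown (as `pg_advisory_unlock_all()` does in the fix) lets the successor handler start immediately instead of timing out.

```python
import threading

class Handler:
    """Toy model of a DDL-queue handler guarded by an exclusive advisory lock."""

    def __init__(self, lock):
        self.lock = lock

    def start(self, timeout):
        # A handler can only make progress once it owns the lock.
        return self.lock.acquire(timeout=timeout)

    def shutdown(self, eager_unlock):
        if eager_unlock:
            # Mirrors the fix: release the lock right away instead of
            # waiting for the (slow) session teardown.
            self.lock.release()
        else:
            # Mirrors the old behavior: the lock is only released once
            # teardown finishes, simulated here by a 0.5 s delay.
            threading.Timer(0.5, self.lock.release).start()

lock = threading.Lock()
old = Handler(lock)
old.start(timeout=1)
old.shutdown(eager_unlock=True)

new = Handler(lock)
ok = new.start(timeout=0.1)  # succeeds immediately thanks to the eager unlock
print(ok)  # prints True
```

With `eager_unlock=False`, the same 0.1 s acquire fails, which is the ASAN-visible symptom: the new handler's checks run before it ever gets the lock.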

File tree

3 files changed: +17 −0 lines changed


src/yb/tserver/xcluster_ddl_queue_handler.cc

Lines changed: 11 additions & 0 deletions

```diff
@@ -310,6 +310,17 @@ XClusterDDLQueueHandler::XClusterDDLQueueHandler(
 
 XClusterDDLQueueHandler::~XClusterDDLQueueHandler() {}
 
+void XClusterDDLQueueHandler::Shutdown() {
+  if (pg_conn_ && FLAGS_ysql_yb_enable_advisory_locks &&
+      FLAGS_xcluster_ddl_queue_advisory_lock_key != 0) {
+    // Optimistically unlock the advisory lock so we don't have to wait for the connection to close.
+    auto s = pg_conn_->Execute(Format("SELECT pg_advisory_unlock_all()"));
+    // Alright if we fail here, log an error and wait for the connection to close normally.
+    WARN_NOT_OK(s, "Encountered error unlocking advisory lock for xCluster DDL queue handler");
+    pg_conn_.reset();
+  }
+}
+
 Status XClusterDDLQueueHandler::ExecuteCommittedDDLs() {
   SCHECK(safe_time_batch_, InternalError, "Safe time batch is not initialized");
   if (!safe_time_batch_->IsComplete()) {
```
src/yb/tserver/xcluster_ddl_queue_handler.h

Lines changed: 2 additions & 0 deletions

```diff
@@ -79,6 +79,8 @@ class XClusterDDLQueueHandler {
       ConnectToPostgresFunc connect_to_pg_func, UpdateSafeTimeFunc update_safe_time_func);
   virtual ~XClusterDDLQueueHandler();
 
+  void Shutdown();
+
   // This function is called before the poller calls GetChanges. This will detect if we are in the
   // middle of a executing a DDL batch and complete it.
   Status ProcessPendingBatchIfExists();
```

src/yb/tserver/xcluster_poller.cc

Lines changed: 4 additions & 0 deletions

```diff
@@ -215,6 +215,10 @@ void XClusterPoller::CompleteShutdown() {
   if (output_client_) {
     output_client_->CompleteShutdown();
   }
+  if (ddl_queue_handler_) {
+    // Only clean up the ddl queue handler after all other tasks have completed.
+    ddl_queue_handler_->Shutdown();
+  }
 
   XClusterAsyncExecutor::CompleteShutdown();
```