Commit f8d08e0

BenHuddleston authored and trondn committed
MB-34445: RespondAmbiguous task should not own a vBucket.
This causes deadlock due to recursive locking of tMutex in ExecutorPool if the task is the last thing that owns a vBucket and is attempting to schedule deferred deletion.

Fix this by holding a weak pointer instead. If we promote the pointer then we are running normally and won't have previously acquired tMutex. If we are cancelling the task at destruction of the engine, we will not attempt to delete the vBucket.

15:05:37 #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
15:05:38 #1 0x00007f551159fdbd in __GI___pthread_mutex_lock (mutex=0x7f5510435b08) at ../nptl/pthread_mutex_lock.c:80
15:05:38 #2 0x00007f550af2ef9d in __gthread_mutex_lock (__mutex=0x7f5510435b08) at /usr/local/include/c++/7.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748
15:05:38 #3 std::mutex::lock (this=0x7f5510435b08) at /usr/local/include/c++/7.3.0/bits/std_mutex.h:103
15:05:38 #4 std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/local/include/c++/7.3.0/bits/std_mutex.h:162
15:05:38 #5 ExecutorPool::_schedule (this=this@entry=0x7f5510435a00, task=...) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/executorpool.cc:420
15:05:38 #6 0x00007f550af2f13d in ExecutorPool::schedule (this=0x7f5510435a00, task=...) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/executorpool.cc:440
15:05:40 #7 0x00007f550af2ad1d in EphemeralVBucket::scheduleDeferredDeletion (this=<optimized out>, engine=...) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/ephemeral_vb.cc:841
15:05:40 #8 0x00007f550af64dc1 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f550a3a9060) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:154
15:05:40 #9 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:684
15:05:40 #10 std::__shared_ptr<VBucket, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:1123
15:05:40 #11 std::shared_ptr<VBucket>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr.h:93
15:05:40 #12 RespondAmbiguousNotification::~RespondAmbiguousNotification (this=<optimized out>, __in_chrg=<optimized out>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/kv_bucket.cc:226
15:05:40 #13 0x00007f550af30849 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f550a41c340) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:154
15:05:40 #14 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:684
15:05:40 #15 std::__shared_ptr<GlobalTask, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7ffe6ec46730, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:1123
15:05:40 #16 std::shared_ptr<GlobalTask>::~shared_ptr (this=0x7ffe6ec46730, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr.h:93
15:05:40 #17 ExecutorPool::_stopTaskGroup (this=<optimized out>, taskGID=140003322306560, taskType=<optimized out>, force=<optimized out>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/executorpool.cc:612
15:05:40 #18 0x00007f550af56cb8 in KVBucket::deinitialize (this=0x7f5510495000) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/kv_bucket.cc:466
15:05:40 #19 0x00007f550af09ee1 in EventuallyPersistentEngine::~EventuallyPersistentEngine (this=0x7f55104b2000, __in_chrg=<optimized out>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/ep_engine.cc:6073
15:05:40 #20 0x00007f550af0a0e7 in EventuallyPersistentEngine::~EventuallyPersistentEngine (this=0x7f55104b2000, __in_chrg=<optimized out>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/ep_engine.cc:6079
15:05:40 #21 EventuallyPersistentEngine::destroy (this=0x7f55104b2000, force=<optimized out>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/ep_engine.cc:155
15:05:40 #22 0x000000000040ff0b in MockTestHarness::destroy_bucket (force=false, handle=0x7f55104f2aa0, this=0x64c740 <harness>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/programs/engine_testapp/engine_testapp.cc:1178
15:05:40 #23 execute_test (default_cfg=<optimized out>, engine=<optimized out>, test=...) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/programs/engine_testapp/engine_testapp.cc:1333
15:05:40 #24 main (argc=<optimized out>, argv=<optimized out>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/programs/engine_testapp/engine_testapp.cc:1581

Change-Id: I70298a8337967c648280b0d86a96c08bf3a4008a
Reviewed-on: http://review.couchbase.org/110146
Reviewed-by: Dave Rigby <[email protected]>
Tested-by: Build Bot <[email protected]>
1 parent 3303390 commit f8d08e0
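The failure mode described in the commit message can be reproduced in miniature. The sketch below is not kv_engine code: `MiniPool`, `Resource`, and `makeResource` are hypothetical stand-ins for ExecutorPool, VBucket, and a deleter that schedules deferred deletion. To stay runnable, it counts the re-entrant `schedule()` call that would block forever on a real non-recursive mutex instead of actually deadlocking, and it assumes single-threaded use.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical miniature of a task pool guarded by one non-recursive mutex.
// Single-threaded sketch: stopping_ is only touched by the calling thread.
class MiniPool {
public:
    void schedule(std::function<void()> task) {
        if (stopping_) {
            // We are inside stopAll(), which already holds mutex_. Locking
            // again here is where real code would hang, so only count it.
            ++reentrantCalls_;
            return;
        }
        std::lock_guard<std::mutex> guard(mutex_);
        tasks_.push_back(std::move(task));
    }

    // Destroys every queued task while holding mutex_.
    void stopAll() {
        std::lock_guard<std::mutex> guard(mutex_);
        stopping_ = true;
        tasks_.clear(); // task destructors run under the lock
        stopping_ = false;
    }

    int reentrantCalls() const {
        return reentrantCalls_;
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> tasks_;
    bool stopping_ = false;
    int reentrantCalls_ = 0;
};

struct Resource {
    int id;
};

// The deleter schedules follow-up work on the pool, loosely mirroring a
// vBucket that schedules its own deferred deletion when the last owner goes.
std::shared_ptr<Resource> makeResource(MiniPool& pool, int id) {
    return std::shared_ptr<Resource>(new Resource{id}, [&pool](Resource* r) {
        pool.schedule([] {});
        delete r;
    });
}

// The queued task is the last owner, so the deleter fires inside stopAll().
int reentrantCallsWithStrongOwner() {
    MiniPool pool;
    auto res = makeResource(pool, 1);
    pool.schedule([res] { (void)res->id; });
    res.reset();
    pool.stopAll();
    return pool.reentrantCalls();
}

// The queued task only observes the resource, so the deleter fires at
// res.reset(), before stopAll() takes the lock.
int reentrantCallsWithWeakObserver() {
    MiniPool pool;
    auto res = makeResource(pool, 1);
    std::weak_ptr<Resource> weak = res;
    pool.schedule([weak] {
        if (auto r = weak.lock()) {
            (void)r->id;
        }
    });
    res.reset();
    pool.stopAll();
    return pool.reentrantCalls();
}
```

With a strong owner inside the queued task, the deleter runs while the pool's lock is held and tries to schedule again; with a weak observer, destroying the task never runs the deleter, which is the property the fix relies on.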

2 files changed: +77 -3 lines changed

engines/ep/src/kv_bucket.cc

Lines changed: 10 additions & 3 deletions
@@ -229,10 +229,10 @@ class RespondAmbiguousNotification : public GlobalTask {
                                  VBucketPtr& vb,
                                  std::vector<const void*> cookies)
         : GlobalTask(&e, TaskId::RespondAmbiguousNotification, 0, false),
-          vbucket(vb),
+          weakVb(vb),
           cookies(cookies),
           description("Notify clients of Sync Write Ambiguous " +
-                      vbucket->getId().to_string()) {
+                      vb->getId().to_string()) {
         for (const auto* cookie : cookies) {
             if (!cookie) {
                 throw std::invalid_argument(
@@ -252,19 +252,26 @@ class RespondAmbiguousNotification : public GlobalTask {
     }

     bool run(void) {
+        auto vbucket = weakVb.lock();
+        if (!vbucket) {
+            return false;
+        }
+
         TRACE_EVENT1("ep-engine/task",
                      "RespondAmbiguousNotification",
                      "vb",
                      (vbucket->getId()).get());
+
         for (const auto* cookie : cookies) {
             vbucket->notifyClientOfSyncWriteComplete(
                     cookie, ENGINE_SYNC_WRITE_AMBIGUOUS);
         }
+
         return false;
     }

 private:
-    VBucketPtr vbucket;
+    std::weak_ptr<VBucket> weakVb;
     std::vector<const void*> cookies;
     const std::string description;
 };
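The promote-then-use pattern in `run()` can be shown in isolation. This is a minimal sketch under stated assumptions: `Bucket` and `NotifyTask` are hypothetical stand-ins for VBucket and RespondAmbiguousNotification, and client notification is reduced to appending a string. As in the diff, the description is built from the constructor argument because the weak member cannot be dereferenced directly.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for VBucket: records which clients were notified.
struct Bucket {
    int id;
    std::vector<std::string> notified;
};

class NotifyTask {
public:
    NotifyTask(const std::shared_ptr<Bucket>& bucket,
               std::vector<std::string> clientList)
        : weakBucket(bucket),
          clients(std::move(clientList)),
          // Use the constructor argument here, not the weak member.
          description("Notify clients for bucket " +
                      std::to_string(bucket->id)) {
    }

    // Returns false in every case, meaning "do not reschedule".
    bool run() {
        // Promote to a strong reference only for the duration of the run.
        auto bucket = weakBucket.lock();
        if (!bucket) {
            // The bucket is already gone, so there is nothing to notify.
            return false;
        }
        for (const auto& client : clients) {
            bucket->notified.push_back(client);
        }
        return false;
    }

    const std::string& getDescription() const {
        return description;
    }

private:
    std::weak_ptr<Bucket> weakBucket; // observes, never owns
    std::vector<std::string> clients;
    const std::string description;
};
```

Because the task never holds a strong reference between runs, destroying it cannot be the event that releases the last reference to the bucket.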

engines/ep/tests/module_tests/evp_store_durability_test.cc

Lines changed: 67 additions & 0 deletions
@@ -75,6 +75,17 @@ class DurabilityEphemeralBucketTest : public STParameterizedBucketTest {
     void testPurgeCompletedPrepare(F& func);
 };

+/// Note - not single-threaded
+class DurabilityRespondAmbiguousTest : public KVBucketTest {
+protected:
+    void SetUp() override {
+        // The test should do the SetUp
+    }
+    void TearDown() override {
+        // The test should do the TearDown
+    };
+};
+
 void DurabilityEPBucketTest::testPersistPrepare(DocumentState docState) {
     setVBucketStateAndRunPersistTask(
             vbid,
@@ -840,6 +851,62 @@ TEST_P(DurabilityBucketTest, TakeoverSendsDurabilityAmbiguous) {
     EXPECT_EQ(ENGINE_SYNC_WRITE_AMBIGUOUS, mockCookie->status);
 }

+TEST_F(DurabilityRespondAmbiguousTest, RespondAmbiguousNotificationDeadLock) {
+    // Anecdotally this takes between 0.5 and 1s to run on my dev machine
+    // (MB Pro 2017 - PCIe SSD). The test typically hits the issue on the 1st
+    // run but sometimes takes up to 5. I didn't want to increase the number
+    // of iterations as the test will obviously take far longer to run. If
+    // this test ever causes a timeout - a deadlock issue (probably in the
+    // RespondAmbiguousNotification task) is present.
+    for (int i = 0; i < 100; i++) {
+        KVBucketTest::SetUp();
+
+        EXPECT_EQ(ENGINE_SUCCESS,
+                  store->setVBucketState(
+                          vbid,
+                          vbucket_state_active,
+                          {{"topology",
+                            nlohmann::json::array({{"active", "replica"}})}}));
+
+        auto key = makeStoredDocKey("key");
+        using namespace cb::durability;
+        auto pending = makePendingItem(key, "value");
+
+        // Store it
+        EXPECT_EQ(ENGINE_EWOULDBLOCK, store->set(*pending, cookie));
+
+        // We don't send EWOULDBLOCK to clients
+        auto mockCookie = cookie_to_mock_object(cookie);
+        EXPECT_EQ(ENGINE_SUCCESS, mockCookie->status);
+
+        // Set state to dead - this will schedule the task
+        EXPECT_EQ(ENGINE_SUCCESS,
+                  store->setVBucketState(vbid, vbucket_state_dead));
+
+        // Deleting the vBucket will set the deferred deletion flag that
+        // causes deadlock when the RespondAmbiguousNotification task is
+        // destroyed as part of shutdown but is the last owner of the vBucket
+        // (attempts to schedule destruction and tries to recursively lock a
+        // mutex)
+        {
+            auto ptr = store->getVBucket(vbid);
+            store->deleteVBucket(vbid, nullptr);
+        }
+
+        destroy_mock_event_callbacks();
+        engine->getDcpConnMap().manageConnections();
+
+        // Should deadlock here in ~SynchronousEPEngine
+        engine.reset();
+
+        // The RespondAmbiguousNotification task requires our cookie to still be
+        // valid so delete it only after it has been destroyed
+        destroy_mock_cookie(cookie);
+
+        ExecutorPool::shutdown();
+    }
+}
+
 // Test that if a SyncWrite times out, then a subsequent SyncWrite which
 // _should_ fail does indeed fail.
 // (Regression test for part of MB-34367 - after using notify_IO_complete
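The regression test above signals a deadlock only indirectly, by hanging until an external timeout fires. One way to make such a hang explicit is a small watchdog helper. This is a sketch only: `finishesWithin` is a hypothetical function, not part of this commit, kv_engine, or GoogleTest. It assumes the body does not throw, and it deliberately leaks the detached worker thread if the body never returns.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Runs body on a detached thread and reports whether it completed within
// limit. A body that deadlocks leaves its thread blocked, but the caller
// still gets an answer instead of hanging with it.
bool finishesWithin(std::function<void()> body,
                    std::chrono::milliseconds limit) {
    // The promise is shared so that it outlives this function if the worker
    // is still running when the wait times out.
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();

    std::thread([done, body = std::move(body)] {
        body();
        done->set_value();
    }).detach();

    return finished.wait_for(limit) == std::future_status::ready;
}
```

A test could wrap the suspect shutdown sequence in such a helper and assert on the result, at the cost of running that sequence off the main test thread.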
