Commit 18b0185

librbd/ManagedLock: kickstart ExclusiveLock state machine
... that is stalled waiting for lock. Do this when trying to reacquire lock in the ImageWatcher's rewatch mechanism. This would enable the ExclusiveLock state machine to propagate the blocklist error to the caller trying to perform an image operation requiring an exclusive lock.

A previous attempt, e66db76, to fix the hang due to exclusive lock acquisition (stuck waiting for lock) racing with client blocklisting did not always work. e66db76 kickstarted the ExclusiveLock state machine when the ImageWatcher tried to schedule an exclusive lock request and the blocklisting was detected. However, there is a short window between a watch getting deregistered and client blocklisting getting detected as part of rewatching. If that window was hit when trying to schedule a lock request, the ExclusiveLock state machine wasn't kickstarted, the blocklist error wasn't propagated, and the hang resurfaced.

A more robust approach is taken to resume the ExclusiveLock state machine stuck waiting for lock during client blocklisting. Whenever a client's ImageWatcher loses connection to the cluster, as happens during blocklisting, the ImageWatcher initiates a mechanism to rewatch the image and tries to reacquire the lock. Piggyback on this rewatch mechanism that gets triggered during client blocklisting: when trying to reacquire the lock, kickstart the ExclusiveLock state machine stalled waiting for lock (STATE_WAITING_FOR_LOCK).

Fixes: https://tracker.ceph.com/issues/63009
Signed-off-by: Ramana Raja <[email protected]>
1 parent 9fedc1e commit 18b0185

2 files changed, 3 additions and 7 deletions

src/librbd/ImageWatcher.cc

Lines changed: 0 additions & 4 deletions
@@ -594,10 +594,6 @@ void ImageWatcher<I>::schedule_request_lock(bool use_timer, int timer_delay) {
     } else {
       m_task_finisher->queue(TASK_CODE_REQUEST_LOCK, ctx);
     }
-  } else if (is_blocklisted()) {
-    lderr(m_image_ctx.cct) << this << " blocklisted waiting for exclusive lock"
-                           << dendl;
-    m_image_ctx.exclusive_lock->handle_peer_notification(0);
   }
 }

src/librbd/ManagedLock.cc

Lines changed: 3 additions & 3 deletions
@@ -207,7 +207,8 @@ void ManagedLock<I>::reacquire_lock(Context *on_reacquired) {
   {
     std::lock_guard locker{m_lock};

-    if (m_state == STATE_WAITING_FOR_REGISTER) {
+    if (m_state == STATE_WAITING_FOR_REGISTER ||
+        m_state == STATE_WAITING_FOR_LOCK) {
       // restart the acquire lock process now that watch is valid
       ldout(m_cct, 10) << "woke up waiting (re)acquire" << dendl;
       Action active_action = get_active_action();
@@ -217,8 +218,7 @@ void ManagedLock<I>::reacquire_lock(Context *on_reacquired) {
     } else if (!is_state_shutdown() &&
                (m_state == STATE_LOCKED ||
                 m_state == STATE_ACQUIRING ||
-                m_state == STATE_POST_ACQUIRING ||
-                m_state == STATE_WAITING_FOR_LOCK)) {
+                m_state == STATE_POST_ACQUIRING)) {
       // interlock the lock operation with other state ops
       ldout(m_cct, 10) << dendl;
       execute_action(ACTION_REACQUIRE_LOCK, on_reacquired);
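
The effect of the ManagedLock.cc hunks can be summarized as moving STATE_WAITING_FOR_LOCK from the "queue a reacquire" branch to the "restart the stalled acquire" branch. Below is a minimal standalone sketch of that branch selection only; it is not the librbd code. The `State` and `Outcome` enums and the `dispatch_before`/`dispatch_after` helpers are hypothetical names introduced for illustration, the state list is a subset, and locking, logging, and the assertion on the active action are all omitted.

```cpp
// Simplified stand-ins for a subset of the ManagedLock states named in the diff.
enum class State {
  WaitingForRegister,
  WaitingForLock,
  Locked,
  Acquiring,
  PostAcquiring,
  Unlocked,  // placeholder for any state that neither branch matches
};

// Hypothetical labels for which branch of reacquire_lock() is taken.
enum class Outcome {
  RestartAcquire,  // first branch: execute_next_action()
  QueueReacquire,  // second branch: execute_action(ACTION_REACQUIRE_LOCK, ...)
  Nothing,         // neither branch matched
};

// Branch selection as it was before this commit.
constexpr Outcome dispatch_before(State s, bool shutdown = false) {
  if (s == State::WaitingForRegister) {
    return Outcome::RestartAcquire;
  }
  if (!shutdown &&
      (s == State::Locked || s == State::Acquiring ||
       s == State::PostAcquiring || s == State::WaitingForLock)) {
    return Outcome::QueueReacquire;
  }
  return Outcome::Nothing;
}

// Branch selection after this commit: a state machine stalled in
// WaitingForLock is restarted instead of having a reacquire queued.
constexpr Outcome dispatch_after(State s, bool shutdown = false) {
  if (s == State::WaitingForRegister || s == State::WaitingForLock) {
    return Outcome::RestartAcquire;
  }
  if (!shutdown &&
      (s == State::Locked || s == State::Acquiring ||
       s == State::PostAcquiring)) {
    return Outcome::QueueReacquire;
  }
  return Outcome::Nothing;
}

// The only input whose outcome changes is WaitingForLock.
static_assert(dispatch_before(State::WaitingForLock) == Outcome::QueueReacquire,
              "old behavior: reacquire queued behind the stalled acquire");
static_assert(dispatch_after(State::WaitingForLock) == Outcome::RestartAcquire,
              "new behavior: stalled acquire is restarted");
```

In this model every other state maps to the same outcome before and after, which matches the commit's intent of changing only how a machine stuck in STATE_WAITING_FOR_LOCK is handled on rewatch.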
