Commit d93ae38

gfs2: Check for log write errors before telling dlm to unlock
Before this patch, function do_xmote just assumed all the writes submitted to the journal were finished and successful, and it called the go_unlock function to release the dlm lock. But if they're not, and a revoke failed to make its way to the journal, a journal replay on another node will cause corruption if we let the go_inval function continue and tell dlm to release the glock to another node.

This patch adds a couple of checks for errors in do_xmote after the calls to go_sync and go_inval. If an error is found, we cannot withdraw yet, because the withdraw itself uses glocks to make the file system read-only. Instead, we flag the error. Later, asserts should cause another node to replay the journal before continuing, thus protecting rgrp and dinode glocks and maintaining the integrity of the metadata.

Note that we only need to do this for journaled glocks. System glocks should be able to progress even under withdrawn conditions.

Signed-off-by: Bob Peterson <[email protected]>
Reviewed-by: Andreas Gruenbacher <[email protected]>
1 parent f05b86d commit d93ae38

1 file changed: +28 -3 lines changed
fs/gfs2/glock.c

Lines changed: 28 additions & 3 deletions
@@ -622,6 +622,32 @@ __acquires(&gl->gl_lockref.lock)
 	}
 
 	gfs2_glock_hold(gl);
+	/*
+	 * Check for an error encountered since we called go_sync and go_inval.
+	 * If so, we can't withdraw from the glock code because the withdraw
+	 * code itself uses glocks (see function signal_our_withdraw) to
+	 * change the mount to read-only. Most importantly, we must not call
+	 * dlm to unlock the glock until the journal is in a known good state
+	 * (after journal replay) otherwise other nodes may use the object
+	 * (rgrp or dinode) and then later, journal replay will corrupt the
+	 * file system. The best we can do here is wait for the logd daemon
+	 * to see sd_log_error and withdraw, and in the meantime, requeue the
+	 * work for later.
+	 *
+	 * However, if we're just unlocking the lock (say, for unmount, when
+	 * gfs2_gl_hash_clear calls clear_glock) and recovery is complete
+	 * then it's okay to tell dlm to unlock it.
+	 */
+	if (unlikely(sdp->sd_log_error && !gfs2_withdrawn(sdp)))
+		gfs2_withdraw_delayed(sdp);
+	if (glock_blocked_by_withdraw(gl)) {
+		if (target != LM_ST_UNLOCKED ||
+		    test_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags)) {
+			gfs2_glock_queue_work(gl, GL_GLOCK_DFT_HOLD);
+			goto out;
+		}
+	}
+
 	if (sdp->sd_lockstruct.ls_ops->lm_lock) {
 		/* lock_dlm */
 		ret = sdp->sd_lockstruct.ls_ops->lm_lock(gl, target, lck_flags);
@@ -630,16 +656,15 @@ __acquires(&gl->gl_lockref.lock)
 		    test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags)) {
 			finish_xmote(gl, target);
 			gfs2_glock_queue_work(gl, 0);
-		}
-		else if (ret) {
+		} else if (ret) {
 			fs_err(sdp, "lm_lock ret %d\n", ret);
 			GLOCK_BUG_ON(gl, !gfs2_withdrawn(sdp));
 		}
 	} else { /* lock_nolock */
 		finish_xmote(gl, target);
 		gfs2_glock_queue_work(gl, 0);
 	}
-
+out:
 	spin_lock(&gl->gl_lockref.lock);
 }
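
The gate this patch adds ahead of the lm_lock call can be modelled in userspace as a small decision function. The following is only a sketch, not kernel code: `struct xmote_ctx`, `xmote_gate`, and the two enums are hypothetical names invented for illustration. It assumes that flagging a delayed withdraw makes the file system count as withdrawn for the blocked check, and that only journaled glocks are ever blocked by a withdraw (as the commit message states for system glocks).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the lock states; only "unlocked" matters here. */
enum lock_state { ST_UNLOCKED, ST_SHARED, ST_EXCLUSIVE };

/* What the modelled do_xmote does next. */
enum xmote_action {
	PROCEED_TO_LOCK_MODULE, /* fall through to lm_lock / finish_xmote */
	REQUEUE_WORK            /* queue the glock work again and skip the lock module */
};

/* Simplified snapshot of the state the new checks consult (hypothetical). */
struct xmote_ctx {
	bool log_error;         /* models a non-zero sd_log_error */
	bool withdrawn;         /* models gfs2_withdrawn(sdp) */
	bool journaled_glock;   /* false for system glocks (assumed never blocked) */
	bool withdraw_recovery; /* models the SDF_WITHDRAW_RECOVERY flag */
};

enum xmote_action xmote_gate(struct xmote_ctx *c, enum lock_state target)
{
	bool blocked;

	/* A log error on a not-yet-withdrawn fs only flags a delayed withdraw. */
	if (c->log_error && !c->withdrawn)
		c->withdrawn = true;

	/* Assumption: only journaled glocks are blocked once withdrawn. */
	blocked = c->withdrawn && c->journaled_glock;

	/*
	 * A blocked glock may still go to the lock module if it is merely
	 * being unlocked and no withdraw recovery is in progress.
	 */
	if (blocked && (target != ST_UNLOCKED || c->withdraw_recovery))
		return REQUEUE_WORK;

	return PROCEED_TO_LOCK_MODULE;
}

/* Usage: walk through the cases described in the patch comment. */
void xmote_gate_examples(void)
{
	struct xmote_ctx err = { .log_error = true, .journaled_glock = true };
	struct xmote_ctx sys = { .log_error = true, .journaled_glock = false };

	assert(xmote_gate(&err, ST_SHARED) == REQUEUE_WORK);
	assert(xmote_gate(&err, ST_UNLOCKED) == PROCEED_TO_LOCK_MODULE);
	assert(xmote_gate(&sys, ST_EXCLUSIVE) == PROCEED_TO_LOCK_MODULE);
}
```

Keeping the decision in one function with no side effects beyond the withdraw flag makes each case from the comment block easy to check in isolation. The real code additionally holds a glock reference and retakes the spin lock at the `out:` label, which this model leaves out.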
