Skip to content

Commit 55a48ad

Browse files
YuKuai-huaweiliu-song-6
authored andcommitted
md: Don't ignore read-only array in md_check_recovery()
Usually if the array is not read-write, md_check_recovery() won't register new sync_thread in the first place. And if the array is read-write and sync_thread is registered, md_set_readonly() will unregister sync_thread before setting the array read-only. md/raid follow this behavior hence there is no problem. After commit f52f5c7 ("md: fix stopping sync thread"), following hang can be triggered by test shell/integrity-caching.sh: 1) array is read-only. dm-raid update super block: rs_update_sbs ro = mddev->ro mddev->ro = 0 -> set array read-write md_update_sb 2) register new sync thread concurrently. 3) dm-raid set array back to read-only: rs_update_sbs mddev->ro = ro 4) stop the array: raid_dtr md_stop stop_sync_thread set_bit(MD_RECOVERY_INTR, &mddev->recovery); md_wakeup_thread_directly(mddev->sync_thread); wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) 5) sync thread done: md_do_sync set_bit(MD_RECOVERY_DONE, &mddev->recovery); md_wakeup_thread(mddev->thread); 6) daemon thread can't unregister sync thread: md_check_recovery if (!md_is_rdwr(mddev) && !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) return; -> -> MD_RECOVERY_RUNNING can't be cleared, hence step 4 hang; The root cause is that dm-raid manipulate 'mddev->ro' by itself, however, dm-raid really should stop sync thread before setting the array read-only. Unfortunately, I need to read more code before I can refacter the handler of 'mddev->ro' in dm-raid, hence let's fix the problem the easy way for now to prevent dm-raid regression. Reported-by: Mikulas Patocka <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Fixes: ecbfb9f ("dm raid: add raid level takeover support") Fixes: f52f5c7 ("md: fix stopping sync thread") Cc: [email protected] # v6.7+ Signed-off-by: Yu Kuai <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
1 parent 1baae05 commit 55a48ad

File tree

1 file changed

+18
-13
lines changed

1 file changed

+18
-13
lines changed

drivers/md/md.c

Lines changed: 18 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -9449,6 +9449,20 @@ static void md_start_sync(struct work_struct *ws)
94499449
sysfs_notify_dirent_safe(mddev->sysfs_action);
94509450
}
94519451

9452+
static void unregister_sync_thread(struct mddev *mddev)
9453+
{
9454+
if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
9455+
/* resync/recovery still happening */
9456+
clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
9457+
return;
9458+
}
9459+
9460+
if (WARN_ON_ONCE(!mddev->sync_thread))
9461+
return;
9462+
9463+
md_reap_sync_thread(mddev);
9464+
}
9465+
94529466
/*
94539467
* This routine is regularly called by all per-raid-array threads to
94549468
* deal with generic issues like resync and super-block update.
@@ -9486,7 +9500,8 @@ void md_check_recovery(struct mddev *mddev)
94869500
}
94879501

94889502
if (!md_is_rdwr(mddev) &&
9489-
!test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
9503+
!test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) &&
9504+
!test_bit(MD_RECOVERY_DONE, &mddev->recovery))
94909505
return;
94919506
if ( ! (
94929507
(mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
@@ -9508,8 +9523,7 @@ void md_check_recovery(struct mddev *mddev)
95089523
struct md_rdev *rdev;
95099524

95109525
if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
9511-
/* sync_work already queued. */
9512-
clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
9526+
unregister_sync_thread(mddev);
95139527
goto unlock;
95149528
}
95159529

@@ -9572,16 +9586,7 @@ void md_check_recovery(struct mddev *mddev)
95729586
* still set.
95739587
*/
95749588
if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
9575-
if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
9576-
/* resync/recovery still happening */
9577-
clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
9578-
goto unlock;
9579-
}
9580-
9581-
if (WARN_ON_ONCE(!mddev->sync_thread))
9582-
goto unlock;
9583-
9584-
md_reap_sync_thread(mddev);
9589+
unregister_sync_thread(mddev);
95859590
goto unlock;
95869591
}
95879592

0 commit comments

Comments
 (0)