Commit 257ac23

YuKuai-huawei authored and liu-song-6 committed
md/raid1: fix choose next idle in read_balance()
Commit 12cee5a ("md/raid1: prevent merging too large request") added the
"choose next idle" case to read_balance():

read_balance:
 for_each_rdev
  if(next_seq_sect == this_sector || dist == 0)
  -> sequential reads
   best_disk = disk;
   if (...)
    choose_next_idle = 1
    continue;

 for_each_rdev
 -> iterate next rdev
  if (pending == 0)
   best_disk = disk;
   -> choose the next idle disk
   break;

  if (choose_next_idle)
   -> keep using this rdev if there are no other idle disks
   continue

However, commit 2e52d44 ("md/raid1: add failfast handling for reads.")
removed this code:

-		/* If device is idle, use it */
-		if (pending == 0) {
-			best_disk = disk;
-			break;
-		}

As a result, "choose next idle" can never take effect. Fix this problem as
follows:

1) don't set best_disk in this case; read_balance() will choose the best
   disk after iterating over all the disks;
2) increment 'pending' for the sequential disk so that another idle disk,
   if one exists, will be chosen instead;
3) add a new local variable 'sequential_disk' to record the disk; if there
   is no other idle disk, 'sequential_disk' will be chosen.

Fixes: 2e52d44 ("md/raid1: add failfast handling for reads.")
Co-developed-by: Paul Luse <[email protected]>
Signed-off-by: Paul Luse <[email protected]>
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
1 parent 2c27d09 commit 257ac23


1 file changed: 22 additions, 10 deletions

drivers/md/raid1.c

@@ -598,13 +598,12 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 	const sector_t this_sector = r1_bio->sector;
 	int sectors;
 	int best_good_sectors;
-	int best_disk, best_dist_disk, best_pending_disk;
+	int best_disk, best_dist_disk, best_pending_disk, sequential_disk;
 	int disk;
 	sector_t best_dist;
 	unsigned int min_pending;
 	struct md_rdev *rdev;
 	int choose_first;
-	int choose_next_idle;
 
 	/*
 	 * Check if we can balance. We can balance on the whole
@@ -615,11 +614,11 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 	sectors = r1_bio->sectors;
 	best_disk = -1;
 	best_dist_disk = -1;
+	sequential_disk = -1;
 	best_dist = MaxSector;
 	best_pending_disk = -1;
 	min_pending = UINT_MAX;
 	best_good_sectors = 0;
-	choose_next_idle = 0;
 	clear_bit(R1BIO_FailFast, &r1_bio->state);
 
 	if ((conf->mddev->recovery_cp < this_sector + sectors) ||
@@ -712,7 +711,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 			int opt_iosize = bdev_io_opt(rdev->bdev) >> 9;
 			struct raid1_info *mirror = &conf->mirrors[disk];
 
-			best_disk = disk;
 			/*
 			 * If buffered sequential IO size exceeds optimal
 			 * iosize, check if there is idle disk. If yes, choose
@@ -731,15 +729,22 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 			    mirror->next_seq_sect > opt_iosize &&
 			    mirror->next_seq_sect - opt_iosize >=
 			    mirror->seq_start) {
-				choose_next_idle = 1;
-				continue;
+				/*
+				 * Add 'pending' to avoid choosing this disk if
+				 * there is other idle disk.
+				 */
+				pending++;
+				/*
+				 * If there is no other idle disk, this disk
+				 * will be chosen.
+				 */
+				sequential_disk = disk;
+			} else {
+				best_disk = disk;
+				break;
 			}
-			break;
 		}
 
-		if (choose_next_idle)
-			continue;
-
 		if (min_pending > pending) {
 			min_pending = pending;
 			best_pending_disk = disk;
@@ -751,6 +756,13 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 		}
 	}
 
+	/*
+	 * sequential IO size exceeds optimal iosize, however, there is no other
+	 * idle disk, so choose the sequential disk.
+	 */
+	if (best_disk == -1 && min_pending != 0)
+		best_disk = sequential_disk;
+
 	/*
 	 * If all disks are rotational, choose the closest disk. If any disk is
 	 * non-rotational, choose the disk with less pending request even the
