Commit e81faa9

Merge branch 'raid1-read_balance' into md-6.9
From: Yu Kuai <[email protected]>
Co-developed-by: Paul Luse <[email protected]>

The original idea was that Paul wanted to optimize raid1 read performance ([1]). However, we think the existing code for read_balance() is quite complex, and we don't want to add more complexity. Hence we decided to refactor read_balance() first, to make the code cleaner and easier to follow up on.

Before this patchset, read_balance() has many local variables and many branches, and it tries to consider all the scenarios in one iteration. The idea of this patchset is to divide them into 4 different steps:

1) If resync is in progress, find the first usable disk, patch 5;
   Otherwise:
2) Loop through all disks, skipping slow disks and disks with bad blocks, and choose the best disk, patch 10. If no disk is found:
3) Look for disks with bad blocks and choose the one with the largest number of sectors, patch 8. If no disk is found:
4) Choose the first found slow disk with no bad blocks, or the slow disk with the largest number of sectors, patch 7.

Note that step 3) and step 4) are fallback code paths, so their performance need not be considered.

After this patchset, we'll continue to optimize read_balance() for step 2), specifically how to choose the best rdev to read from.

[1] https://lore.kernel.org/all/[email protected]/

Yu Kuai (11):
  md: add a new helper rdev_has_badblock()
  md/raid1: factor out helpers to add rdev to conf
  md/raid1: record nonrot rdevs while adding/removing rdevs to conf
  md/raid1: fix choose next idle in read_balance()
  md/raid1-10: add a helper raid1_check_read_range()
  md/raid1-10: factor out a new helper raid1_should_read_first()
  md/raid1: factor out read_first_rdev() from read_balance()
  md/raid1: factor out choose_slow_rdev() from read_balance()
  md/raid1: factor out choose_bb_rdev() from read_balance()
  md/raid1: factor out the code to manage sequential IO
  md/raid1: factor out helpers to choose the best rdev from read_balance()
2 parents dfd2bf4 + 0091c5a commit e81faa9
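
The four-step selection order described in the merge message can be sketched in user space roughly as follows. This is a simplified model, not the kernel code: `struct disk_model`, `pick_read_disk()`, and the "fewest pending requests" metric for the best disk are hypothetical names and assumptions made for illustration, and returning -1 when step 1 finds nothing is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mirror leg; not the kernel's struct md_rdev. */
struct disk_model {
	bool usable;		/* in sync and not faulty */
	bool slow;		/* a leg that reads should normally avoid */
	int good_sectors;	/* readable sectors at the read position, 0..len */
	int pending;		/* in-flight requests; toy "best disk" metric */
};

/* Returns the index of the chosen disk, or -1 if no disk can serve the read. */
static int pick_read_disk(const struct disk_model *d, size_t n, int len,
			  bool resync_in_range)
{
	int best = -1;
	size_t i;

	assert(len > 0);

	/* Step 1: resync in progress, take the first usable disk. */
	if (resync_in_range) {
		for (i = 0; i < n; i++)
			if (d[i].usable && d[i].good_sectors > 0)
				return (int)i;
		return -1;
	}

	/* Step 2: skip slow disks and disks with bad blocks, pick the best. */
	for (i = 0; i < n; i++) {
		if (!d[i].usable || d[i].slow || d[i].good_sectors < len)
			continue;
		if (best < 0 || d[i].pending < d[best].pending)
			best = (int)i;
	}
	if (best >= 0)
		return best;

	/* Step 3: disks with bad blocks, pick the most readable sectors. */
	for (i = 0; i < n; i++) {
		if (!d[i].usable || d[i].slow || d[i].good_sectors <= 0)
			continue;
		if (best < 0 || d[i].good_sectors > d[best].good_sectors)
			best = (int)i;
	}
	if (best >= 0)
		return best;

	/* Step 4: first clean slow disk, else slow disk with most sectors. */
	for (i = 0; i < n; i++) {
		if (!d[i].usable || !d[i].slow || d[i].good_sectors <= 0)
			continue;
		if (d[i].good_sectors >= len)
			return (int)i;
		if (best < 0 || d[i].good_sectors > d[best].good_sectors)
			best = (int)i;
	}
	return best;
}
```

Each step only runs when the previous one found nothing, which is what lets steps 3) and 4) stay simple: they are reached only when no clean, fast disk exists.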

6 files changed, +444 -280 lines changed
drivers/md/md.h

Lines changed: 11 additions & 0 deletions
@@ -207,6 +207,7 @@ enum flag_bits {
 				 * check if there is collision between raid1
 				 * serial bios.
 				 */
+	Nonrot,			/* non-rotational device (SSD) */
 };
 
 static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors,
@@ -222,6 +223,16 @@ static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors,
 	}
 	return 0;
 }
+
+static inline int rdev_has_badblock(struct md_rdev *rdev, sector_t s,
+				    int sectors)
+{
+	sector_t first_bad;
+	int bad_sectors;
+
+	return is_badblock(rdev, s, sectors, &first_bad, &bad_sectors);
+}
+
 extern int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
 			      int is_new);
 extern int rdev_clear_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
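
The point of rdev_has_badblock() is that callers who only need a yes/no answer no longer have to declare the two out-parameters themselves. A minimal user-space sketch of that wrapper pattern follows. `struct rdev_model`, `model_is_badblock()`, and `model_has_badblock()` are hypothetical stand-ins: the model tracks a single bad range and returns a plain bool, whereas the kernel helper passes through the int result of is_badblock().

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long sector_t;

/* Hypothetical model: one bad range [bad_start, bad_start + bad_len). */
struct rdev_model {
	sector_t bad_start;
	int bad_len;	/* 0 means the device has no bad blocks */
};

/* Toy is_badblock(): nonzero if [s, s + sectors) overlaps the bad range. */
static int model_is_badblock(const struct rdev_model *rdev, sector_t s,
			     int sectors, sector_t *first_bad, int *bad_sectors)
{
	assert(sectors > 0);
	if (rdev->bad_len == 0 || s + sectors <= rdev->bad_start ||
	    rdev->bad_start + rdev->bad_len <= s)
		return 0;
	*first_bad = rdev->bad_start;
	*bad_sectors = rdev->bad_len;
	return 1;
}

/* Wrapper in the spirit of rdev_has_badblock(): hides the out-parameters. */
static bool model_has_badblock(const struct rdev_model *rdev, sector_t s,
			       int sectors)
{
	sector_t first_bad;
	int bad_sectors;

	return model_is_badblock(rdev, s, sectors, &first_bad, &bad_sectors) != 0;
}
```

With a bad range covering sectors 100 to 107, a read of sectors 96 to 103 reports a bad block, while a read of sectors 0 to 99 does not.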

drivers/md/raid1-10.c

Lines changed: 69 additions & 0 deletions
@@ -227,3 +227,72 @@ static inline bool exceed_read_errors(struct mddev *mddev, struct md_rdev *rdev)
 
 	return false;
 }
+
+/**
+ * raid1_check_read_range() - check a given read range for bad blocks,
+ * available read length is returned;
+ * @rdev: the rdev to read;
+ * @this_sector: read position;
+ * @len: read length;
+ *
+ * helper function for read_balance()
+ *
+ * 1) If there are no bad blocks in the range, @len is returned;
+ * 2) If the range are all bad blocks, 0 is returned;
+ * 3) If there are partial bad blocks:
+ *  - If the bad block range starts after @this_sector, the length of first
+ *  good region is returned;
+ *  - If the bad block range starts before @this_sector, 0 is returned and
+ *  the @len is updated to the offset into the region before we get to the
+ *  good blocks;
+ */
+static inline int raid1_check_read_range(struct md_rdev *rdev,
+					 sector_t this_sector, int *len)
+{
+	sector_t first_bad;
+	int bad_sectors;
+
+	/* no bad block overlap */
+	if (!is_badblock(rdev, this_sector, *len, &first_bad, &bad_sectors))
+		return *len;
+
+	/*
+	 * bad block range starts offset into our range so we can return the
+	 * number of sectors before the bad blocks start.
+	 */
+	if (first_bad > this_sector)
+		return first_bad - this_sector;
+
+	/* read range is fully consumed by bad blocks. */
+	if (this_sector + *len <= first_bad + bad_sectors)
+		return 0;
+
+	/*
+	 * final case, bad block range starts before or at the start of our
+	 * range but does not cover our entire range so we still return 0 but
+	 * update the length with the number of sectors before we get to the
+	 * good ones.
+	 */
+	*len = first_bad + bad_sectors - this_sector;
+	return 0;
+}
+
+/*
+ * Check if read should choose the first rdev.
+ *
+ * Balance on the whole device if no resync is going on (recovery is ok) or
+ * below the resync window. Otherwise, take the first readable disk.
+ */
+static inline bool raid1_should_read_first(struct mddev *mddev,
+					   sector_t this_sector, int len)
+{
+	if ((mddev->recovery_cp < this_sector + len))
+		return true;
+
+	if (mddev_is_clustered(mddev) &&
+	    md_cluster_ops->area_resyncing(mddev, READ, this_sector,
+					   this_sector + len))
+		return true;
+
+	return false;
+}
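
The return-value cases documented for raid1_check_read_range() can be checked against a small user-space model. The sketch below copies the control flow from the diff but runs it on a hypothetical `struct rdev_model` with a single bad range; `model_is_badblock()` and `model_check_read_range()` are illustrative names, not kernel functions.

```c
#include <assert.h>

typedef unsigned long long sector_t;

/* Hypothetical model: one bad range [bad_start, bad_start + bad_len). */
struct rdev_model {
	sector_t bad_start;
	int bad_len;	/* 0 means the device has no bad blocks */
};

/* Toy is_badblock(): nonzero if [s, s + sectors) overlaps the bad range. */
static int model_is_badblock(const struct rdev_model *rdev, sector_t s,
			     int sectors, sector_t *first_bad, int *bad_sectors)
{
	assert(sectors > 0);
	if (rdev->bad_len == 0 || s + sectors <= rdev->bad_start ||
	    rdev->bad_start + rdev->bad_len <= s)
		return 0;
	*first_bad = rdev->bad_start;
	*bad_sectors = rdev->bad_len;
	return 1;
}

/* Same branch order as raid1_check_read_range() in the diff. */
static int model_check_read_range(const struct rdev_model *rdev,
				  sector_t this_sector, int *len)
{
	sector_t first_bad;
	int bad_sectors;

	/* no bad block overlap: the whole range is readable */
	if (!model_is_badblock(rdev, this_sector, *len, &first_bad, &bad_sectors))
		return *len;

	/* bad range starts inside the read: return the good prefix */
	if (first_bad > this_sector)
		return (int)(first_bad - this_sector);

	/* read lies entirely inside the bad range */
	if (this_sector + *len <= first_bad + bad_sectors)
		return 0;

	/* read starts in the bad range: report how many sectors to skip */
	*len = (int)(first_bad + bad_sectors - this_sector);
	return 0;
}
```

With a bad range covering sectors 100 to 107: a 50-sector read at sector 0 returns 50; an 8-sector read at sector 96 returns 4 (the good prefix); a 4-sector read at sector 102 returns 0 with the length unchanged; and an 8-sector read at sector 104 returns 0 and sets the length to 4, the number of bad sectors before the good region begins at sector 108.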
