Commit 97ae272

Gal Ofri authored and liu-song-6 committed
md/raid5: avoid device_lock in read_one_chunk()
There is lock contention on device_lock in read_one_chunk().
device_lock is taken to sync conf->active_aligned_reads and
conf->quiesce.
read_one_chunk() takes the lock, then waits for quiesce=0 (resumed)
before incrementing active_aligned_reads.
raid5_quiesce() takes the lock, sets quiesce=2 (in-progress), then waits
for active_aligned_reads to be zero before setting quiesce=1
(suspended).

Introduce a fast (lockless) path in read_one_chunk(): activate the
aligned read without taking device_lock. In case quiesce starts while
activating the aligned read in the fast path, deactivate it and revert
to the old behavior (take device_lock and wait for quiesce to finish).

Add smp store/load in raid5_quiesce()/read_one_chunk() respectively to
guarantee that read_one_chunk() does not miss an ongoing quiesce.

My setups:
1. 8 local nvme drives (each up to 250k iops).
2. 8 ram disks (brd).

Each setup with raid6 (6+2), 1024 io threads on a 96 cpu-cores (48 per
socket) system. Record both iops and cpu spent on this contention with
rand-read-4k. Record bw with sequential-read-128k. Note: in most cases
cpu is still busy but due to "new" bottlenecks.

nvme:
              | iops           | cpu  | bw
--------------------------------------------------------
without patch | 1.6M           | ~50% | 5.5GB/s
with patch    | 2M (throttled) | 0%   | 16GB/s (throttled)

ram (brd):
              | iops           | cpu  | bw
--------------------------------------------------------
without patch | 2M             | ~80% | 24GB/s
with patch    | 4M             | 0%   | 55GB/s

CC: Song Liu <[email protected]>
CC: Neil Brown <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Gal Ofri <[email protected]>
Signed-off-by: Song Liu <[email protected]>
1 parent de3ea66 commit 97ae272

1 file changed: +23 -6 lines changed

drivers/md/raid5.c

Lines changed: 23 additions & 6 deletions
@@ -5403,6 +5403,7 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 	sector_t sector, end_sector, first_bad;
 	int bad_sectors, dd_idx;
 	struct md_io_acct *md_io_acct;
+	bool did_inc;
 
 	if (!in_chunk_boundary(mddev, raid_bio)) {
 		pr_debug("%s: non aligned\n", __func__);
@@ -5454,11 +5455,24 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 	/* No reshape active, so we can trust rdev->data_offset */
 	align_bio->bi_iter.bi_sector += rdev->data_offset;
 
-	spin_lock_irq(&conf->device_lock);
-	wait_event_lock_irq(conf->wait_for_quiescent, conf->quiesce == 0,
-			    conf->device_lock);
-	atomic_inc(&conf->active_aligned_reads);
-	spin_unlock_irq(&conf->device_lock);
+	did_inc = false;
+	if (conf->quiesce == 0) {
+		atomic_inc(&conf->active_aligned_reads);
+		did_inc = true;
+	}
+	/* need a memory barrier to detect the race with raid5_quiesce() */
+	if (!did_inc || smp_load_acquire(&conf->quiesce) != 0) {
+		/* quiesce is in progress, so we need to undo io activation and wait
+		 * for it to finish
+		 */
+		if (did_inc && atomic_dec_and_test(&conf->active_aligned_reads))
+			wake_up(&conf->wait_for_quiescent);
+		spin_lock_irq(&conf->device_lock);
+		wait_event_lock_irq(conf->wait_for_quiescent, conf->quiesce == 0,
+				    conf->device_lock);
+		atomic_inc(&conf->active_aligned_reads);
+		spin_unlock_irq(&conf->device_lock);
+	}
 
 	if (mddev->gendisk)
 		trace_block_bio_remap(align_bio, disk_devt(mddev->gendisk),
@@ -8346,7 +8360,10 @@ static void raid5_quiesce(struct mddev *mddev, int quiesce)
 		 * active stripes can drain
 		 */
 		r5c_flush_cache(conf, INT_MAX);
-		conf->quiesce = 2;
+		/* need a memory barrier to make sure read_one_chunk() sees
+		 * quiesce started and reverts to slow (locked) path.
+		 */
+		smp_store_release(&conf->quiesce, 2);
 		wait_event_cmd(conf->wait_for_quiescent,
 				    atomic_read(&conf->active_stripes) == 0 &&
 				    atomic_read(&conf->active_aligned_reads) == 0,
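
The handshake described in the commit message (check quiesce, increment active_aligned_reads, re-check quiesce, and back off to the locked path on a race) can be modelled in user space. The following is a minimal sketch under stated assumptions, not the kernel code: struct conf_model and all function names are hypothetical, a pthread mutex and condition variable stand in for device_lock and wait_for_quiescent, and C11's default sequentially consistent atomics are used as a conservative stand-in for smp_store_release()/smp_load_acquire().

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the relevant fields of r5conf. */
struct conf_model {
	atomic_int quiesce;               /* 0 resumed, 2 in progress, 1 suspended */
	atomic_int active_aligned_reads;
	pthread_mutex_t device_lock;      /* models conf->device_lock */
	pthread_cond_t wait_for_quiescent;
};

void conf_model_init(struct conf_model *c)
{
	atomic_init(&c->quiesce, 0);
	atomic_init(&c->active_aligned_reads, 0);
	pthread_mutex_init(&c->device_lock, NULL);
	pthread_cond_init(&c->wait_for_quiescent, NULL);
}

/* Drop one active read; wake waiters when the count reaches zero. */
void aligned_read_end(struct conf_model *c)
{
	if (atomic_fetch_sub(&c->active_aligned_reads, 1) == 1) {
		/* Take the lock so the wakeup cannot slip in between a
		 * waiter's check and its pthread_cond_wait(). */
		pthread_mutex_lock(&c->device_lock);
		pthread_cond_broadcast(&c->wait_for_quiescent);
		pthread_mutex_unlock(&c->device_lock);
	}
}

/* Lockless fast path. Returns false if the caller must use the slow path. */
bool aligned_read_try_fast(struct conf_model *c)
{
	if (atomic_load(&c->quiesce) != 0)
		return false;                 /* quiesce already requested */
	atomic_fetch_add(&c->active_aligned_reads, 1);
	/* Re-check after the increment to catch a quiesce that raced with us. */
	if (atomic_load(&c->quiesce) == 0)
		return true;
	aligned_read_end(c);                  /* undo the activation */
	return false;
}

/* Fast path first; otherwise fall back to the old locked behavior. */
void aligned_read_begin(struct conf_model *c)
{
	if (aligned_read_try_fast(c))
		return;
	pthread_mutex_lock(&c->device_lock);
	while (atomic_load(&c->quiesce) != 0)
		pthread_cond_wait(&c->wait_for_quiescent, &c->device_lock);
	atomic_fetch_add(&c->active_aligned_reads, 1);
	pthread_mutex_unlock(&c->device_lock);
}

/* Models the suspend side: announce quiesce, then wait for reads to drain. */
void quiesce_suspend(struct conf_model *c)
{
	pthread_mutex_lock(&c->device_lock);
	atomic_store(&c->quiesce, 2);
	while (atomic_load(&c->active_aligned_reads) != 0)
		pthread_cond_wait(&c->wait_for_quiescent, &c->device_lock);
	atomic_store(&c->quiesce, 1);
	pthread_mutex_unlock(&c->device_lock);
}

/* Models the resume side: clear quiesce and release slow-path readers. */
void quiesce_resume(struct conf_model *c)
{
	pthread_mutex_lock(&c->device_lock);
	atomic_store(&c->quiesce, 0);
	pthread_cond_broadcast(&c->wait_for_quiescent);
	pthread_mutex_unlock(&c->device_lock);
}
```

In this model a reader that loses the race decrements the counter again before blocking, so the suspending side is never left waiting on a read that was only transiently counted. A caller would pair each aligned_read_begin() with one aligned_read_end() once the I/O completes.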
