Skip to content

Commit 1ba1199

Browse files
LiBaokun96akpm00
authored andcommitted
writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs
KASAN report null-ptr-deref: ================================================================== BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0 Write of size 8 at addr 0000000000000000 by task sync/943 CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461 Call Trace: <TASK> dump_stack_lvl+0x7f/0xc0 print_report+0x2ba/0x340 kasan_report+0xc4/0x120 kasan_check_range+0x1b7/0x2e0 __kasan_check_write+0x24/0x40 bdi_split_work_to_wbs+0x5c5/0x7b0 sync_inodes_sb+0x195/0x630 sync_inodes_one_sb+0x3a/0x50 iterate_supers+0x106/0x1b0 ksys_sync+0x98/0x160 [...] ================================================================== The race that causes the above issue is as follows: cpu1 cpu2 -------------------------|------------------------- inode_switch_wbs INIT_WORK(&isw->work, inode_switch_wbs_work_fn) queue_rcu_work(isw_wq, &isw->work) // queue_work async inode_switch_wbs_work_fn wb_put_many(old_wb, nr_switched) percpu_ref_put_many ref->data->release(ref) cgwb_release queue_work(cgwb_release_wq, &wb->release_work) // queue_work async &wb->release_work cgwb_release_workfn ksys_sync iterate_supers sync_inodes_one_sb sync_inodes_sb bdi_split_work_to_wbs kmalloc(sizeof(*work), GFP_ATOMIC) // alloc memory failed percpu_ref_exit ref->data = NULL kfree(data) wb_get(wb) percpu_ref_get(&wb->refcnt) percpu_ref_get_many(ref, 1) atomic_long_add(nr, &ref->data->count) atomic64_add(i, v) // trigger null-ptr-deref bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all wbs. If the allocation of new work fails, the on-stack fallback will be used and the reference count of the current wb is increased afterwards. If cgroup writeback membership switches occur before getting the reference count and the current wb is released as old_wd, then calling wb_get() or wb_put() will trigger the null pointer dereference above. This issue was introduced in v4.3-rc7 (see fix tag1). Both sync_inodes_sb() and __writeback_inodes_sb_nr() calls to bdi_split_work_to_wbs() can trigger this issue. For scenarios called via sync_inodes_sb(), originally commit 7fc5854 ("writeback: synchronize sync(2) against cgroup writeback membership switches") reduced the possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io, thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(), and the issue becomes easily reproducible again. To solve this problem, percpu_ref_exit() is called under RCU protection to avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs(). Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(), and skip the current wb if wb_tryget() fails because the wb has already been shutdown. Link: https://lkml.kernel.org/r/[email protected] Fixes: b817525 ("writeback: bdi_writeback iteration must not skip dying ones") Signed-off-by: Baokun Li <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Hou Tao <[email protected]> Cc: yangerkun <[email protected]> Cc: Zhang Yi <[email protected]> Cc: Jens Axboe <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
1 parent 1f5f12e commit 1ba1199

File tree

2 files changed

+20
-9
lines changed

2 files changed

+20
-9
lines changed

fs/fs-writeback.c

Lines changed: 10 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -978,6 +978,16 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
978978
continue;
979979
}
980980

981+
/*
982+
* If wb_tryget fails, the wb has been shutdown, skip it.
983+
*
984+
* Pin @wb so that it stays on @bdi->wb_list. This allows
985+
* continuing iteration from @wb after dropping and
986+
* regrabbing rcu read lock.
987+
*/
988+
if (!wb_tryget(wb))
989+
continue;
990+
981991
/* alloc failed, execute synchronously using on-stack fallback */
982992
work = &fallback_work;
983993
*work = *base_work;
@@ -986,13 +996,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
986996
work->done = &fallback_work_done;
987997

988998
wb_queue_work(wb, work);
989-
990-
/*
991-
* Pin @wb so that it stays on @bdi->wb_list. This allows
992-
* continuing iteration from @wb after dropping and
993-
* regrabbing rcu read lock.
994-
*/
995-
wb_get(wb);
996999
last_wb = wb;
9971000

9981001
rcu_read_unlock();

mm/backing-dev.c

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
507507
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
508508
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
509509

510+
static void cgwb_free_rcu(struct rcu_head *rcu_head)
511+
{
512+
struct bdi_writeback *wb = container_of(rcu_head,
513+
struct bdi_writeback, rcu);
514+
515+
percpu_ref_exit(&wb->refcnt);
516+
kfree(wb);
517+
}
518+
510519
static void cgwb_release_workfn(struct work_struct *work)
511520
{
512521
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct work_struct *work)
529538
list_del(&wb->offline_node);
530539
spin_unlock_irq(&cgwb_lock);
531540

532-
percpu_ref_exit(&wb->refcnt);
533541
wb_exit(wb);
534542
bdi_put(bdi);
535543
WARN_ON_ONCE(!list_empty(&wb->b_attached));
536-
kfree_rcu(wb, rcu);
544+
call_rcu(&wb->rcu, cgwb_free_rcu);
537545
}
538546

539547
static void cgwb_release(struct percpu_ref *refcnt)

0 commit comments

Comments
 (0)