Skip to content

Commit b19c98f

Browse files
josefbacikkdave
authored andcommitted
btrfs: fix race between balance and cancel/pause
Syzbot reported a panic that looks like this: assertion failed: fs_info->exclusive_operation == BTRFS_EXCLOP_BALANCE_PAUSED, in fs/btrfs/ioctl.c:465 ------------[ cut here ]------------ kernel BUG at fs/btrfs/messages.c:259! RIP: 0010:btrfs_assertfail+0x2c/0x30 fs/btrfs/messages.c:259 Call Trace: <TASK> btrfs_exclop_balance fs/btrfs/ioctl.c:465 [inline] btrfs_ioctl_balance fs/btrfs/ioctl.c:3564 [inline] btrfs_ioctl+0x531e/0x5b30 fs/btrfs/ioctl.c:4632 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The reproducer is running a balance and a cancel or pause in parallel. The way balance finishes is a bit wonky, if we were paused we need to save the balance_ctl in the fs_info, but clear it otherwise and cleanup. However we rely on the return values being specific errors, or having a cancel request or no pause request. If balance completes and returns 0, but we have a pause or cancel request we won't do the appropriate cleanup, and then the next time we try to start a balance we'll trip this ASSERT. The error handling is just wrong here, we always want to clean up, unless we got -ECANCELLED and we set the appropriate pause flag in the exclusive op. With this patch the reproducer ran for an hour without tripping, previously it would trip in less than a few minutes. Reported-by: [email protected] CC: [email protected] # 6.1+ Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent 8a4a0b2 commit b19c98f

File tree

1 file changed

+4
-10
lines changed

1 file changed

+4
-10
lines changed

fs/btrfs/volumes.c

Lines changed: 4 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -4081,14 +4081,6 @@ static int alloc_profile_is_valid(u64 flags, int extended)
40814081
return has_single_bit_set(flags);
40824082
}
40834083

4084-
static inline int balance_need_close(struct btrfs_fs_info *fs_info)
4085-
{
4086-
/* cancel requested || normal exit path */
4087-
return atomic_read(&fs_info->balance_cancel_req) ||
4088-
(atomic_read(&fs_info->balance_pause_req) == 0 &&
4089-
atomic_read(&fs_info->balance_cancel_req) == 0);
4090-
}
4091-
40924084
/*
40934085
* Validate target profile against allowed profiles and return true if it's OK.
40944086
* Otherwise print the error message and return false.
@@ -4278,6 +4270,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
42784270
u64 num_devices;
42794271
unsigned seq;
42804272
bool reducing_redundancy;
4273+
bool paused = false;
42814274
int i;
42824275

42834276
if (btrfs_fs_closing(fs_info) ||
@@ -4408,6 +4401,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
44084401
if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) {
44094402
btrfs_info(fs_info, "balance: paused");
44104403
btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
4404+
paused = true;
44114405
}
44124406
/*
44134407
* Balance can be canceled by:
@@ -4436,8 +4430,8 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
44364430
btrfs_update_ioctl_balance_args(fs_info, bargs);
44374431
}
44384432

4439-
if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
4440-
balance_need_close(fs_info)) {
4433+
/* We didn't pause, we can clean everything up. */
4434+
if (!paused) {
44414435
reset_balance_state(fs_info);
44424436
btrfs_exclop_finish(fs_info);
44434437
}

0 commit comments

Comments
 (0)