Commit ea9c846

fdmanana authored and gregkh committed
Btrfs: fix deadlock when writing out free space caches
commit 5ce5555 upstream.

When writing out a block group free space cache we can end up deadlocking with ourselves on an extent buffer lock, resulting in a warning like the following:

  [245043.379979] WARNING: CPU: 4 PID: 2608 at fs/btrfs/locking.c:251 btrfs_tree_lock+0x1be/0x1d0 [btrfs]
  [245043.392792] CPU: 4 PID: 2608 Comm: btrfs-transacti Tainted: G W I 4.16.8 #1
  [245043.395489] RIP: 0010:btrfs_tree_lock+0x1be/0x1d0 [btrfs]
  [245043.396791] RSP: 0018:ffffc9000424b840 EFLAGS: 00010246
  [245043.398093] RAX: 0000000000000a30 RBX: ffff8807e20a3d20 RCX: 0000000000000001
  [245043.399414] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff8807e20a3d20
  [245043.400732] RBP: 0000000000000001 R08: ffff88041f39a700 R09: ffff880000000000
  [245043.402021] R10: 0000000000000040 R11: ffff8807e20a3d20 R12: ffff8807cb220630
  [245043.403296] R13: 0000000000000001 R14: ffff8807cb220628 R15: ffff88041fbdf000
  [245043.404780] FS: 0000000000000000(0000) GS:ffff88082fc80000(0000) knlGS:0000000000000000
  [245043.406050] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [245043.407321] CR2: 00007fffdbdb9f10 CR3: 0000000001c09005 CR4: 00000000000206e0
  [245043.408670] Call Trace:
  [245043.409977] btrfs_search_slot+0x761/0xa60 [btrfs]
  [245043.411278] btrfs_insert_empty_items+0x62/0xb0 [btrfs]
  [245043.412572] btrfs_insert_item+0x5b/0xc0 [btrfs]
  [245043.413922] btrfs_create_pending_block_groups+0xfb/0x1e0 [btrfs]
  [245043.415216] do_chunk_alloc+0x1e5/0x2a0 [btrfs]
  [245043.416487] find_free_extent+0xcd0/0xf60 [btrfs]
  [245043.417813] btrfs_reserve_extent+0x96/0x1e0 [btrfs]
  [245043.419105] btrfs_alloc_tree_block+0xfb/0x4a0 [btrfs]
  [245043.420378] __btrfs_cow_block+0x127/0x550 [btrfs]
  [245043.421652] btrfs_cow_block+0xee/0x190 [btrfs]
  [245043.422979] btrfs_search_slot+0x227/0xa60 [btrfs]
  [245043.424279] ? btrfs_update_inode_item+0x59/0x100 [btrfs]
  [245043.425538] ? iput+0x72/0x1e0
  [245043.426798] write_one_cache_group.isra.49+0x20/0x90 [btrfs]
  [245043.428131] btrfs_start_dirty_block_groups+0x102/0x420 [btrfs]
  [245043.429419] btrfs_commit_transaction+0x11b/0x880 [btrfs]
  [245043.430712] ? start_transaction+0x8e/0x410 [btrfs]
  [245043.432006] transaction_kthread+0x184/0x1a0 [btrfs]
  [245043.433341] kthread+0xf0/0x130
  [245043.434628] ? btrfs_cleanup_transaction+0x4e0/0x4e0 [btrfs]
  [245043.435928] ? kthread_create_worker_on_cpu+0x40/0x40
  [245043.437236] ret_from_fork+0x1f/0x30
  [245043.441054] ---[ end trace 15abaa2aaf36827f ]---

This is because at write_one_cache_group(), when we are COWing a leaf from the extent tree, we end up allocating a new block group (chunk) and, because we have hit a threshold on the number of bytes reserved for system chunks, we attempt to finalize the creation of new block groups from the current transaction by calling btrfs_create_pending_block_groups(). However, here we also need to modify the extent tree in order to insert a block group item, and if the location for this new block group item happens to be in the same leaf that we were COWing earlier, we deadlock, since btrfs_search_slot() tries to write lock the extent buffer that we locked before at write_one_cache_group().

We have already hit similar cases in the past, and commit d9a0540 ("Btrfs: fix deadlock when finalizing block group creation") fixed some of those cases by delaying the creation of pending block groups at the known specific spots that could lead to a deadlock. This change reworks that commit to be more generic, so that we don't have to add similar logic to every possible path that can lead to a deadlock. This is done by making __btrfs_cow_block() disallow the creation of new block groups (setting the transaction's can_flush_pending_bgs to false) before it attempts to allocate a new extent buffer for either the extent, chunk or device trees, since those are the trees that pending block group creation modifies. Once the new extent buffer is allocated, it allows creation of pending block groups to happen again.

This change depends on a recent patch from Josef which is not yet in Linus' tree, named "btrfs: make sure we create all new block groups", in order to avoid occasional warnings at btrfs_trans_release_chunk_metadata().

Fixes: d9a0540 ("Btrfs: fix deadlock when finalizing block group creation")
CC: [email protected] # 4.4+
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199753
Link: https://lore.kernel.org/linux-btrfs/CAJtFHUTHna09ST-_EEiyWmDH6gAqS6wa=zMNMBsifj8ABu99cw@mail.gmail.com/
Reported-by: E V <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
1 parent e17af96 commit ea9c846

2 files changed: +23 -10 lines changed

fs/btrfs/ctree.c

Lines changed: 17 additions & 0 deletions
@@ -1050,9 +1050,26 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 	if ((root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) && parent)
 		parent_start = parent->start;

+	/*
+	 * If we are COWing a node/leaf from the extent, chunk or device trees,
+	 * make sure that we do not finish block group creation of pending block
+	 * groups. We do this to avoid a deadlock.
+	 * COWing can result in allocation of a new chunk, and flushing pending
+	 * block groups (btrfs_create_pending_block_groups()) can be triggered
+	 * when finishing allocation of a new chunk. Creation of a pending block
+	 * group modifies the extent, chunk and device trees, therefore we could
+	 * deadlock with ourselves since we are holding a lock on an extent
+	 * buffer that btrfs_create_pending_block_groups() may try to COW later.
+	 */
+	if (root == fs_info->extent_root ||
+	    root == fs_info->chunk_root ||
+	    root == fs_info->dev_root)
+		trans->can_flush_pending_bgs = false;
+
 	cow = btrfs_alloc_tree_block(trans, root, parent_start,
 			root->root_key.objectid, &disk_key, level,
 			search_start, empty_size);
+	trans->can_flush_pending_bgs = true;
 	if (IS_ERR(cow))
 		return PTR_ERR(cow);
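
The guard added in this hunk can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every `toy_*` name and the `leaf_locked` / `self_deadlock` fields are hypothetical stand-ins, and only `can_flush_pending_bgs` is taken from the patch. Where the real code would block forever on the extent buffer lock, the model just records a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a transaction handle; not the kernel's struct. */
struct toy_trans {
	bool can_flush_pending_bgs; /* same name as the field in the patch */
	int pending_bgs;            /* block groups still waiting for their items */
	int flushed_bgs;            /* block groups whose items were inserted */
	bool leaf_locked;           /* hypothetical: we hold the leaf's write lock */
	bool self_deadlock;         /* hypothetical: we tried to re-lock that leaf */
};

/* Models btrfs_create_pending_block_groups(), which must lock the leaf. */
static void toy_create_pending_bgs(struct toy_trans *t)
{
	assert(t != NULL);
	if (!t->can_flush_pending_bgs)
		return; /* deferred: the caller may still hold the leaf lock */

	if (t->leaf_locked) {
		t->self_deadlock = true; /* the real code would block here */
		return;
	}
	t->flushed_bgs += t->pending_bgs;
	t->pending_bgs = 0;
}

/* Models a tree block allocation that ends up allocating a new chunk. */
static void toy_alloc_tree_block(struct toy_trans *t)
{
	t->pending_bgs++;
	toy_create_pending_bgs(t);
}

/* Models COWing an extent tree leaf, with or without the patch's guard. */
static void toy_cow_block(struct toy_trans *t, bool use_guard)
{
	t->leaf_locked = true;
	if (use_guard)
		t->can_flush_pending_bgs = false;
	toy_alloc_tree_block(t);
	t->can_flush_pending_bgs = true;
	t->leaf_locked = false;
}
```

Calling `toy_cow_block(&t, false)` on a handle that starts with `can_flush_pending_bgs = true` sets `self_deadlock`, while `toy_cow_block(&t, true)` leaves one block group pending that a later `toy_create_pending_bgs(&t)` call can flush safely.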

fs/btrfs/extent-tree.c

Lines changed: 6 additions & 10 deletions
@@ -2911,7 +2911,6 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 	struct btrfs_delayed_ref_head *head;
 	int ret;
 	int run_all = count == (unsigned long)-1;
-	bool can_flush_pending_bgs = trans->can_flush_pending_bgs;

 	/* We'll clean this up in btrfs_cleanup_transaction */
 	if (trans->aborted)
@@ -2928,7 +2927,6 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 #ifdef SCRAMBLE_DELAYED_REFS
 	delayed_refs->run_delayed_start = find_middle(&delayed_refs->root);
 #endif
-	trans->can_flush_pending_bgs = false;
 	ret = __btrfs_run_delayed_refs(trans, count);
 	if (ret < 0) {
 		btrfs_abort_transaction(trans, ret);
@@ -2959,7 +2957,6 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 		goto again;
 	}
 out:
-	trans->can_flush_pending_bgs = can_flush_pending_bgs;
 	return 0;
 }

@@ -4554,11 +4551,9 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
 	 * the block groups that were made dirty during the lifetime of the
 	 * transaction.
 	 */
-	if (trans->can_flush_pending_bgs &&
-	    trans->chunk_bytes_reserved >= (u64)SZ_2M) {
+	if (trans->chunk_bytes_reserved >= (u64)SZ_2M)
 		btrfs_create_pending_block_groups(trans);
-		btrfs_trans_release_chunk_metadata(trans);
-	}
+
 	return ret;
 }

@@ -10099,9 +10094,10 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 	struct btrfs_block_group_item item;
 	struct btrfs_key key;
 	int ret = 0;
-	bool can_flush_pending_bgs = trans->can_flush_pending_bgs;

-	trans->can_flush_pending_bgs = false;
+	if (!trans->can_flush_pending_bgs)
+		return;
+
 	while (!list_empty(&trans->new_bgs)) {
 		block_group = list_first_entry(&trans->new_bgs,
 					       struct btrfs_block_group_cache,
@@ -10126,7 +10122,7 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 next:
 		list_del_init(&block_group->bg_list);
 	}
-	trans->can_flush_pending_bgs = can_flush_pending_bgs;
+	btrfs_trans_release_chunk_metadata(trans);
 }

 int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
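
The extent-tree.c hunks move the flag check and the chunk metadata release out of the caller and into btrfs_create_pending_block_groups(), so a deferred flush no longer releases the reservation. The toy model below sketches that control flow only. All `toy_*` and `TOY_*` names are hypothetical, and zeroing the counter merely stands in for btrfs_trans_release_chunk_metadata(), whose real behavior is not shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SZ_2M (2ULL * 1024 * 1024) /* mirrors the SZ_2M threshold in the diff */

/* Toy stand-in for the transaction handle fields the hunks touch. */
struct toy_handle {
	bool can_flush_pending_bgs;
	uint64_t chunk_bytes_reserved;
	int new_bgs; /* stands in for the trans->new_bgs list length */
};

/* Shape of the post-patch btrfs_create_pending_block_groups(). */
static void toy_flush_pending(struct toy_handle *h)
{
	assert(h != NULL);
	if (!h->can_flush_pending_bgs)
		return;              /* nothing flushed, nothing released */
	h->new_bgs = 0;              /* stands in for walking the list */
	h->chunk_bytes_reserved = 0; /* assumption: models the metadata release */
}

/* Shape of the tail of the post-patch do_chunk_alloc(). */
static void toy_chunk_alloc(struct toy_handle *h, uint64_t reserve)
{
	h->new_bgs++;
	h->chunk_bytes_reserved += reserve;
	if (h->chunk_bytes_reserved >= TOY_SZ_2M)
		toy_flush_pending(h); /* the callee now decides whether to act */
}
```

With `can_flush_pending_bgs` false, `toy_chunk_alloc()` crosses the threshold but keeps both the pending block group and the reservation. Once the flag is true again, the next call that reaches the threshold flushes and releases together.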
