Commit f4a9f21

fdmanana authored and kdave committed
btrfs: do not delete unused block group if it may be used soon
Before deleting a block group that is in the list of unused block groups (fs_info->unused_bgs), we check if the block group became used before deleting it, as extents from it may have been allocated after it was added to the list.

However, even if the block group was not yet used, there may be tasks that have only reserved space and have not yet allocated extents, and they might be relying on the availability of the unused block group in order to allocate extents. The reservation works first by increasing the "bytes_may_use" field of the corresponding space_info object (which may first require flushing delayed items, allocating a new block group, etc), and only later a task does the actual allocation of extents.

For metadata we usually don't end up using all reserved space, as we are pessimistic and typically account for the worst cases (need to COW every single node in a path of a tree at maximum possible height, etc). For data we usually reserve the exact amount of space we're going to allocate later, except when using compression where we always reserve space based on the uncompressed size, as compression is only triggered when writeback starts so we don't know in advance how much space we'll actually need, or if the data is compressible.

So don't delete an unused block group if the total size of its space_info object minus the block group's size is less than the sum of used space and space that may be used (space_info->bytes_may_use), as that means we have tasks that reserved space and may need to allocate extents from the block group. In this case, besides skipping the deletion, re-add the block group to the list of unused block groups so that it may be reconsidered later, in case the tasks that reserved space end up not needing to allocate extents from it.
Allowing the deletion of the block group while we have reserved space can result in tasks failing to allocate metadata extents (-ENOSPC) while under a transaction handle, resulting in a transaction abort, or failure during writeback for the case of data extents.

CC: [email protected] # 6.0+
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
1 parent 1693d54 commit f4a9f21
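
The condition described in the commit message can be illustrated with a small user-space model. This is only a sketch: `struct space_info_model`, `space_info_used()` and `must_keep_unused_bg()` are hypothetical names, and the model keeps just two counters, whereas the kernel's btrfs_space_info_used() accounts for further categories that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct btrfs_space_info. */
struct space_info_model {
	uint64_t total_bytes;   /* sum of the sizes of all block groups */
	uint64_t bytes_used;    /* space already allocated to extents */
	uint64_t bytes_may_use; /* space reserved but not yet allocated */
};

/*
 * Used space including reservations. Assumption: only two counters are
 * modeled; the kernel helper sums more than these.
 */
static uint64_t space_info_used(const struct space_info_model *si)
{
	return si->bytes_used + si->bytes_may_use;
}

/*
 * Returns true when removing a block group of size bg_length would leave
 * less total space than is already used or reserved, meaning some task
 * may still need to allocate extents from that block group.
 */
static bool must_keep_unused_bg(const struct space_info_model *si,
				uint64_t bg_length)
{
	/* Guard against unsigned underflow in the subtraction below. */
	assert(si->total_bytes >= bg_length);
	return si->total_bytes - bg_length < space_info_used(si);
}
```

For example, with three 1 GiB block groups, 1 GiB used and slightly more than 1 GiB reserved, removing one block group would leave 2 GiB of total space for more than 2 GiB of used plus reserved space, so the model keeps the block group. With exactly 1 GiB reserved the remaining space is sufficient and deletion may proceed.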

1 file changed, +46 -0 lines changed
fs/btrfs/block-group.c

Lines changed: 46 additions & 0 deletions
@@ -1455,6 +1455,7 @@ static bool clean_pinned_extents(struct btrfs_trans_handle *trans,
  */
 void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 {
+	LIST_HEAD(retry_list);
 	struct btrfs_block_group *block_group;
 	struct btrfs_space_info *space_info;
 	struct btrfs_trans_handle *trans;
@@ -1476,6 +1477,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)

 	spin_lock(&fs_info->unused_bgs_lock);
 	while (!list_empty(&fs_info->unused_bgs)) {
+		u64 used;
 		int trimming;

 		block_group = list_first_entry(&fs_info->unused_bgs,
@@ -1511,6 +1513,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 			goto next;
 		}

+		spin_lock(&space_info->lock);
 		spin_lock(&block_group->lock);
 		if (btrfs_is_block_group_used(block_group) || block_group->ro ||
 		    list_is_singular(&block_group->list)) {
@@ -1522,10 +1525,49 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 			 */
 			trace_btrfs_skip_unused_block_group(block_group);
 			spin_unlock(&block_group->lock);
+			spin_unlock(&space_info->lock);
+			up_write(&space_info->groups_sem);
+			goto next;
+		}
+
+		/*
+		 * The block group may be unused but there may be space reserved
+		 * accounting with the existence of that block group, that is,
+		 * space_info->bytes_may_use was incremented by a task but no
+		 * space was yet allocated from the block group by the task.
+		 * That space may or may not be allocated, as we are generally
+		 * pessimistic about space reservation for metadata as well as
+		 * for data when using compression (as we reserve space based on
+		 * the worst case, when data can't be compressed, and before
+		 * actually attempting compression, before starting writeback).
+		 *
+		 * So check if the total space of the space_info minus the size
+		 * of this block group is less than the used space of the
+		 * space_info - if that's the case, then it means we have tasks
+		 * that might be relying on the block group in order to allocate
+		 * extents, and add back the block group to the unused list when
+		 * we finish, so that we retry later in case no tasks ended up
+		 * needing to allocate extents from the block group.
+		 */
+		used = btrfs_space_info_used(space_info, true);
+		if (space_info->total_bytes - block_group->length < used) {
+			/*
+			 * Add a reference for the list, compensate for the ref
+			 * drop under the "next" label for the
+			 * fs_info->unused_bgs list.
+			 */
+			btrfs_get_block_group(block_group);
+			list_add_tail(&block_group->bg_list, &retry_list);
+
+			trace_btrfs_skip_unused_block_group(block_group);
+			spin_unlock(&block_group->lock);
+			spin_unlock(&space_info->lock);
 			up_write(&space_info->groups_sem);
 			goto next;
 		}
+
 		spin_unlock(&block_group->lock);
+		spin_unlock(&space_info->lock);

 		/* We don't want to force the issue, only flip if it's ok. */
 		ret = inc_block_group_ro(block_group, 0);
@@ -1649,12 +1691,16 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		btrfs_put_block_group(block_group);
 		spin_lock(&fs_info->unused_bgs_lock);
 	}
+	list_splice_tail(&retry_list, &fs_info->unused_bgs);
 	spin_unlock(&fs_info->unused_bgs_lock);
 	mutex_unlock(&fs_info->reclaim_bgs_lock);
 	return;

 flip_async:
 	btrfs_end_transaction(trans);
+	spin_lock(&fs_info->unused_bgs_lock);
+	list_splice_tail(&retry_list, &fs_info->unused_bgs);
+	spin_unlock(&fs_info->unused_bgs_lock);
 	mutex_unlock(&fs_info->reclaim_bgs_lock);
 	btrfs_put_block_group(block_group);
 	btrfs_discard_punt_unused_bgs_list(fs_info);
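
The retry-list handling in the diff above can be sketched in user space as follows. This is a rough model under stated assumptions, not the kernel code: `struct bg_model`, `struct bg_queue` and all `queue_*` helpers are hypothetical, a plain singly linked queue stands in for the kernel's list_head API, and locking is omitted entirely. A plausible reading of the patch is that skipped block groups go to a local list rather than straight back onto fs_info->unused_bgs because the loop runs until that list is empty, so re-adding them directly would make the same pass pick them up again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a block group sitting on the unused list. */
struct bg_model {
	int id;
	int refs;          /* reference count */
	int still_needed;  /* nonzero: reserved space may rely on this group */
	struct bg_model *next;
};

/* Minimal FIFO queue standing in for a kernel list. */
struct bg_queue {
	struct bg_model *head;
	struct bg_model *tail;
};

static void queue_add_tail(struct bg_queue *q, struct bg_model *bg)
{
	assert(bg != NULL);
	bg->next = NULL;
	if (q->tail)
		q->tail->next = bg;
	else
		q->head = bg;
	q->tail = bg;
}

static struct bg_model *queue_pop(struct bg_queue *q)
{
	struct bg_model *bg = q->head;

	if (bg) {
		q->head = bg->next;
		if (!q->head)
			q->tail = NULL;
		bg->next = NULL;
	}
	return bg;
}

/* Move every entry of 'from' to the tail of 'to', preserving order. */
static void queue_splice_tail(struct bg_queue *from, struct bg_queue *to)
{
	struct bg_model *bg;

	while ((bg = queue_pop(from)) != NULL)
		queue_add_tail(to, bg);
}

/* One pass over the unused queue; returns how many groups were deleted. */
static int delete_unused_pass(struct bg_queue *unused)
{
	struct bg_queue retry = { NULL, NULL };
	struct bg_model *bg;
	int deleted = 0;

	while ((bg = queue_pop(unused)) != NULL) {
		if (bg->still_needed) {
			/* Extra reference held on behalf of the retry list. */
			bg->refs++;
			queue_add_tail(&retry, bg);
		} else {
			deleted++;
		}
		/* Models the reference drop done under the "next" label. */
		bg->refs--;
	}
	/* Skipped groups become candidates again on a later pass. */
	queue_splice_tail(&retry, unused);
	return deleted;
}
```

In this model a skipped block group ends the pass with the same reference count it started with and is back on the unused queue, while the loop is still guaranteed to terminate because nothing is appended to the queue being drained.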
