Skip to content

Commit 9ad6d91

Browse files
fdmananakdave
authored andcommitted
btrfs: fix log replay failure due to race with space cache rebuild
After a sudden power failure we may end up with a space cache on disk that is not valid and needs to be rebuilt from scratch. If that happens, during log replay when we attempt to pin an extent buffer from a log tree, at btrfs_pin_extent_for_log_replay(), we do not wait for the space cache to be rebuilt through the call to: btrfs_cache_block_group(cache, 1); That is because that only waits for the task (work queue job) that loads the space cache to change the cache state from BTRFS_CACHE_FAST to any other value. That is ok when the space cache on disk exists and is valid, but when the cache is not valid and needs to be rebuilt, it ends up returning as soon as the cache state changes to BTRFS_CACHE_STARTED (done at caching_thread()). So this means that we can end up trying to unpin a range which is not yet marked as free in the block group. This results in the call to btrfs_remove_free_space() to return -EINVAL to btrfs_pin_extent_for_log_replay(), which in turn makes the log replay fail as well as mounting the filesystem. More specifically the -EINVAL comes from free_space_cache.c:remove_from_bitmap(), because the requested range is not marked as free space (ones in the bitmap), we have the following condition triggered: static noinline int remove_from_bitmap(struct btrfs_free_space_ctl *ctl, (...) if (ret < 0 || search_start != *offset) return -EINVAL; (...) It's the "search_start != *offset" that results in the condition being evaluated to true. When this happens we got the following in dmesg/syslog: [72383.415114] BTRFS: device fsid 32b95b69-0ea9-496a-9f02-3f5a56dc9322 devid 1 transid 1432 /dev/sdb scanned by mount (3816007) [72383.417837] BTRFS info (device sdb): disk space caching is enabled [72383.418536] BTRFS info (device sdb): has skinny extents [72383.423846] BTRFS info (device sdb): start tree-log replay [72383.426416] BTRFS warning (device sdb): block group 30408704 has wrong amount of free space [72383.427686] BTRFS warning (device sdb): failed to load free space cache for block group 30408704, rebuilding it now [72383.454291] BTRFS: error (device sdb) in btrfs_recover_log_trees:6203: errno=-22 unknown (Failed to pin buffers while recovering log root tree.) [72383.456725] BTRFS: error (device sdb) in btrfs_replay_log:2253: errno=-22 unknown (Failed to recover log tree) [72383.460241] BTRFS error (device sdb): open_ctree failed We also mark the range for the extent buffer in the excluded extents io tree. That is fine when the space cache is valid on disk and we can load it, in which case it causes no problems. However, for the case where we need to rebuild the space cache, because it is either invalid or it is missing, having the extent buffer range marked in the excluded extents io tree leads to a -EINVAL failure from the call to btrfs_remove_free_space(), resulting in the log replay and mount to fail. This is because by having the range marked in the excluded extents io tree, the caching thread ends up never adding the range of the extent buffer as free space in the block group since the calls to add_new_free_space(), called from load_extent_tree_free(), filter out any ranges that are marked as excluded extents. So fix this by making sure that during log replay we wait for the caching task to finish completely when we need to rebuild a space cache, and also drop the need to mark the extent buffer range in the excluded extents io tree, as well as clearing ranges from that tree at btrfs_finish_extent_commit(). This started to happen with some frequency on large filesystems having block groups with a lot of fragmentation since the recent commit e747853 ("btrfs: load free space cache asynchronously"), but in fact the issue has been there for years, it was just much less likely to happen. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent c41ec45 commit 9ad6d91

File tree

1 file changed

+18
-43
lines changed

1 file changed

+18
-43
lines changed

fs/btrfs/extent-tree.c

Lines changed: 18 additions & 43 deletions
Original file line numberDiff line numberDiff line change
@@ -2602,8 +2602,6 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_trans_handle *trans,
26022602
struct btrfs_block_group *cache;
26032603
int ret;
26042604

2605-
btrfs_add_excluded_extent(trans->fs_info, bytenr, num_bytes);
2606-
26072605
cache = btrfs_lookup_block_group(trans->fs_info, bytenr);
26082606
if (!cache)
26092607
return -EINVAL;
@@ -2615,11 +2613,19 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_trans_handle *trans,
26152613
* the pinned extents.
26162614
*/
26172615
btrfs_cache_block_group(cache, 1);
2616+
/*
2617+
* Make sure we wait until the cache is completely built in case it is
2618+
* missing or is invalid and therefore needs to be rebuilt.
2619+
*/
2620+
ret = btrfs_wait_block_group_cache_done(cache);
2621+
if (ret)
2622+
goto out;
26182623

26192624
pin_down_extent(trans, cache, bytenr, num_bytes, 0);
26202625

26212626
/* remove us from the free space cache (if we're there at all) */
26222627
ret = btrfs_remove_free_space(cache, bytenr, num_bytes);
2628+
out:
26232629
btrfs_put_block_group(cache);
26242630
return ret;
26252631
}
@@ -2629,50 +2635,22 @@ static int __exclude_logged_extent(struct btrfs_fs_info *fs_info,
26292635
{
26302636
int ret;
26312637
struct btrfs_block_group *block_group;
2632-
struct btrfs_caching_control *caching_ctl;
26332638

26342639
block_group = btrfs_lookup_block_group(fs_info, start);
26352640
if (!block_group)
26362641
return -EINVAL;
26372642

2638-
btrfs_cache_block_group(block_group, 0);
2639-
caching_ctl = btrfs_get_caching_control(block_group);
2640-
2641-
if (!caching_ctl) {
2642-
/* Logic error */
2643-
BUG_ON(!btrfs_block_group_done(block_group));
2644-
ret = btrfs_remove_free_space(block_group, start, num_bytes);
2645-
} else {
2646-
/*
2647-
* We must wait for v1 caching to finish, otherwise we may not
2648-
* remove our space.
2649-
*/
2650-
btrfs_wait_space_cache_v1_finished(block_group, caching_ctl);
2651-
mutex_lock(&caching_ctl->mutex);
2652-
2653-
if (start >= caching_ctl->progress) {
2654-
ret = btrfs_add_excluded_extent(fs_info, start,
2655-
num_bytes);
2656-
} else if (start + num_bytes <= caching_ctl->progress) {
2657-
ret = btrfs_remove_free_space(block_group,
2658-
start, num_bytes);
2659-
} else {
2660-
num_bytes = caching_ctl->progress - start;
2661-
ret = btrfs_remove_free_space(block_group,
2662-
start, num_bytes);
2663-
if (ret)
2664-
goto out_lock;
2643+
btrfs_cache_block_group(block_group, 1);
2644+
/*
2645+
* Make sure we wait until the cache is completely built in case it is
2646+
* missing or is invalid and therefore needs to be rebuilt.
2647+
*/
2648+
ret = btrfs_wait_block_group_cache_done(block_group);
2649+
if (ret)
2650+
goto out;
26652651

2666-
num_bytes = (start + num_bytes) -
2667-
caching_ctl->progress;
2668-
start = caching_ctl->progress;
2669-
ret = btrfs_add_excluded_extent(fs_info, start,
2670-
num_bytes);
2671-
}
2672-
out_lock:
2673-
mutex_unlock(&caching_ctl->mutex);
2674-
btrfs_put_caching_control(caching_ctl);
2675-
}
2652+
ret = btrfs_remove_free_space(block_group, start, num_bytes);
2653+
out:
26762654
btrfs_put_block_group(block_group);
26772655
return ret;
26782656
}
@@ -2863,9 +2841,6 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
28632841
mutex_unlock(&fs_info->unused_bg_unpin_mutex);
28642842
break;
28652843
}
2866-
if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
2867-
clear_extent_bits(&fs_info->excluded_extents, start,
2868-
end, EXTENT_UPTODATE);
28692844

28702845
if (btrfs_test_opt(fs_info, DISCARD_SYNC))
28712846
ret = btrfs_discard_extent(fs_info, start,

0 commit comments

Comments
 (0)