Commit 63c84b4

fdmanana authored and kdave committed
btrfs: ignore fiemap path cache if we have multiple leaves for a data extent
The path cache used during fiemap to determine the sharedness of extent buffers in a path, from a leaf containing a file extent item pointing to our data extent up to the root node of the tree, is meant to be used for a single path. Having a single path is by far the most common case, and therefore worth optimizing for, but it is possible to have multiple paths because we have 2 or more leaves.

If we have multiple leaves, the 'level' variable keeps getting incremented in each iteration of the while loop at btrfs_is_data_extent_shared(), which means we will treat the second leaf in the 'tmp' ulist as a level 1 node, and so forth. In the worst case this can lead to getting a level greater than or equal to BTRFS_MAX_LEVEL (8), which will trigger a WARN_ON_ONCE() in the functions that look up from or store in the path cache (lookup_backref_shared_cache() and store_backref_shared_cache()). If the current level never reaches BTRFS_MAX_LEVEL, due to shared nodes in the paths and a fs tree height smaller than 8, it can still result in incorrectly marking one leaf as shared because some other leaf is shared and the first leaf is stored one level below it, since storing a true sharedness value in the cache updates the sharedness to true of all entries in the cache below the current level.

Having multiple leaves happens in a case like the following:

- We have a file extent item pointing to a data extent at bytenr X, for a file range [0, 1M) for example;

- At this moment we have an extent data ref for the extent, with an offset of 0 and a count of 1;

- A write into the middle of the extent happens, file range [64K, 128K), so the file extent item is split into two (at btrfs_drop_extents()):

  1) One for file range [0, 64K), with a length (num_bytes field) of 64K and an extent offset of 0;

  2) Another one for file range [128K, 1M), with a length of 896K (1M - 128K) and an extent offset of 128K.

- At this moment the two file extent items are located in the same leaf;

- A new file extent item for the range [64K, 128K), pointing to a new data extent, is inserted in the leaf. This results in a leaf split and now those two file extent items pointing to data extent X end up located in different leaves;

- Once delayed refs are run, we still have a single extent data ref item for our data extent at bytenr X, for offset 0, but now with a count of 2 instead of 1;

- So during fiemap, at btrfs_is_data_extent_shared(), after we call find_parent_nodes() for the data extent, we get two leaves, since we have two file extent items pointing to the data extent at bytenr X that are located in two different leaves.

So skip the use of the path cache when we get more than one leaf.

Fixes: 12a824d ("btrfs: speedup checking for extent sharedness during fiemap")
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
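The split arithmetic in the scenario above can be checked with a small standalone sketch. It is not the kernel's btrfs_drop_extents(); the `toy_file_extent` struct and the `toy_split()` and `toy_data_ref_offset()` helpers are hypothetical names invented for illustration. It assumes the written range lies strictly inside the original item, and it relies on the extent data ref offset being the file offset minus the extent offset, which is why both halves keep offset 0 and the single ref item ends up with a count of 2.

```c
#include <assert.h>
#include <stdint.h>

#define KiB(x) ((uint64_t)(x) * 1024)

/* Hypothetical, simplified view of a file extent item. */
struct toy_file_extent {
	uint64_t file_offset;   /* start of the file range */
	uint64_t num_bytes;     /* length of the file range */
	uint64_t extent_offset; /* offset into the data extent */
};

/*
 * Split 'orig' around the overwritten range [start, end).
 * Assumption: the range lies strictly inside 'orig'.
 */
static void toy_split(const struct toy_file_extent *orig,
		      uint64_t start, uint64_t end,
		      struct toy_file_extent *left,
		      struct toy_file_extent *right)
{
	assert(start > orig->file_offset);
	assert(end < orig->file_offset + orig->num_bytes);
	assert(start < end);

	/* Left piece keeps the original start and extent offset. */
	left->file_offset = orig->file_offset;
	left->num_bytes = start - orig->file_offset;
	left->extent_offset = orig->extent_offset;

	/* Right piece starts at 'end' and skips further into the extent. */
	right->file_offset = end;
	right->num_bytes = orig->file_offset + orig->num_bytes - end;
	right->extent_offset = orig->extent_offset + (end - orig->file_offset);
}

/* Offset used by the extent data ref: file offset minus extent offset. */
static uint64_t toy_data_ref_offset(const struct toy_file_extent *fe)
{
	return fe->file_offset - fe->extent_offset;
}
```

For the commit's example, splitting an item covering [0, 1M) around [64K, 128K) gives a left piece of 64K at extent offset 0 and a right piece of 896K at extent offset 128K, and `toy_data_ref_offset()` returns 0 for both.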
1 parent 943553e commit 63c84b4

2 files changed, 26 additions and 0 deletions


fs/btrfs/backref.c

Lines changed: 25 additions & 0 deletions
@@ -1536,6 +1536,9 @@ static bool lookup_backref_shared_cache(struct btrfs_backref_shared_cache *cache
 {
 	struct btrfs_backref_shared_cache_entry *entry;
 
+	if (!cache->use_cache)
+		return false;
+
 	if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL))
 		return false;
 
@@ -1600,6 +1603,9 @@ static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache,
 	struct btrfs_backref_shared_cache_entry *entry;
 	u64 gen;
 
+	if (!cache->use_cache)
+		return;
+
 	if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL))
 		return;
 
@@ -1697,6 +1703,7 @@ int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
 	/* -1 means we are in the bytenr of the data extent. */
 	level = -1;
 	ULIST_ITER_INIT(&uiter);
+	cache->use_cache = true;
 	while (1) {
 		bool is_shared;
 		bool cached;
@@ -1726,6 +1733,24 @@ int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
 		    extent_gen > btrfs_root_last_snapshot(&root->root_item))
 			break;
 
+		/*
+		 * If our data extent was not directly shared (without multiple
+		 * reference items), than it might have a single reference item
+		 * with a count > 1 for the same offset, which means there are 2
+		 * (or more) file extent items that point to the data extent -
+		 * this happens when a file extent item needs to be split and
+		 * then one item gets moved to another leaf due to a b+tree leaf
+		 * split when inserting some item. In this case the file extent
+		 * items may be located in different leaves and therefore some
+		 * of the leaves may be referenced through shared subtrees while
+		 * others are not. Since our extent buffer cache only works for
+		 * a single path (by far the most common case and simpler to
+		 * deal with), we can not use it if we have multiple leaves
+		 * (which implies multiple paths).
+		 */
+		if (level == -1 && tmp->nnodes > 1)
+			cache->use_cache = false;
+
 		if (level >= 0)
 			store_backref_shared_cache(cache, root, bytenr,
 						   level, false);
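
To see why a per-level cache breaks once two leaves are walked as if they were one path, here is a deliberately simplified standalone model. It is not the kernel code: `toy_cache`, `toy_store()` and `toy_walk()` are hypothetical names, the walk ignores lookups and generation checks, and it only mirrors two behaviours described in the commit message, namely that the level is bumped once per visited node and that storing a true value also marks every lower level as shared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LEVEL 8 /* same value the commit message gives for BTRFS_MAX_LEVEL */

/* Hypothetical stand-in for the per-level path cache. */
struct toy_cache {
	bool valid[TOY_MAX_LEVEL];
	bool shared[TOY_MAX_LEVEL];
	bool use_cache;
};

/* Store a result; a true value also marks all lower levels as shared. */
static void toy_store(struct toy_cache *c, int level, bool is_shared)
{
	if (!c->use_cache)
		return;
	assert(level >= 0 && level < TOY_MAX_LEVEL);
	c->valid[level] = true;
	c->shared[level] = is_shared;
	if (is_shared) {
		for (int i = 0; i < level; i++) {
			c->valid[i] = true;
			c->shared[i] = true;
		}
	}
}

/*
 * Visit nodes in list order, bumping the level once per node, and stop at
 * the first shared node. The first 'nleaves' entries are leaves. Returns
 * whether the cache ends up claiming that level 0 (the first leaf) is shared.
 */
static bool toy_walk(const bool *node_shared, size_t nnodes, size_t nleaves,
		     bool skip_cache_on_multiple_leaves)
{
	struct toy_cache c = { .use_cache = true };
	int level = 0;

	/* Models the check added by this commit. */
	if (skip_cache_on_multiple_leaves && nleaves > 1)
		c.use_cache = false;

	for (size_t i = 0; i < nnodes; i++, level++) {
		toy_store(&c, level, node_shared[i]);
		if (node_shared[i])
			break;
	}
	return c.valid[0] && c.shared[0];
}
```

With two leaves where only the second is shared, the model without the check stores the second leaf at level 1 and thereby flips the first leaf's level 0 entry to shared. With the check enabled nothing is cached, so the first leaf is not mislabelled.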

fs/btrfs/backref.h

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ struct btrfs_backref_shared_cache {
 	 * a given data extent should never exceed the maximum b+tree height.
 	 */
 	struct btrfs_backref_shared_cache_entry entries[BTRFS_MAX_LEVEL];
+	bool use_cache;
 };
 
 typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root,
