Skip to content

Commit 0a32e4f

Browse files
fdmananakdave
authored andcommitted
btrfs: fix log tree replay failure due to file with 0 links and extents
If we log a new inode (not persisted in a past transaction) that has 0 links and extents, then log another inode with an higher inode number, we end up with failing to replay the log tree with -EINVAL. The steps for this are: 1) create new file A 2) write some data to file A 3) open an fd on file A 4) unlink file A 5) fsync file A using the previously open fd 6) create file B (has higher inode number than file A) 7) fsync file B 8) power fail before current transaction commits Now when attempting to mount the fs, the log replay will fail with -ENOENT at replay_one_extent() when attempting to replay the first extent of file A. The failure comes when trying to open the inode for file A in the subvolume tree, since it doesn't exist. Before commit 5f61b96 ("btrfs: fix inode lookup error handling during log replay"), the returned error was -EIO instead of -ENOENT, since we converted any errors when attempting to read an inode during log replay to -EIO. The reason for this is that the log replay procedure fails to ignore the current inode when we are at the stage LOG_WALK_REPLAY_ALL, our current inode has 0 links and last inode we processed in the previous stage has a non 0 link count. In other words, the issue is that at replay_one_extent() we only update wc->ignore_cur_inode if the current replay stage is LOG_WALK_REPLAY_INODES. Fix this by updating wc->ignore_cur_inode whenever we find an inode item regardless of the current replay stage. This is a simple solution and easy to backport, but later we can do other alternatives like avoid logging extents or inode items other than the inode item for inodes with a link count of 0. The problem with the wc->ignore_cur_inode logic has been around since commit f2d72f4 ("Btrfs: fix warning when replaying log after fsync of a tmpfile") but it only became frequent to hit since the more recent commit 5e85262 ("btrfs: fix fsync of files with no hard links not persisting deletion"), because we stopped skipping inodes with a link count of 0 when logging, while before the problem would only be triggered if trying to replay a log tree created with an older kernel which has a logged inode with 0 links. A test case for fstests will be submitted soon. Reported-by: Peter Jung <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Reported-by: burneddi <[email protected]> Link: https://lore.kernel.org/linux-btrfs/lh4W-Lwc0Mbk-QvBhhQyZxf6VbM3E8VtIvU3fPIQgweP_Q1n7wtlUZQc33sYlCKYd-o6rryJQfhHaNAOWWRKxpAXhM8NZPojzsJPyHMf2qY=@protonmail.com/#t Reported-by: Russell Haley <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Fixes: f2d72f4 ("Btrfs: fix warning when replaying log after fsync of a tmpfile") CC: [email protected] # 5.4+ Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent 005b0a0 commit 0a32e4f

File tree

1 file changed

+30
-18
lines changed

1 file changed

+30
-18
lines changed

fs/btrfs/tree-log.c

Lines changed: 30 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -321,8 +321,7 @@ struct walk_control {
321321

322322
/*
323323
* Ignore any items from the inode currently being processed. Needs
324-
* to be set every time we find a BTRFS_INODE_ITEM_KEY and we are in
325-
* the LOG_WALK_REPLAY_INODES stage.
324+
* to be set every time we find a BTRFS_INODE_ITEM_KEY.
326325
*/
327326
bool ignore_cur_inode;
328327

@@ -2465,32 +2464,45 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
24652464

24662465
nritems = btrfs_header_nritems(eb);
24672466
for (i = 0; i < nritems; i++) {
2468-
btrfs_item_key_to_cpu(eb, &key, i);
2467+
struct btrfs_inode_item *inode_item;
24692468

2470-
/* inode keys are done during the first stage */
2471-
if (key.type == BTRFS_INODE_ITEM_KEY &&
2472-
wc->stage == LOG_WALK_REPLAY_INODES) {
2473-
struct btrfs_inode_item *inode_item;
2474-
u32 mode;
2469+
btrfs_item_key_to_cpu(eb, &key, i);
24752470

2476-
inode_item = btrfs_item_ptr(eb, i,
2477-
struct btrfs_inode_item);
2471+
if (key.type == BTRFS_INODE_ITEM_KEY) {
2472+
inode_item = btrfs_item_ptr(eb, i, struct btrfs_inode_item);
24782473
/*
2479-
* If we have a tmpfile (O_TMPFILE) that got fsync'ed
2480-
* and never got linked before the fsync, skip it, as
2481-
* replaying it is pointless since it would be deleted
2482-
* later. We skip logging tmpfiles, but it's always
2483-
* possible we are replaying a log created with a kernel
2484-
* that used to log tmpfiles.
2474+
* An inode with no links is either:
2475+
*
2476+
* 1) A tmpfile (O_TMPFILE) that got fsync'ed and never
2477+
* got linked before the fsync, skip it, as replaying
2478+
* it is pointless since it would be deleted later.
2479+
* We skip logging tmpfiles, but it's always possible
2480+
* we are replaying a log created with a kernel that
2481+
* used to log tmpfiles;
2482+
*
2483+
* 2) A non-tmpfile which got its last link deleted
2484+
* while holding an open fd on it and later got
2485+
* fsynced through that fd. We always log the
2486+
* parent inodes when inode->last_unlink_trans is
2487+
* set to the current transaction, so ignore all the
2488+
* inode items for this inode. We will delete the
2489+
* inode when processing the parent directory with
2490+
* replay_dir_deletes().
24852491
*/
24862492
if (btrfs_inode_nlink(eb, inode_item) == 0) {
24872493
wc->ignore_cur_inode = true;
24882494
continue;
24892495
} else {
24902496
wc->ignore_cur_inode = false;
24912497
}
2492-
ret = replay_xattr_deletes(wc->trans, root, log,
2493-
path, key.objectid);
2498+
}
2499+
2500+
/* Inode keys are done during the first stage. */
2501+
if (key.type == BTRFS_INODE_ITEM_KEY &&
2502+
wc->stage == LOG_WALK_REPLAY_INODES) {
2503+
u32 mode;
2504+
2505+
ret = replay_xattr_deletes(wc->trans, root, log, path, key.objectid);
24942506
if (ret)
24952507
break;
24962508
mode = btrfs_inode_mode(eb, inode_item);

0 commit comments

Comments
 (0)